Rework the state_dict checkpoint format. Old checkpoints should still be loadable for now.
Rename the format to fast_llm, to stress that it's the standard Fast-LLM checkpointing format.
Replace state_dict.safetensors.index.json (copied from the Hugging Face format) with a cleaner metadata.yaml that matches the distributed format.
Rename state_dict -> model for safetensors files (it was state_dict to make the difference with the HF format more obvious, but the metadata.yaml now makes it clear enough).
Change the checkpoint directory structure. This is backward compatible within Fast-LLM, but not for outside uses:
export/ -> export/format/
checkpoints -> checkpoint
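For outside tooling that still refers to the old layout, the renames above could be sketched roughly as follows. This is only an illustration, not code from this PR: the helper name `migrate_relative_path`, the numbered subdirectories, the exact safetensors file names, and the reading of `export/format/` as "export/<format name>/" are all assumptions.

```python
from pathlib import PurePosixPath


def migrate_relative_path(old_path: str, export_format: str = "fast_llm") -> str:
    """Map an old-layout path (relative to the run directory) to the new layout.

    Hypothetical helper: the directory and file naming here is assumed from
    the PR description, not taken from the Fast-LLM code.
    """
    parts = list(PurePosixPath(old_path).parts)
    if not parts:
        return old_path
    if parts[0] == "checkpoints":
        # checkpoints/... -> checkpoint/...
        parts[0] = "checkpoint"
    elif parts[0] == "export":
        # export/... -> export/<format>/...
        parts.insert(1, export_format)
    name = parts[-1]
    # state_dict*.safetensors -> model*.safetensors
    # (the old index.json is left alone: it is replaced by metadata.yaml, not renamed)
    if name.startswith("state_dict") and name.endswith(".safetensors"):
        parts[-1] = "model" + name[len("state_dict"):]
    return str(PurePosixPath(*parts))
```

Note that the old state_dict.safetensors.index.json is deliberately not mapped, since metadata.yaml has different content and would need to be regenerated rather than renamed.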
This should be it for checkpoints for now. There are more things to do (#26, async checkpoints, polish interfaces, tests, etc.), but I spent enough time on checkpoints.
Type of change
Select all that apply:
[ ] Bug fix (non-breaking change that addresses a specific issue)
[x] New feature (non-breaking change that adds functionality)
[x] Breaking change (a change that could affect existing functionality)