ServiceNow / Fast-LLM

Accelerating your LLM training to full speed
https://servicenow.github.io/Fast-LLM/

New long-term checkpoint format #33

Closed jlamypoirier closed 2 weeks ago

jlamypoirier commented 3 weeks ago

✨ Description

Rework the state_dict checkpoint format. Old checkpoints should still be loadable for now.

Change the checkpoint directory structure. This is backward compatible within Fast-LLM, but not for external uses:

This should be it for checkpoints for now. There are more things to do (#26, async checkpoints, polish interfaces, tests, etc.), but I spent enough time on checkpoints.

🔍 Type of change

Select all that apply: