FlagOpen / FlagScale

FlagScale is a large model toolkit based on open-sourced projects.

[Flash Checkpoint] Integrated flash checkpoint #155

Closed Caozhou1995 closed 1 week ago

Caozhou1995 commented 4 months ago

This PR integrates flash checkpoint, with reference to DLRover. DLRover's core capabilities are invoked directly, while FlagScale customizes file paths, data organization, and so on. To use the save and load functions of flash checkpoint, import them as follows:

    from flagscale.train.checkpointing import save_checkpoint
    from flagscale.train.checkpointing import load_checkpoint

The interfaces are compatible with Megatron and DLRover. The results are shown below:

If you don't want to use flash checkpoint, set flash=False; the save and load functions provided by Megatron are then used instead.
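
To illustrate the fallback described above, here is a minimal sketch of how a single flash switch could select between the two save paths. It is not the code from this PR: the names save_checkpoint_sketch, _flash_save and _megatron_save, and their arguments, are hypothetical stand-ins, and the real save_checkpoint signature in flagscale.train.checkpointing may differ.

```python
from typing import Any, Dict


def _flash_save(state: Dict[str, Any], path: str) -> str:
    # Hypothetical stand-in for the flash checkpoint path (DLRover-backed in the PR).
    return f"flash:{path}"


def _megatron_save(state: Dict[str, Any], path: str) -> str:
    # Hypothetical stand-in for the regular Megatron save path.
    return f"megatron:{path}"


def save_checkpoint_sketch(state: Dict[str, Any], path: str, flash: bool = True) -> str:
    """Dispatch to one of the two backends based on the flash switch."""
    backend = _flash_save if flash else _megatron_save
    return backend(state, path)


if __name__ == "__main__":
    state = {"iteration": 100}
    print(save_checkpoint_sketch(state, "ckpt/iter_100"))
    print(save_checkpoint_sketch(state, "ckpt/iter_100", flash=False))
```

Keeping one entry point and switching the backend behind it means callers do not have to change when flash checkpoint is turned off.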