This PR integrates flash checkpoint, with reference to DLRover.
DLRover's core capabilities are invoked, but FlagScale customizes the file paths, data organization, and so on.
To use the save and load functions of flash checkpoint, see the following example:
from flagscale.train.checkpointing import save_checkpoint
from flagscale.train.checkpointing import load_checkpoint
The interfaces are compatible with both Megatron and DLRover. The results are shown below:
save
load
If you don't want to use flash checkpoint, set flash=False; the save and load functions provided by Megatron are then used instead.
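As a rough illustration of the flash=False fallback, the dispatch could look like the minimal sketch below. It is not the code in this PR: `save_checkpoint_sketch`, `_flash_save`, `_megatron_save`, their arguments, and the default `flash=True` are all hypothetical stand-ins, and the real DLRover and Megatron functions take different parameters.

```python
from typing import Dict, List


def _flash_save(iteration: int, state: Dict, log: List[str]) -> None:
    # Hypothetical stand-in for the DLRover-backed flash checkpoint save.
    log.append(f"flash:{iteration}")


def _megatron_save(iteration: int, state: Dict, log: List[str]) -> None:
    # Hypothetical stand-in for the save function provided by Megatron.
    log.append(f"megatron:{iteration}")


def save_checkpoint_sketch(
    iteration: int, state: Dict, log: List[str], flash: bool = True
) -> None:
    """Route the save call to one backend based on the flash flag.

    The default value of flash is an assumption made for this sketch.
    """
    backend = _flash_save if flash else _megatron_save
    backend(iteration, state, log)


# Record which backend handled each call.
log: List[str] = []
save_checkpoint_sketch(100, {"w": 1}, log)               # flash path
save_checkpoint_sketch(200, {"w": 2}, log, flash=False)  # Megatron path
```

Keeping a single entry point with a boolean switch means callers do not need to change their imports when turning flash checkpoint on or off.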