Closed xingyaoww closed 8 months ago
Hi @AleHD, thanks for your feedback! I checked the two additional places: both `_get_models` and `_setup_model_and_optimizer` call `load_checkpoint` in `megatron/checkpointing.py` to load models, which in turn calls `_load_base_checkpoint` in the same file.
Since my change directly modifies `_load_base_checkpoint` (code here) and overrides the metadata iteration when `load_iters` is specified, do we still need to explicitly modify these two functions?
@AleHD Thanks a lot! I have accepted the two suggestions!
Support converting a sharded checkpoint at a specified iteration back to the unsharded version. For example, you can set `$LOAD_ITER` to 52 to load the checkpoint of the 52nd iteration from `$LOAD_DIR/iter_0000052`. This will override the iteration number read from the tracker file.
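As a quick illustration of the directory naming, the iteration number is zero-padded to seven digits when forming the checkpoint path. The `LOAD_DIR` value below is a made-up example path:

```shell
LOAD_DIR=/checkpoints/model
LOAD_ITER=52
# Build the per-iteration checkpoint directory name (iter_ + 7-digit iteration).
echo "$LOAD_DIR/iter_$(printf '%07d' "$LOAD_ITER")"
# prints /checkpoints/model/iter_0000052
```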