argonne-lcf / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Train skip range #56

Closed saforem2 closed 1 month ago

saforem2 commented 1 month ago

Add logic for manually skipping one or more predefined ranges of training iterations.

For example,

PBS_O_WORKDIR=$(pwd) bash train_aGPT_7B.sh --train-range-to-skip 25 100 105 250

will skip every training iteration in the ranges [25, 100] and [105, 250][^pairs].

[^pairs]: The values are read as consecutive (start, end) pairs, so an even number of values must be given.
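
The pairing rule can be illustrated with a minimal sketch. This is not the code from this PR: the helper names `parse_skip_ranges` and `should_skip` are hypothetical, and treating both endpoints as inclusive is an assumption based on the bracket notation above.

```python
from typing import List, Tuple


def parse_skip_ranges(values: List[int]) -> List[Tuple[int, int]]:
    """Group a flat list such as [25, 100, 105, 250] into (start, end) pairs.

    Hypothetical helper for illustration; not taken from the PR.
    """
    if len(values) % 2 != 0:
        raise ValueError(
            f"expected an even number of values (start/end pairs), got {len(values)}"
        )
    # Even-indexed values are range starts, odd-indexed values are range ends.
    ranges = list(zip(values[0::2], values[1::2]))
    for start, end in ranges:
        if start > end:
            raise ValueError(f"range start {start} is greater than end {end}")
    return ranges


def should_skip(iteration: int, ranges: List[Tuple[int, int]]) -> bool:
    """Return True if `iteration` lies in any range (endpoints assumed inclusive)."""
    return any(start <= iteration <= end for start, end in ranges)


skip_ranges = parse_skip_ranges([25, 100, 105, 250])
```

With these ranges, a training loop that checks `should_skip(iteration, skip_ranges)` before each step would skip iterations 25 through 100 and 105 through 250, and run iterations 101 through 104 normally.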