Closed saforem2 closed 4 months ago
Changes:

- `export LR_WARMUP_FRAC="${LR_WARMUP_FRAC:-0.05}"`, which will warm up the learning rate over the first 5% of the total training iterations
- `LR_DECAY_ITERS` during training: `None` if not specified, according to the default from `megatron/arguments.py`
- `ALCF/aws_ofi_nccl_plugin.sh`
- Sunspot: `anl_24_q2_release`; updated `ALCF/sunspot-env.sh` to reflect this change
- Adds fix for `flash-attn` discrepancy

Loss Curves:
![ScreenShot-2024-05-16-115705](https://github.com/argonne-lcf/Megatron-DeepSpeed/assets/5234251/ed1d245b-e1e2-4e63-a18f-e17312f68594)
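As a rough sketch of how the fractional warmup setting above behaves: the `${LR_WARMUP_FRAC:-0.05}` expansion keeps any value already exported by the user and falls back to `0.05` otherwise, and the fraction can then be converted to a warmup iteration count. The `TRAIN_ITERS` and `LR_WARMUP_ITERS` variable names below are illustrative, not taken from the repository's launch scripts:

```shell
# Keep a user-provided value if set; otherwise default to 0.05 (5%).
export LR_WARMUP_FRAC="${LR_WARMUP_FRAC:-0.05}"

# Hypothetical total number of training iterations for this example.
TRAIN_ITERS="${TRAIN_ITERS:-100000}"

# warmup iterations = warmup fraction * total iterations
# (awk handles the floating-point multiply; printf "%d" truncates to an int)
LR_WARMUP_ITERS=$(awk -v f="$LR_WARMUP_FRAC" -v t="$TRAIN_ITERS" \
    'BEGIN { printf "%d", f * t }')

echo "$LR_WARMUP_ITERS"   # 5000 with the defaults above
```

With the defaults this warms up over the first 5,000 of 100,000 iterations; exporting `LR_WARMUP_FRAC=0.1` before launching would double that without editing the script.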