alibaba / Pai-Megatron-Patch

The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.
Apache License 2.0

Fix issue where example mcore models don't scale query-key values #244

Closed: billishyahao closed this 4 months ago

billishyahao commented 4 months ago

There is a change in argument behaviour in the latest Megatron-LM repo, as shown below:

In Megatron-LM-23*, query-key layer scaling is enabled by default, which helps keep training stable, especially in the fp16 case.

    group.add_argument('--no-query-key-layer-scaling', action='store_false',
                       help='Do not scale Q * K^T by 1 / layer-number.',
                       dest='apply_query_key_layer_scaling')
    group.add_argument('--attention-softmax-in-fp32', action='store_true',
                       help='Run attention masking and softmax in fp32. '
                       'This flag is ignored unless '
                       '--no-query-key-layer-scaling is specified.')

However, in Megatron-LM-24*, query-key layer scaling is disabled by default, and enabling it is recommended for fp16 training.

    group.add_argument('--apply-query-key-layer-scaling', action='store_true',
                       help='Scale Q * K^T by 1 / layer-number. '
                       'Useful for fp16 training.')
    group.add_argument('--attention-softmax-in-fp32', action='store_true',
                       help='Run attention masking and softmax in fp32. '
                       'This flag is ignored unless '
                       '--no-query-key-layer-scaling is specified.')
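To make the default flip concrete, here is a minimal standalone sketch using only the standard `argparse` module. The two parser factories are hypothetical stand-ins that copy just the scaling flag definitions quoted above (help strings omitted); they are not Megatron-LM's actual argument parser. It shows that the same launch command, with no scaling flag at all, ends up with scaling on under the 23* definition and off under the 24* definition.

```python
import argparse


def megatron_23_style_parser() -> argparse.ArgumentParser:
    # Hypothetical copy of the Megatron-LM-23* flag: passing it turns scaling OFF,
    # so store_false makes the default value True.
    parser = argparse.ArgumentParser()
    parser.add_argument('--no-query-key-layer-scaling', action='store_false',
                        dest='apply_query_key_layer_scaling')
    return parser


def megatron_24_style_parser() -> argparse.ArgumentParser:
    # Hypothetical copy of the Megatron-LM-24* flag: passing it turns scaling ON,
    # so store_true makes the default value False.
    parser = argparse.ArgumentParser()
    parser.add_argument('--apply-query-key-layer-scaling', action='store_true')
    return parser


# Same command line (no scaling flag) under both definitions.
old = megatron_23_style_parser().parse_args([])
new = megatron_24_style_parser().parse_args([])
print(old.apply_query_key_layer_scaling)  # True
print(new.apply_query_key_layer_scaling)  # False

# Under 24*, fp16 runs have to opt in explicitly.
new_fp16 = megatron_24_style_parser().parse_args(['--apply-query-key-layer-scaling'])
print(new_fp16.apply_query_key_layer_scaling)  # True
```

This is why an example script written against 23* silently loses the scaling when run on 24*.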

Further evidence can be seen in an upstream Megatron-LM test case:

Megatron-LM/tests/functional_tests/test_scripts/gpt3/pretrain_gpt3_distributed_test.sh

           --${TRAINING_DTYPE}"

    if [[ "${TRAINING_DTYPE}" == "fp16" ]]; then
        torch_run_cmd+=" --apply-query-key-layer-scaling"
    fi
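The same conditional can be expressed in a Python launcher. This is a minimal sketch: `dtype_args` is a hypothetical helper (not part of Megatron-LM or Pai-Megatron-Patch) that mirrors the shell logic above, always passing the dtype flag and opting in to query-key layer scaling only for fp16, following the Megatron-LM-24* semantics.

```python
from typing import List


def dtype_args(training_dtype: str) -> List[str]:
    # Hypothetical helper mirroring the upstream test script:
    # always pass --<dtype>, and add the scaling flag only for fp16.
    args = [f"--{training_dtype}"]
    if training_dtype == "fp16":
        args.append("--apply-query-key-layer-scaling")
    return args


print(dtype_args("fp16"))  # ['--fp16', '--apply-query-key-layer-scaling']
print(dtype_args("bf16"))  # ['--bf16']
```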

The arguments above therefore lead to different training results depending on the Megatron-LM version. This patch fixes that behaviour: when fp16 training is launched, the query-key scaling factor takes effect to avoid training instability.
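As an illustration of what the scale factor does, here is a pure-Python sketch based on my reading of Megatron's core attention: with scaling on, Q * K^T is divided by an extra factor of `layer_number`, and the softmax multiplies the scores back by the same factor (upstream does that step in fp32). The attention probabilities are mathematically unchanged; only the magnitude of the intermediate scores shrinks, which is presumably what keeps fp16 logits away from overflow. The function names and toy inputs are hypothetical, and plain Python floats stand in for fp16/fp32 tensors.

```python
import math


def softmax(xs):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_probs(q, keys, layer_number, apply_qk_layer_scaling):
    # Hypothetical single-query attention, for illustration only.
    norm_factor = math.sqrt(len(q))  # usual 1/sqrt(head_dim) scaling
    coeff = 1.0
    if apply_qk_layer_scaling:
        # Shrink Q * K^T by an extra 1/layer_number ...
        coeff = float(layer_number)
        norm_factor *= coeff
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / norm_factor for k in keys]
    # ... and multiply back by coeff right before the softmax.
    probs = softmax([s * coeff for s in scores])
    return scores, probs


q = [4.0, -2.0, 3.0, 1.0]
keys = [[3.0, 1.0, 2.0, 0.5], [-1.0, 2.0, 0.0, 1.0], [2.0, 2.0, 2.0, 2.0]]
raw_scores, p_plain = attention_probs(q, keys, layer_number=24, apply_qk_layer_scaling=False)
scaled_scores, p_scaled = attention_probs(q, keys, layer_number=24, apply_qk_layer_scaling=True)

# Intermediate scores are 24x smaller, but the probabilities match.
print(round(max(map(abs, raw_scores)) / max(map(abs, scaled_scores)), 6))  # 24.0
print(all(math.isclose(a, b) for a, b in zip(p_plain, p_scaled)))  # True
```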

CLAassistant commented 4 months ago

CLA assistant check
All committers have signed the CLA.