alibaba / Pai-Megatron-Patch

The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.
Apache License 2.0

cannot import name 'TEDotProductAttentionMLA' when running `examples/deepseek_v2/run_mcore_deepseek.sh` #359

Open dreasysnail opened 19 hours ago

dreasysnail commented 19 hours ago

Thank you for the great project! When I run `examples/deepseek_v2/run_mcore_deepseek.sh`, I get the following error:

Traceback (most recent call last):
  File "/mnt/task_runtime/examples/deepseek_v2/pretrain_deepseek.py", line 37, in <module>
    from megatron_patch.model.deepseek_v2.layer_specs import (
  File "/mnt/task_runtime/megatron_patch/model/deepseek_v2/layer_specs.py", line 19, in <module>
    from megatron.core.transformer.custom_layers.transformer_engine import (
ImportError: cannot import name 'TEDotProductAttentionMLA' from 'megatron.core.transformer.custom_layers.transformer_engine' (/mnt/task_runtime/PAI-Megatron-LM-240718/megatron/core/transformer/custom_layers/transformer_engine.py)

It appears that `megatron_patch/model/deepseek_v2/layer_specs.py` (line 19) attempts to import `TEDotProductAttentionMLA`, but when I checked `megatron/core/transformer/custom_layers/transformer_engine.py` in `PAI-Megatron-LM-240718`, I could not find `TEDotProductAttentionMLA` defined there.
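A quick check like the one below can confirm whether the name is actually exposed by the module in a given environment. This is only a minimal sketch: `has_symbol` is a hypothetical helper name used for illustration, not part of Megatron or Pai-Megatron-Patch, and it assumes the script runs with the same `PYTHONPATH` as the training script.

```python
import importlib


def has_symbol(module_name: str, symbol: str) -> bool:
    """Return True if `module_name` can be imported and exposes `symbol`.

    Hypothetical helper for illustration; not part of Megatron.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The module itself is missing or fails while resolving its imports.
        return False
    return hasattr(module, symbol)


if __name__ == "__main__":
    # Module path and class name are taken from the traceback above.
    print(has_symbol(
        "megatron.core.transformer.custom_layers.transformer_engine",
        "TEDotProductAttentionMLA",
    ))
```

If this prints `False` while the module file exists, the checked-out Megatron-LM copy most likely does not define the class, which would point to a version mismatch between the patch and the bundled Megatron-LM rather than a path problem.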

Any help appreciated!

dreasysnail commented 19 hours ago

@Jiayi-Pan