alibaba / Pai-Megatron-Patch

The official repo of Pai-Megatron-Patch for LLM & VLM large-scale training, developed by Alibaba Cloud.
Apache License 2.0

pretrain_mcore_llama.py raises an error #161

Closed CaesarWWK closed 5 months ago

CaesarWWK commented 6 months ago

The error message is:

```
Traceback (most recent call last):
  File "/mnt/workspace/wangweikuan/Pai-Megatron-Patch/examples/llama2/pretrain_mcore_llama.py", line 271, in <module>
    pretrain(
  File "/mnt/cpfs2/wangweikuan/Megatron-LM/megatron/training.py", line 218, in pretrain
    model, optimizer, opt_param_scheduler = setup_model_and_optimizer(
  File "/mnt/cpfs2/wangweikuan/Megatron-LM/megatron/training.py", line 484, in setup_model_and_optimizer
    model = get_model(model_provider_func, model_type)
  File "/mnt/cpfs2/wangweikuan/Megatron-LM/megatron/training.py", line 368, in get_model
    model = model_provider_func(
  File "/mnt/workspace/wangweikuan/Pai-Megatron-Patch/examples/llama2/pretrain_mcore_llama.py", line 67, in model_provider
    model = GPTModel(
  File "/mnt/cpfs2/wangweikuan/Megatron-LM/megatron/core/models/gpt/gpt_model.py", line 89, in __init__
    rotary_interleaved=self.config.rotary_interleaved,
AttributeError: 'TransformerConfig' object has no attribute 'rotary_interleaved'
```

I suspect this is because the file was converted from the MoE example at some point in its history and a few places were never updated. It still imports:

```python
from megatron_patch.model.mixtral.transformer_config import TransformerConfig
from megatron_patch.model.mixtral.layer_specs import get_gpt_layer_with_transformer_engine_spec
```

The mixtral `transformer_config` has no `rotary_interleaved` attribute, but `megatron/core/transformer/transformer_config` does.
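A minimal sketch of how one might work around the missing attribute while the imports are sorted out. This is an assumption on my part, not a confirmed fix: either import the core `TransformerConfig` (which, per the traceback, defines `rotary_interleaved`), or default the attribute on the mixtral config before `GPTModel` reads it. The helper `ensure_rotary_interleaved` is hypothetical and not part of Pai-Megatron-Patch or Megatron-LM.

```python
# Assumption: switching to the core config avoids the AttributeError, since
# megatron/core/transformer/transformer_config defines rotary_interleaved.
from megatron.core.transformer.transformer_config import TransformerConfig


def ensure_rotary_interleaved(config):
    """Hypothetical shim: keep the mixtral config but default the missing
    field before GPTModel.__init__ accesses it."""
    if not hasattr(config, "rotary_interleaved"):
        # Assuming False (non-interleaved rotary embeddings) is the intended default.
        config.rotary_interleaved = False
    return config
```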