alibaba / Pai-Megatron-Patch

The official repository of Pai-Megatron-Patch for large-scale LLM & VLM training, developed by Alibaba Cloud.

Support Qwen2-72B? #250

Closed Crystalxd closed 3 months ago

Crystalxd commented 4 months ago

As the title says. Conversion fails with the following error:

Zarr-based strategies will not be registered because of missing packages
Converting
converting embedding layer
converting transformer layers
Traceback (most recent call last):
  File "/code/Pai-Megatron-Patch/toolkits/model_checkpoints_convertor/qwen/hf2megatron_qwen1.5.py", line 1124, in <module>
    main()
  File "/code/Pai-Megatron-Patch/toolkits/model_checkpoints_convertor/qwen/hf2megatron_qwen1.5.py", line 1120, in main
    convert_checkpoint_from_transformers_to_megatron(args)
  File "/code/Pai-Megatron-Patch/toolkits/model_checkpoints_convertor/qwen/hf2megatron_qwen1.5.py", line 599, in convert_checkpoint_from_transformers_to_megatron
    params = transformers_to_megatron_fix_query_key_value_ordering(
  File "/code/Pai-Megatron-Patch/toolkits/model_checkpoints_convertor/qwen/hf2megatron_qwen1.5.py", line 306, in transformers_to_megatron_fix_query_key_value_ordering
    param = param.view(*current_shape)
RuntimeError: shape '[3, 64, 128, 8192]' is invalid for input of size 83886080
Wang895 commented 3 months ago

Qwen2-72B uses GQA; you can refer to how the Qwen1.5-32B checkpoint is handled and modify the conversion script accordingly.
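For context, 83886080 = (64 + 2 × 8) × 128 × 8192: Qwen2-72B's fused QKV projection packs 64 query heads but only 8 key/value heads, so the MHA-style view(3, 64, 128, 8192), which would need 3 × 64 × 128 × 8192 = 201326592 elements, cannot apply. Below is a minimal sketch of a GQA-aware split, assuming the head counts published in Qwen2-72B's config and a grouped layout similar to what Megatron's attention expects (the exact ordering varies by Megatron version, so treat this as illustrative, not as the converter's actual code):

```python
import torch

# Config values from Qwen2-72B's published config (assumed here):
hidden_size = 8192
num_attention_heads = 64
num_key_value_heads = 8                        # GQA: 8 KV heads serve 64 query heads
head_dim = hidden_size // num_attention_heads  # 128

# Fused QKV projection as stacked in the HF checkpoint: Q, then K, then V.
q = torch.empty(num_attention_heads * head_dim, hidden_size)
k = torch.empty(num_key_value_heads * head_dim, hidden_size)
v = torch.empty(num_key_value_heads * head_dim, hidden_size)
qkv = torch.cat([q, k, v], dim=0)
print(qkv.numel())  # 83886080 -- the "input of size" in the traceback above

# The MHA reshape the script attempts assumes Q, K and V all have 64 heads:
#   qkv.view(3, 64, 128, 8192)  # needs 201326592 elements -> RuntimeError

# GQA-aware regrouping (sketch): place each KV group's query heads next to
# its single K and V head, per-group, instead of a flat [3, heads, ...] view.
heads_per_group = num_attention_heads // num_key_value_heads  # 8
q = q.view(num_key_value_heads, heads_per_group * head_dim, hidden_size)
k = k.view(num_key_value_heads, head_dim, hidden_size)
v = v.view(num_key_value_heads, head_dim, hidden_size)
qkv_grouped = torch.cat([q, k, v], dim=1).reshape(-1, hidden_size)
```

Qwen1.5-32B is also a GQA model, which is why its conversion path is the natural reference for this change.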

jerryli1981 commented 3 months ago

The Qwen2 dense models are now wired up; please help evaluate: https://github.com/alibaba/Pai-Megatron-Patch/pull/258