Converting the .pth checkpoint to an HF model fails

InternLM / xtuner

An efficient, flexible and full-featured toolkit for fine-tuning LLM (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)

https://xtuner.readthedocs.io/zh-cn/latest/

Apache License 2.0


Status: Open, opened by no-execution 1 month ago


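Following the workflow in the README, I trained a Qwen2.5 72B model on 64 GPUs. Training produced a .pth folder containing 64 .pt files. While converting it to an HF model, the process suddenly terminated without printing any error.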

I checked RAM, GPU memory, and CPU usage; none of them looked abnormal.
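Periodic sampling (e.g. watching `top` or `nvidia-smi`) can miss a short memory spike right before a process is killed. One way to double-check is to run the conversion as a child process and ask the OS for the child's peak resident memory afterwards. Below is a minimal Python sketch, assuming a Linux host; `run_and_report` is a hypothetical helper, and the command it wraps (for example the pth_to_hf conversion command from the README) is passed through unchanged.

```python
import resource
import signal
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd as a child process; return (status, peak RSS of children)."""
    proc = subprocess.run(cmd)
    # ru_maxrss for RUSAGE_CHILDREN is the peak RSS of the largest terminated,
    # waited-for child. Linux reports KiB (macOS reports bytes). The kernel
    # tracks the peak itself, so spikes missed by periodic sampling still count.
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    code = proc.returncode
    if code < 0:
        # Negative return code on POSIX: the child was terminated by this signal.
        try:
            status = f"killed by {signal.Signals(-code).name}"
        except ValueError:
            status = f"killed by signal {-code}"
    else:
        status = f"exit status {code}"
    return status, peak


if __name__ == "__main__" and len(sys.argv) > 1:
    status, peak_kib = run_and_report(sys.argv[1:])
    print(f"{status}; peak child RSS: {peak_kib / 1024**2:.1f} GiB (Linux units)")
```

Saved as a script and invoked with the full conversion command as its arguments, it prints how the command ended. A result like `killed by SIGKILL` with no traceback would be consistent with, though not proof of, the kernel OOM killer; the kernel log (`dmesg`) is the place to confirm that.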

The same conversion succeeds for the 7B model.

Looking into it, the process dies while reading DeepSpeed's .pt files.
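To narrow down which file triggers the exit, each shard can be loaded on its own in a separate interpreter, so a silent termination shows up as a return code instead of ending the whole run. This is a minimal sketch, assuming the shards are regular `torch.save` files that `torch.load` can read on CPU; `probe_shards` and `describe_returncode` are hypothetical helper names, not part of xtuner or DeepSpeed.

```python
import signal
import subprocess
import sys
from pathlib import Path

# Child-process probe: load one shard on CPU with PyTorch.
# Assumption: each shard is a regular torch.save() file. weights_only=False
# because DeepSpeed state files can hold non-tensor objects (trusted files only).
TORCH_LOAD = (
    "import sys, torch\n"
    "torch.load(sys.argv[1], map_location='cpu', weights_only=False)\n"
)


def describe_returncode(code):
    """Turn a subprocess return code into a readable status string."""
    if code == 0:
        return "ok"
    if code < 0:
        # On POSIX a negative code means the child died from that signal,
        # e.g. SIGKILL, which leaves no Python traceback behind.
        try:
            return f"killed by {signal.Signals(-code).name}"
        except ValueError:
            return f"killed by signal {-code}"
    return f"exited with status {code}"


def probe_shards(directory, pattern="*.pt", probe_code=TORCH_LOAD):
    """Load each matching file in its own interpreter and record how it ended."""
    root = Path(directory)
    results = {}
    for path in sorted(root.rglob(pattern)):
        proc = subprocess.run([sys.executable, "-c", probe_code, str(path)])
        results[str(path.relative_to(root))] = describe_returncode(proc.returncode)
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    for name, status in probe_shards(sys.argv[1]).items():
        print(f"{name}: {status}")
```

Loading one shard at a time needs far less memory than a converter that may hold many shards at once, so if every shard passes here, memory pressure during the full conversion is still a candidate rather than ruled out.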

Is there any way to fix this?