HuangLK / transpeeder

Train LLaMA on a single A100 80G node using 🤗 transformers and 🚀 DeepSpeed Pipeline Parallelism
Apache License 2.0

Loading the checkpoint in train.py seems to have no effect #21

Closed: GongCQ closed this issue 1 year ago

GongCQ commented 1 year ago

Line 108 of train.py:

engine.load_checkpoint(model_args.init_ckpt, load_module_only=True)

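Whether or not this line is present, the initial training loss is the same. It looks like the model parameters are not actually being loaded.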
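One way to check this more directly than by comparing the loss is to fingerprint the parameters before and after the `load_checkpoint` call and see which ones changed. The snippet below is only a rough sketch, not code from this repository: `fingerprint` and `changed_params` are hypothetical helper names, and the toy dictionaries stand in for real weights. The comments describe how the values might be pulled from the engine, assuming the parameters are not partitioned across ranks.

```python
import hashlib
import struct
from typing import Dict, Iterable, List


def fingerprint(params: Dict[str, Iterable[float]]) -> Dict[str, str]:
    """Return a short SHA-256 digest for each named parameter (hypothetical helper)."""
    digests = {}
    for name, values in params.items():
        h = hashlib.sha256()
        for v in values:
            # Pack every value as a little-endian double so equal values hash equally.
            h.update(struct.pack("<d", float(v)))
        digests[name] = h.hexdigest()[:16]
    return digests


def changed_params(before: Dict[str, str], after: Dict[str, str]) -> List[str]:
    """List the parameter names whose digest differs between two snapshots."""
    return sorted(name for name in after if before.get(name) != after[name])


# Toy stand-in for real weights. With a real engine, the dictionaries could be
# built (assumption: parameters are fully materialised on this rank) as
#   {n: p.detach().float().flatten().tolist()
#    for n, p in engine.module.named_parameters()}
# once before and once after
#   engine.load_checkpoint(model_args.init_ckpt, load_module_only=True)
before = fingerprint({"w": [0.1, 0.2], "b": [0.0]})
after = fingerprint({"w": [0.5, -0.3], "b": [0.0]})

changed = changed_params(before, after)
print(changed)
```

If the list comes back empty after the real call, the weights were not replaced. It may also be worth checking the value returned by `load_checkpoint`: as far as I know, DeepSpeed returns a `(load_path, client_state)` pair, and `load_path` is `None` when nothing was loaded.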