Question about the checkpoint used for SFT

DLLXW / baby-llama2-chinese

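A repository for pretraining from scratch and then running SFT on a small-parameter Chinese LLaMa2; a single 24 GB GPU is enough to obtain a chat-llama2 with basic Chinese question-answering ability.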

MIT License

2.42k stars, 296 forks

Closed: Deep1994 closed this issue 1 year ago

Deep1994 commented 1 year ago

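Hello, in the SFT stage I see that the code loads “model.load_state_dict(torch.load('./out/baike_pretrain/epoch_0.pth'))”, i.e. the epoch 0 checkpoint, but pretraining runs for 2 epochs. Why does SFT continue from the first epoch's checkpoint rather than the second's?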

DLLXW commented 1 year ago

It doesn't matter. It's only because the second epoch hadn't finished training at the time.
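For readers who would rather have SFT start from the most recent pretraining checkpoint instead of a hard-coded one, here is a minimal sketch. It is not code from the repository: the helper name `latest_checkpoint` is hypothetical, and it assumes checkpoints are saved as `epoch_<N>.pth` (as in the path quoted above) and that a file appears only once its epoch has finished.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed naming scheme, taken from the path quoted in the question: epoch_<N>.pth
_EPOCH_RE = re.compile(r"epoch_(\d+)\.pth")


def latest_checkpoint(ckpt_dir: str) -> Optional[Path]:
    """Return the epoch_<N>.pth file with the largest N, or None if none exist.

    Hypothetical helper for illustration; it is not part of the repository.
    """
    best_path = None
    best_epoch = -1
    for path in Path(ckpt_dir).glob("epoch_*.pth"):
        match = _EPOCH_RE.fullmatch(path.name)
        if match is None:
            continue  # skip files such as epoch_final.pth
        epoch = int(match.group(1))  # compare numerically so 10 > 9
        if epoch > best_epoch:
            best_epoch = epoch
            best_path = path
    return best_path


# Possible usage in the SFT script (requires torch and an existing model):
# ckpt = latest_checkpoint('./out/baike_pretrain')
# if ckpt is not None:
#     model.load_state_dict(torch.load(ckpt))
```

With only `epoch_0.pth` on disk, as was the case when the maintainer ran SFT, this would select the same file that the quoted line loads.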