[Question]: After training is interrupted by an error, how do I resume from an already-trained checkpoint instead of starting over?

PaddlePaddle / PaddleNLP

👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting wide-range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis etc.

https://paddlenlp.readthedocs.io

Apache License 2.0

[Question]: After training is interrupted by an error, how do I resume from an already-trained checkpoint instead of starting over? #7952

Closed Matter-Charles closed 4 months ago

Matter-Charles commented 9 months ago

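Please ask your question

Training was interrupted partway through by an error. When I reran finetune.py, the model started training again from the beginning, starting over at checkpoint-100. Is there a parameter that lets me continue training from a specific checkpoint saved during the earlier run?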

gongel commented 9 months ago

Just rerun it; training will resume automatically.

github-actions[bot] commented 7 months ago

This issue is stale because it has been open for 60 days with no activity.

w5688414 commented 6 months ago

You can do something like this and specify your own checkpoint.

https://github.com/PaddlePaddle/PaddleNLP/blob/ac117a108de2d777fe77d542c732dc5a83889b5d/applications/information_extraction/document/finetune.py#L141
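For readers who cannot open the link, here is a minimal sketch of the checkpoint-selection logic that a script like this typically performs before training. The helper names `find_last_checkpoint` and `choose_checkpoint` are hypothetical and written only for illustration, and the `checkpoint-<step>` directory layout is an assumption based on the `checkpoint-100` name mentioned in the question. The final call to the trainer is shown only as a comment, on the assumption that the linked script passes the chosen path through a `resume_from_checkpoint` argument; check the linked line for the exact form.

```python
import os
import re
from typing import Optional

# Assumed layout: <output_dir>/checkpoint-<global_step>/
_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def find_last_checkpoint(output_dir: str) -> Optional[str]:
    """Return the checkpoint directory with the highest step, or None."""
    if not os.path.isdir(output_dir):
        return None
    best_step, best_path = -1, None
    for name in os.listdir(output_dir):
        match = _CHECKPOINT_RE.match(name)
        path = os.path.join(output_dir, name)
        if match and os.path.isdir(path):
            # Compare steps numerically so checkpoint-1000 beats checkpoint-900.
            step = int(match.group(1))
            if step > best_step:
                best_step, best_path = step, path
    return best_path


def choose_checkpoint(explicit: Optional[str], output_dir: str) -> Optional[str]:
    """Prefer a user-specified checkpoint; otherwise fall back to the latest one."""
    if explicit is not None:
        return explicit
    return find_last_checkpoint(output_dir)


# Hypothetical usage inside a training script (not executed here):
#   checkpoint = choose_checkpoint(user_supplied_path, output_dir)
#   trainer.train(resume_from_checkpoint=checkpoint)  # assumed call, see the linked line
```

If `choose_checkpoint` returns `None`, no checkpoint was found or specified, and a script following this pattern would train from scratch.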

github-actions[bot] commented 4 months ago

This issue is stale because it has been open for 60 days with no activity.

github-actions[bot] commented 4 months ago

This issue was closed because it has been inactive for 14 days since being marked as stale.