allenai / allennlp

An open-source NLP research library, built on PyTorch.
http://www.allennlp.org
Apache License 2.0
11.71k stars 2.24k forks

Question about resuming training from a checkpoint using --recover #5722

Closed: HamLaertes closed 1 year ago

HamLaertes commented 1 year ago

For some reason, during training:

My current approach uses the --recover argument. AllenNLP stores a checkpoint after every epoch, so for every epoch after the first I add --recover to the training command, expecting the model's parameters and training state to be restored. However, this approach seems to be wrong: in my tests, training epoch 2 from the epoch 1 checkpoint gives different results from training epochs 1 and 2 together in a single run. I have read the AllenNLP documentation carefully but could not work out the problem. Does anyone have comments on my approach, or suggestions for other ways to meet my requirements? Thanks a lot!
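To make the mismatch concrete, here is a minimal sketch of one possible cause. It is an assumption for illustration, not a confirmed diagnosis of what AllenNLP does or does not restore. If a checkpoint holds the model weights but not the random number generator state, a restarted process re-seeds its generator and shuffles the data for epoch 2 in a different order than an uninterrupted run would. Everything in the sketch (`DATA`, `train_epoch`, `run_uninterrupted`, `run_with_restart`, and the checkpoint dictionary) is hypothetical and uses only the Python standard library.

```python
import random

# Toy (x, y) pairs drawn from y = 2x + 1; purely illustrative data.
DATA = [(float(x), 2.0 * x + 1.0) for x in range(1, 9)]


def train_epoch(w, b, rng, lr=0.01):
    """Run one epoch of SGD on y = w*x + b, visiting examples in shuffled order."""
    order = DATA[:]
    rng.shuffle(order)  # the visiting order depends on the RNG state
    for x, y in order:
        err = (w * x + b) - y
        w -= lr * err * x
        b -= lr * err
    return w, b


def run_uninterrupted(seed=0):
    """Train epochs 1 and 2 in a single process with one RNG."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(2):
        w, b = train_epoch(w, b, rng)
    return w, b


def run_with_restart(seed=0, restore_rng=False):
    """Train epoch 1, checkpoint, then train epoch 2 in a simulated new process."""
    rng = random.Random(seed)
    w, b = train_epoch(0.0, 0.0, rng)
    # Hypothetical checkpoint: weights plus the RNG state after epoch 1.
    checkpoint = {"w": w, "b": b, "rng_state": rng.getstate()}

    # Simulated restart: a fresh process seeds its RNG from scratch.
    rng = random.Random(seed)
    if restore_rng:
        rng.setstate(checkpoint["rng_state"])
    return train_epoch(checkpoint["w"], checkpoint["b"], rng)


baseline = run_uninterrupted()
naive = run_with_restart(restore_rng=False)  # weights restored, RNG re-seeded
exact = run_with_restart(restore_rng=True)   # weights and RNG state restored

print("uninterrupted:        ", baseline)
print("restart, RNG re-seeded:", naive)
print("restart, RNG restored: ", exact)
```

In this toy setup the restart that also restores the RNG state reproduces the uninterrupted run, while the restart that only restores the weights does not. In a real training run, other state can matter in the same way, for example optimizer state, learning rate scheduler state, and the data loader's shuffling order.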

github-actions[bot] commented 1 year ago

This issue is being closed due to lack of activity. If you think it still needs to be addressed, please comment on this thread 👇