hcdeng6 closed this issue 4 years ago
Hi @hcdeng6,
You should use the --resume flag and specify either --output-dir or --run-uuid to point to your partially trained model (https://unbabel.github.io/OpenKiwi/cli/train.html#training-save-load).
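As a rough illustration of the flags above, here is a minimal Python sketch that assembles a resume command. It assumes training is launched as `kiwi train --config <file>`. The config name `train_predictor.yaml` and the directory `runs/predictor` are hypothetical placeholders, and requiring exactly one of the two locator options is my reading of "either ... or", not something confirmed here. Please check the linked documentation for your OpenKiwi version before relying on it.

```python
import shlex
from typing import List, Optional


def build_resume_command(config: str,
                         output_dir: Optional[str] = None,
                         run_uuid: Optional[str] = None) -> List[str]:
    """Build an argv list that resumes a partially trained model.

    Assumption: training is started with `kiwi train --config <file>`.
    """
    # Assumption: exactly one of --output-dir / --run-uuid locates the run.
    if (output_dir is None) == (run_uuid is None):
        raise ValueError("pass exactly one of output_dir or run_uuid")

    cmd = ["kiwi", "train", "--config", config, "--resume"]
    if output_dir is not None:
        cmd += ["--output-dir", output_dir]
    else:
        cmd += ["--run-uuid", run_uuid]
    return cmd


if __name__ == "__main__":
    # Hypothetical config file and output directory, for illustration only.
    cmd = build_resume_command("train_predictor.yaml",
                               output_dir="runs/predictor")
    # Print the command for inspection; to actually launch it, pass `cmd`
    # to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Running the script only prints the assembled command, so you can inspect it before starting a multi-day job.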
Hey @hcdeng6, I'm going to assume this issue has been solved.
Feel free to re-open it if you still have problems.
Hi, I am training a predictor on a very large corpus, and I set it to run for 6 epochs in total. Because the corpus is so large, each epoch takes more than 24 hours. My machine does not seem able to handle such a heavy workload, and the program was interrupted twice during the 4th epoch. Restarting the kiwi program throws away the epochs that were already completed, so I wonder how I can get a checkpoint or continue predictor training from the point where the program was interrupted. Could you tell me what I should do? Thank you.