dmlc / gluon-nlp

NLP made easy
https://nlp.gluon.ai/
Apache License 2.0

Training continuation feature #1408

Open leezu opened 3 years ago

leezu commented 3 years ago

Description

We could save the training state in the output directory. When a training script is executed, it could check the specified output directory for prior training runs and resume them automatically if the hyperparameters are compatible. If they are incompatible, an error should be raised.
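A minimal sketch of the compatibility check described above, using only the Python standard library. The file name `training_args.json`, the function `prepare_output_dir`, and the rule that every hyperparameter must match exactly are assumptions made for illustration; they are not part of gluon-nlp or Sockeye, and a real implementation would probably exempt some arguments (for example the number of epochs) from the comparison.

```python
import json
from pathlib import Path

# Hypothetical file name used to record the hyperparameters of a run.
ARGS_FILE = "training_args.json"
_MISSING = object()


def prepare_output_dir(output_dir, hparams):
    """Decide whether to resume a prior run found in output_dir.

    Returns True if a prior run with identical hyperparameters exists,
    False if the directory holds no prior run (the hyperparameters are
    then recorded). Raises ValueError if a prior run exists but its
    hyperparameters differ from the current ones.
    """
    out = Path(output_dir)
    args_path = out / ARGS_FILE
    # Round-trip through JSON so the in-memory values compare equal to
    # what was stored (e.g. tuples become lists).
    current = json.loads(json.dumps(hparams))

    if args_path.exists():
        with args_path.open() as f:
            saved = json.load(f)
        mismatched = sorted(
            key
            for key in set(saved) | set(current)
            if saved.get(key, _MISSING) != current.get(key, _MISSING)
        )
        if mismatched:
            raise ValueError(
                "Cannot resume training in %s: incompatible hyperparameters %s"
                % (out, mismatched)
            )
        return True

    # No prior run: record the hyperparameters for future invocations.
    out.mkdir(parents=True, exist_ok=True)
    with args_path.open("w") as f:
        json.dump(current, f, indent=2, sort_keys=True)
    return False
```

A training script could call `prepare_output_dir(args.output_dir, vars(args))` right after argument parsing and, when it returns True, load the latest saved model and trainer state before entering the training loop. How that state is stored and loaded is left open here.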

References

Sockeye implementation:

https://github.com/awslabs/sockeye/blob/29795b828593ca68cfe923d611b67e079bc0dca9/sockeye/train.py#L138-L152

sxjscience commented 3 years ago

We could add the training continuation feature to our existing Translation, ELECTRA, and BERT examples.