Closed jbrry closed 3 years ago
Hi @jbrry ,
I think we manually extended the BERT pretraining code; a good reference is the ALBERT code:
Then you can introduce a new keep_checkpoint_max CLI argument, e.g.:
flags.DEFINE_integer("keep_checkpoint_max", 5,
"How many checkpoints to keep.")
I added that parameter to ELECTRA (see https://github.com/google-research/electra/pull/47), but for BERT you unfortunately need to implement it manually.
Oh, and if you want to keep all checkpoints, just use 0 for keep_checkpoint_max:
https://www.tensorflow.org/versions/r1.15/api_docs/python/tf/estimator/RunConfig
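To make the retention behavior concrete, here is a minimal stdlib-only sketch of the pruning policy that keep_checkpoint_max controls in tf.estimator.RunConfig: keep only the N most recent checkpoints, with 0 (or None) disabling pruning entirely. This is an illustrative simulation, not the Estimator's actual implementation; the function name is hypothetical.

```python
# Illustrative sketch (NOT the Estimator's real code): simulates the
# checkpoint-retention policy of tf.estimator.RunConfig's
# keep_checkpoint_max parameter.

def retained_checkpoints(steps, keep_checkpoint_max=5):
    """Return the checkpoint steps that would survive pruning.

    keep_checkpoint_max=N keeps only the N most recent checkpoints;
    0 or None keeps all of them (no pruning).
    """
    if not keep_checkpoint_max:  # 0 or None -> keep every checkpoint
        return list(steps)
    return list(steps)[-keep_checkpoint_max:]

# Checkpoints written every 50k steps, ten in total:
saved = [50_000 * i for i in range(1, 11)]

# Default (5): only the five most recent survive -> 300k..500k
print(retained_checkpoints(saved))

# keep_checkpoint_max=0: all ten are kept
print(retained_checkpoints(saved, 0))
```

This matches the symptom described below: with the default of 5, older BERT checkpoints are silently deleted even though save_checkpoints_steps is set, because the two settings control different things (how often to save vs. how many to keep).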
Brilliant, thanks a million for your help and for adding that to the ELECTRA model!
Hi there, thank you for all of the helpful advice on training Transformer models!
In your recent paper German’s Next Language Model, you compare ELECTRA and BERT at various checkpoints. I have tried to do the same thing on my own data. For ELECTRA it is working (saving every 50k steps to a GCS bucket), but for BERT the checkpoints are being overwritten. I have tried setting a value for save_checkpoints_steps, but this still seems to keep only the 5 most recent checkpoints. May I ask how you were able to keep the older checkpoints from being overwritten? I am using the official BERT repository from Google: https://github.com/google-research/bert
Thanks!