dbmdz / berts

DBMDZ BERT, DistilBERT, ELECTRA, GPT-2 and ConvBERT models
MIT License

Advice on evaluating BERT models every n steps #32

Closed jbrry closed 3 years ago

jbrry commented 3 years ago

Hi there, thank you for all of the helpful advice on training Transformer models!

In your recent paper German’s Next Language Model, you compare ELECTRA and BERT at various checkpoints. I have tried to do the same thing on my own data. For ELECTRA it is working (saving every 50k steps to a GCS bucket), but for BERT the checkpoints are being overwritten. I have tried setting a value for save_checkpoints_steps, but this still seems to keep only the 5 most recent checkpoints. May I ask how you were able to keep the older checkpoints from being overwritten?

I am using the official BERT repository from Google: https://github.com/google-research/bert

Thanks!

stefan-it commented 3 years ago

Hi @jbrry ,

I think we manually extended the BERT pretraining code; a good reference is the ALBERT code:

https://github.com/google-research/albert/blob/cbce824721148a8eb2621f1b1c1af252ac5121cc/run_pretraining.py#L499

Then you can introduce a new keep_checkpoint_max CLI argument, e.g.:

flags.DEFINE_integer("keep_checkpoint_max", 5,
                     "How many checkpoints to keep.")

I added that parameter to ELECTRA (see https://github.com/google-research/electra/pull/47), but for BERT you unfortunately need to implement it manually.
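A minimal sketch of what that patch could look like in BERT's run_pretraining.py, following the ALBERT code linked above. This is an assumption-laden fragment, not the actual DBMDZ patch: it assumes TF 1.x, and that the existing flags (output_dir, save_checkpoints_steps, iterations_per_loop) are already defined elsewhere in the script, as they are in google-research/bert:

```python
# Sketch only: patching run_pretraining.py in google-research/bert.
# Assumes TF 1.x and the flags already defined in that script.
import tensorflow as tf

flags = tf.flags
FLAGS = flags.FLAGS

flags.DEFINE_integer("keep_checkpoint_max", 5,
                     "How many checkpoints to keep (0 keeps all).")

def make_run_config(tpu_cluster_resolver=None):
  # BERT builds a tf.contrib.tpu.RunConfig, which inherits
  # keep_checkpoint_max from tf.estimator.RunConfig, so the new
  # flag just needs to be threaded through here.
  return tf.contrib.tpu.RunConfig(
      cluster=tpu_cluster_resolver,
      model_dir=FLAGS.output_dir,
      save_checkpoints_steps=FLAGS.save_checkpoints_steps,
      keep_checkpoint_max=FLAGS.keep_checkpoint_max,
      tpu_config=tf.contrib.tpu.TPUConfig(
          iterations_per_loop=FLAGS.iterations_per_loop))
```

You would then run pretraining with e.g. --keep_checkpoint_max=0 to stop the estimator from deleting older checkpoints.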

stefan-it commented 3 years ago

Oh, and if you want to keep all checkpoints, just pass 0 for keep_checkpoint_max; per the docs, 0 (or None) means all checkpoint files are kept:

https://www.tensorflow.org/versions/r1.15/api_docs/python/tf/estimator/RunConfig
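To illustrate the retention semantics (this is a plain-Python illustration of the behavior described in the RunConfig docs, not TensorFlow's actual pruning code):

```python
# Illustration only (not TensorFlow's implementation): keep_checkpoint_max
# keeps the N most recent checkpoints, or all of them when N is 0 or None.
def retained_checkpoints(checkpoints, keep_checkpoint_max=5):
    """Return the checkpoints that survive pruning, oldest first."""
    if not keep_checkpoint_max:  # 0 or None: keep everything
        return list(checkpoints)
    return list(checkpoints)[-keep_checkpoint_max:]

# 8 checkpoints saved every 50k steps, as in the question above.
steps = ["ckpt-%d" % s for s in range(50000, 400001, 50000)]
print(retained_checkpoints(steps, 5))  # only the 5 newest survive
print(retained_checkpoints(steps, 0))  # 0 keeps all 8
```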

jbrry commented 3 years ago

Brilliant, thanks a million for your help and for adding that to the ELECTRA model!