google-research / electra

ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

Issue with loading weights for eval #54

Closed: asharma20 closed this issue 4 years ago

asharma20 commented 4 years ago

I'm trying to verify that my experiments on a fine-tuned ELECTRA model are set up correctly, with the weights loaded only through tf.train.init_from_checkpoint(). When evaluating the fine-tuned model with run_finetuning.py, I get different accuracy results depending on whether the model_dir passed to run_config = tf.estimator.tpu.RunConfig(...) is the directory containing the fine-tuned model or a different directory.

With the original code I get the expected accuracy (~81), but when I point the model_dir argument of tf.estimator.tpu.RunConfig() at an empty directory, I get much lower, non-deterministic accuracy (~32). Why is that, given that the weights are still loaded via tf.train.init_from_checkpoint()? Are variables also being loaded through tf.estimator.tpu.RunConfig()?
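
For concreteness, here's a sketch of the two setups I'm comparing (the paths are placeholders, not my actual directories):

```python
import tensorflow.compat.v1 as tf

# Case 1: model_dir points at the fine-tuned model -> expected accuracy (~81).
run_config = tf.estimator.tpu.RunConfig(model_dir="/path/to/finetuned_model")

# Case 2: model_dir points at an empty directory -> low, non-deterministic
# accuracy (~32), even though tf.train.init_from_checkpoint() still runs.
run_config = tf.estimator.tpu.RunConfig(model_dir="/path/to/empty_dir")
```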

clarkkev commented 4 years ago

The weights are first loaded by tf.train.init_from_checkpoint(). However, if the model_dir in tf.estimator.tpu.RunConfig() contains a checkpoint, that checkpoint is loaded afterwards, overriding the first load. By default, the first load supplies the pre-trained weights, and the second load is for resuming fine-tuning if it was interrupted, or for running eval if a fine-tuned model already exists. So changing the model_dir argument in RunConfig will give bad eval results if you evaluate without training, because only the pre-trained weights get loaded, not the fine-tuned ones. Is that your situation?
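
A minimal sketch of the two load paths (not the actual run_finetuning.py code: it uses a plain Estimator rather than TPUEstimator for brevity, and the model, input function, and paths below are placeholder assumptions):

```python
import tensorflow.compat.v1 as tf

# Placeholder paths, for illustration only.
PRETRAINED_CKPT = "/path/to/pretrained/electra/model.ckpt"
MODEL_DIR = "/path/to/finetuned_model_dir"

def model_fn(features, labels, mode, params):
  logits = tf.layers.dense(features["x"], 2)  # stand-in for the real model

  # Load 1: rewire the variables' initializers to copy values from the
  # pre-trained checkpoint. This only takes effect when variables are
  # initialized from scratch, not when a checkpoint is restored.
  tf.train.init_from_checkpoint(PRETRAINED_CKPT, {"/": "/"})

  loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
  metrics = {"accuracy": tf.metrics.accuracy(
      labels=labels, predictions=tf.argmax(logits, axis=-1))}
  return tf.estimator.EstimatorSpec(
      mode=mode, loss=loss, eval_metric_ops=metrics)

def eval_input_fn():
  # Tiny synthetic eval batch, purely for illustration.
  features = {"x": tf.constant([[0.1, 0.2], [0.3, 0.4]])}
  labels = tf.constant([0, 1])
  return tf.data.Dataset.from_tensors((features, labels))

estimator = tf.estimator.Estimator(
    model_fn=model_fn,
    config=tf.estimator.RunConfig(model_dir=MODEL_DIR))

# Load 2: evaluate() restores the latest checkpoint found in model_dir,
# overwriting whatever Load 1 copied in. If MODEL_DIR holds no checkpoint,
# eval falls back to fresh initialization, so only the Load 1 (pre-trained)
# weights are used, which matches the ~32 accuracy above.
estimator.evaluate(input_fn=eval_input_fn)
```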

asharma20 commented 4 years ago

Yes I believe so. Thank you for clarifying.