google-research / bert

TensorFlow code and pre-trained models for BERT
https://arxiv.org/abs/1810.04805
Apache License 2.0

How to evaluate the MLM-accuracy of the pre-trained models? #1276

Open nikhildurgam95 opened 3 years ago

nikhildurgam95 commented 3 years ago

Hello all,

I am curious what the MLM accuracy of the pre-trained models provided by google-research is on my eval set, specifically for the bert-large-uncased model. However, when I try to run the run_pretraining.py script to evaluate the model, I encounter the following error:

tensorflow.python.framework.errors_impl.NotFoundError: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key global_step not found in checkpoint
         [[node save/RestoreV2 (defined at /home/.virtualenvs/ai/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
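For reference, this is roughly the eval-only invocation I am using, following the pre-training section of the README (the input file and directory paths are placeholders for my local setup):

```shell
# Eval-only run of run_pretraining.py against the released checkpoint.
# $BERT_LARGE_DIR points at the unpacked uncased_L-24_H-1024_A-16 download.
python run_pretraining.py \
  --input_file=/tmp/tf_examples.tfrecord \
  --output_dir=/tmp/pretraining_output \
  --do_train=False \
  --do_eval=True \
  --bert_config_file=$BERT_LARGE_DIR/bert_config.json \
  --init_checkpoint=$BERT_LARGE_DIR/bert_model.ckpt \
  --max_seq_length=128 \
  --max_predictions_per_seq=20 \
  --max_eval_steps=100
```

If the restore succeeded, I would expect this to write masked_lm_accuracy into eval_results.txt under the output dir, as the README's eval output shows.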

It seems that the downloaded google-research checkpoint does not contain a "global_step" key, so I'm unable to load the model to measure its MLM accuracy. Is there a way to get the original checkpoint with the training state (global step, optimizer variables) included? Has anyone managed to evaluate the MLM accuracy of the pre-trained models released by google-research? If so, please let me know how.
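One workaround I'm considering is to copy the released checkpoint and re-save it with a zero-valued global_step added, so the restore in run_pretraining.py finds the key it expects. A sketch using tf.compat.v1 (the paths would be my local ones; I'm assuming a plain variable named "global_step" is all the Saver is looking for):

```python
import tensorflow as tf

tf1 = tf.compat.v1
tf1.disable_eager_execution()  # Saver/get_variable need graph mode

def add_global_step(src_ckpt, dst_ckpt):
    """Copy a TF checkpoint, adding a zero-valued global_step variable."""
    graph = tf1.Graph()
    with graph.as_default():
        reader = tf1.train.load_checkpoint(src_ckpt)
        # Recreate every variable stored in the source checkpoint.
        for name in reader.get_variable_to_shape_map():
            tf1.get_variable(name, initializer=reader.get_tensor(name))
        # Add the global_step key that the restore complains about.
        tf1.get_variable("global_step",
                         initializer=tf1.constant(0, dtype=tf.int64),
                         trainable=False)
        with tf1.Session(graph=graph) as sess:
            sess.run(tf1.global_variables_initializer())
            tf1.train.Saver().save(sess, dst_ckpt)

# Usage (placeholder paths):
# add_global_step("/path/to/uncased_L-24_H-1024_A-16/bert_model.ckpt",
#                 "/path/to/patched/bert_model.ckpt")
```

Then point --init_checkpoint at the patched copy. I haven't verified yet whether the restored step value of 0 affects anything beyond the restore itself.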