google-research / electra

ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

'adam_m not found in checkpoint' when further pretraining #45

Closed DayuanJiang closed 4 years ago

DayuanJiang commented 4 years ago

When I tried to further pretrain the model on domain-specific data in Colab, I ran into a problem: the official pretrained model could not be loaded.

Here is the command for further pretraining.

hparam =    '{"model_size": "small", \
             "use_tpu":true, \
             "num_tpu_cores":8, \
             "tpu_name":"grpc://10.53.161.26:8470", \
             "num_train_steps":4000100,\
             "pretrain_tfrecords":"gs://tweet_torch/electra/electra/data/pretrain_tf_records/pretrain_data.tfrecord*", \
             "model_dir":"gs://tweet_torch/electra/electra/data/electra_small/", \
             "generator_hidden_size":1.0\
            }'
!python electra/run_pretraining.py  \
                    --data-dir "gs://tweet_torch/electra/electra/data/" \
                    --model-name "electra_small" \
                    --hparams '{hparam}'

The error message is quite long, so I am pasting only the part that seems useful.

ERROR:tensorflow:Error recorded from training_loop: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
From /job:worker/replica:0/task:0:
Key discriminator_predictions/dense/bias/adam_m not found in checkpoint
     [[node save/RestoreV2 (defined at /tensorflow-1.15.2/python3.6/tensorflow_core/python/framework/ops.py:1748) ]]
pnhuy commented 4 years ago

I also had the same problem.

It seems that the Adam optimizer variables (adam_m) were stripped from the checkpoint before it was released, presumably to reduce its size (https://github.com/google-research/bert/issues/99#issuecomment-442069063).

So if we don't have the full checkpoint, we can't do further training.

Just waiting for the full checkpoint.
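
For what it's worth, you can confirm the missing Adam slots yourself by listing the variables stored in the released checkpoint. A minimal sketch (the checkpoint path is hypothetical; point it at wherever you downloaded the release):

import tensorflow.compat.v1 as tf  # TF 1.15-style API

ckpt = "electra_small/model.ckpt"  # hypothetical path to the downloaded release
# list_variables returns (name, shape) pairs for every variable in the checkpoint.
names = [name for name, _ in tf.train.list_variables(ckpt)]
# If the Adam slots were stripped as suspected, this prints False.
print(any(name.endswith(("adam_m", "adam_v")) for name in names))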

clarkkev commented 4 years ago

You should be able to do further training; just don't initialize the Adam parameters from the checkpoint, by doing something like this. I don't think re-initializing the Adam parameters from scratch will cause any real problem for the model.
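
Roughly, the idea looks like this (an illustrative sketch against the TF 1.15 API, not the exact code behind the link): restrict the assignment map to variables that actually exist in the checkpoint, so the missing adam_m / adam_v slots are simply created fresh by the optimizer.

import re
import tensorflow.compat.v1 as tf  # TF 1.15-style API

def assignment_map_without_adam(init_checkpoint):
    # Names of all variables actually stored in the checkpoint file.
    ckpt_vars = {name for name, _ in tf.train.list_variables(init_checkpoint)}
    assignment_map = {}
    for var in tf.global_variables():
        name = re.sub(r":\d+$", "", var.name)  # strip the ":0" output suffix
        # The Adam slots (adam_m / adam_v) are absent from the released
        # checkpoint, so they fall through here and get fresh initialization.
        if name in ckpt_vars:
            assignment_map[name] = name
    return assignment_map

# Call inside the model function, before the Estimator builds its Saver:
# tf.train.init_from_checkpoint(init_checkpoint, assignment_map_without_adam(init_checkpoint))

One caveat: if the released checkpoint sits directly in model_dir, the Estimator will still attempt a full restore of every variable (which is exactly the error above), so point model_dir at a fresh directory and feed the released weights in through this init-from-checkpoint path instead.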

w5688414 commented 4 years ago

I am running into the same problem.

ghost commented 4 years ago

Hello, I looked at the solution @clarkkev mentioned above, but I still don't understand the exact fix. Can anyone provide further help? I am new to TensorFlow and could not find where to skip the adam_m parameters in the link above. Thank you in advance.

Veyronl commented 3 years ago

I am having the same trouble. Could you tell me how to fix it? @clarkkev @Lincoln-Jiang @w5688414 Thank you in advance.

DayuanJiang commented 3 years ago

@Veyronl I just gave up using Electra.