piyushghai opened this issue 3 years ago
I'm not able to replicate these results - I saw the loss drop to 7.5 within 2k steps. Can you try again with the latest version of the codebase? I've updated the Dockerfile to use TF 2.4 with the latest version of transformers.
@rondogency Can you share the env file on which we saw this issue?
@jarednielsen Did you try to reproduce this before or after the transformers library upgrade?
After the transformers upgrade to v4.2.0.
When training BERT with TF 2.3, the loss would decrease and `MLM_Acc` would be non-zero. After upgrading to TF 2.4 and using the same script, the loss does not decrease and `MLM_Acc` remains 0.0.

Note: The hyperparameters were unchanged between the TF 2.3 and TF 2.4 runs.
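Since the regression tracks the TF/transformers upgrade rather than a hyperparameter change, one quick sanity check is to confirm that each run really used the package versions you expect before comparing losses. A minimal, self-contained sketch of such a check (the version strings below are illustrative assumptions, not taken from the actual envs):

```python
def parse_version(v):
    """Parse a dotted version string like '2.4.0' into a comparable tuple."""
    return tuple(int(p) for p in v.split(".")[:3])

def check_env(observed, expected):
    """Return a list of human-readable mismatches between observed and expected versions."""
    return [
        f"{pkg}: expected {exp}, found {observed.get(pkg, 'missing')}"
        for pkg, exp in expected.items()
        if pkg not in observed or parse_version(observed[pkg]) != parse_version(exp)
    ]

# Hypothetical example: the env of the failing run vs. the env that worked.
observed = {"tensorflow": "2.4.0", "transformers": "4.2.0"}
expected = {"tensorflow": "2.3.0", "transformers": "4.2.0"}
for mismatch in check_env(observed, expected):
    print(mismatch)
```

In a real run you would fill `observed` from `tf.__version__` and `transformers.__version__` (or `importlib.metadata.version`) inside each container, so version drift between the two Docker images is ruled out before digging into the training script itself.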
Here are the logs of a 2-node run: