Closed bdhingra closed 6 years ago
I think that is because we are using exponential moving average (EMA) weights. At test time we use the EMA of the weights, but the dev score computed during the training cycle just uses the current weights, hence the difference. You can run the eval script with --no_ema and you should see the same scores.
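To illustrate why the two sets of weights can disagree early in training and agree later, here is a minimal toy sketch. It is not the docqa code: the single scalar weight, the quadratic loss, the learning rate, and the decay of 0.999 are all assumptions chosen for illustration, and the names ema_update and simulate_gap are hypothetical.

```python
def ema_update(shadow, current, decay):
    """Return the new EMA value after observing `current`."""
    return decay * shadow + (1.0 - decay) * current


def simulate_gap(num_steps, lr=0.01, decay=0.999, target=1.0):
    """Train one scalar weight with gradient descent on (w - target)^2
    and record |raw weight - EMA weight| after every update.

    All hyperparameters here are illustrative assumptions.
    """
    w = 0.0       # raw weight, analogous to what the in-training dev score uses
    shadow = w    # EMA weight, analogous to what a test-time evaluation uses
    gaps = []
    for _ in range(num_steps):
        grad = 2.0 * (w - target)              # gradient of (w - target)^2
        w -= lr * grad                         # plain gradient-descent step
        shadow = ema_update(shadow, w, decay)  # EMA lags behind the raw weight
        gaps.append(abs(w - shadow))
    return gaps


if __name__ == "__main__":
    gaps = simulate_gap(10000)
    for step in (500, 5000, 10000):
        print(f"update {step}: |raw - EMA| = {gaps[step - 1]:.6f}")
```

In this toy setting the raw weight converges quickly while the EMA still carries a lot of the initial value, so the gap is large after a few hundred updates and shrinks as training continues. Whether the real model behaves the same way depends on its actual decay and training schedule.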
Indeed that is the case. There is still a small difference between the two (for example, 0.680580080475 vs. 0.68047), but it is close enough. Thanks for your help!
Hi,
Thanks for open-sourcing your code!
I noticed a discrepancy between the EM and F1 scores logged during training and those computed when evaluating the model separately using docqa/eval/squad_eval.py. The difference is significant at the beginning of training, but becomes small by the end of training. It'll be super helpful if you could explain where the difference comes from, and more importantly, which are the "correct" scores.
A couple of disclaimers before I describe in more detail:
Unfortunately, my compute environment does not allow me to change the above to see if the problem persists with python3.5 and ujson. However, the code runs fine, and I believe the problem to be somewhere else. Please correct me if I am wrong though.
So I am running the paragraph setting on SQuAD as follows:
And I am evaluating the output checkpoints as follows:
The output scores I see on Tensorboard and from the evaluation script are as follows:
[Table of scores by training update; recoverable column headers: squad_eval.py Acc, squad_eval.py F1, squad_eval.py text-EM, squad_eval.py text-F1]
As you can see, the squad_eval.py performance is much lower than the Tensorboard performance initially, but catches up with it around update 5000. Later it even becomes slightly better.
I guess my main questions are:
I am interested in the initial performance because I am running some experiments with only 10% of the SQuAD training set. In this case there is a big difference between the performance logged during training and the performance reported by the evaluation script, similar to the top rows of the table above.
Thanks a lot for your time!
Bhuwan