Closed robertBrnnn closed 3 weeks ago
Hi @robertBrnnn, if I recall correctly there were two aspects: 1) normalization was incorrectly calculated, and 2) validation ran without label smoothing, hence it was not comparable with training (but again, this was a long time ago and I may be wrong). Also, we are now switching to a spin-off of OpenNMT-py here: https://github.com/eole-nlp/eole
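To illustrate why a validation loss computed *with* label smoothing is not comparable to one computed without it, here is a toy sketch (not OpenNMT-py code; the vocabulary, probabilities, and epsilon value are made up for illustration). Even a confident, correct prediction gets a higher smoothed loss, so the perplexity derived from it is inflated:

```python
import math

def nll(probs, target):
    # plain negative log-likelihood of the target class
    return -math.log(probs[target])

def smoothed_nll(probs, target, eps=0.1):
    # label-smoothed cross-entropy: mix the one-hot target with a
    # uniform distribution over the rest of the vocabulary
    v = len(probs)
    smooth = [eps / (v - 1)] * v
    smooth[target] = 1.0 - eps
    return -sum(q * math.log(p) for q, p in zip(smooth, probs))

# a confident, correct prediction over a toy 4-word vocabulary
probs = [0.85, 0.05, 0.05, 0.05]
ppl_plain = math.exp(nll(probs, 0))
ppl_smooth = math.exp(smoothed_nll(probs, 0, eps=0.1))
print(ppl_plain, ppl_smooth)  # smoothed PPL is noticeably higher
```

So if one version of the toolkit reports validation PPL from the smoothed loss and another from the raw NLL, the numbers will disagree even for identical models.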
Hi @vince62s, thanks for your reply.
Just to clarify the above: are you saying that normalization was incorrectly calculated and that validation ran without label smoothing in v3, or in v2? We've continued with v2 for the time being, as it currently gives us the best results.
Eole looks great, I like the direction you're taking the project, looking forward to trying out the first release!
> We've continued with v2 for the time being as we're getting best results with it currently.
You should not get worse results with v3; I have always made sure we get the same results. The only issue I see above is the `bucket_size`: it is too small for v3. It should be > 200K (I use 262144) to make sure examples are properly shuffled, but otherwise you should get similar results.
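For reference, the fix above amounts to a one-line change in the v3 training YAML (fragment only; all other options stay as in your existing config):

```yaml
# OpenNMT-py v3 training config (fragment)
# larger bucket gives the shuffler enough examples to mix properly
bucket_size: 262144
```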
Hi,
I've been migrating to v3 and have noticed extremely high training and validation perplexity with v3 compared to v2 when training with the same data/vocabs. I initially thought it might be a config difference between v2 and v3 that I missed, but after multiple attempts, no config change I've made has reduced the PPL. Accuracy scores are similar to those we get with v2.
For instance, here are logs for the same data trained on v2 and v3, at the same step: v3.5:
v2.3:
This is the model config:
In the case of the v2 config, `self_attn_type` is removed and `hidden_size` is changed to the v2 parameter `rnn_size`. Have I missed some obvious configuration parameter? Or could there be something else that explains the difference between versions?

Thanks
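The v3-to-v2 renames described above can be sketched as a small helper (illustrative only; the `v3_to_v2` function and the sample values such as `"scaled-dot"` and `512` are assumptions, not taken from the actual config, and other renamed options may exist between the two versions):

```python
# Sketch: map a v3-style model config dict to v2 naming, per the
# renames above (hidden_size -> rnn_size, self_attn_type dropped).
def v3_to_v2(cfg):
    out = dict(cfg)
    if "hidden_size" in out:
        # v2 calls this parameter rnn_size
        out["rnn_size"] = out.pop("hidden_size")
    # v2 config in this setup has no self_attn_type entry
    out.pop("self_attn_type", None)
    return out

# hypothetical v3 config fragment
v3_cfg = {"hidden_size": 512, "self_attn_type": "scaled-dot", "layers": 6}
v2_cfg = v3_to_v2(v3_cfg)
print(v2_cfg)
```

Diffing the two configs this way makes it easier to rule out a naming mismatch as the source of the PPL gap.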