Open · munael opened 4 years ago
Can you try using fairseq master instead of 0.9.0?
Anyway, this is most likely model/dataset dependent, but I will try this a bit later and see what’s going on.
To confirm, what kind of task/model is this?
I'll check.
It does seem to depend on the model/dataset. But is the batch-skipping behavior actually intentional? Why? Why not (for example) retry the same batch with an updated loss scale until the gradients come out finite?
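Something like the following is what I have in mind — a minimal sketch in plain PyTorch (the helper name and scale constants are made up for illustration; this is not fairseq's actual FP16Optimizer):

```python
import torch

def fp16_step_with_retry(model, optimizer, batch, compute_loss,
                         scale=2.0 ** 15, min_scale=2.0 ** -5, backoff=0.5):
    """Sketch of the proposed behavior: on fp16 gradient overflow, shrink the
    loss scale and retry the *same* batch instead of skipping it.
    Hypothetical helper, not fairseq's implementation."""
    while scale >= min_scale:
        optimizer.zero_grad()
        loss = compute_loss(model, batch)
        (loss * scale).backward()              # scaled backward pass

        # Overflow check: any inf/NaN gradient at the current scale.
        overflow = any(
            p.grad is not None and not torch.isfinite(p.grad).all()
            for p in model.parameters()
        )
        if overflow:
            scale *= backoff                   # lower the scale and retry this batch
            continue

        # Unscale gradients before the optimizer step.
        for p in model.parameters():
            if p.grad is not None:
                p.grad.div_(scale)
        optimizer.step()
        return scale                           # the scale that worked

    raise FloatingPointError("loss scale underflowed below min_scale")
```

(A real scaler would also grow the scale back up after a run of successful steps; the point here is only the retry-instead-of-skip part.)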
Translation, with the pre-trained transformer_vaswani_wmt_en_de_big model (transformer.wmt14.en-fr) from here: https://github.com/pytorch/fairseq/tree/master/examples/translation#pre-trained-models
@myleott This still occurs with the latest master :/
🐛 Bug
To Reproduce
Steps to reproduce the behavior (always include the command you ran):
I don't have a small enough MWE... The relevant flags are:
--fp16
--fp16-scale-tolerance 0
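For context, the training command was roughly of this shape (a hypothetical reconstruction — the data path and hyper-parameters below are placeholders; only the two flags above are the ones that matter here):

```
# Placeholder data path and hyper-parameters; only the last two flags are relevant.
fairseq-train data-bin/wmt14_en_fr \
    --arch transformer_vaswani_wmt_en_de_big \
    --optimizer adam --lr 0.0005 --max-tokens 3584 \
    --fp16 \
    --fp16-scale-tolerance 0
```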
Code sample
:(
Expected behavior
Environment
How you installed fairseq (pip, source): git tag 0.9.0, with debugging changes added
Additional context