Open echatzikyriakidis opened 3 years ago
Thanks for the report! I just double checked with the latest code from master
and can confirm that there seems to be a reproducibility issue for the GPU when training the NER model.
We'll look into this!
Thank you @svlandeg !
We can continue our experimentation phase even without determinism, since the losses from various runs with different random seeds are more or less the same. No big fluctuations.
However, it would be great if a new release with the fix could come out soon.
Please note that the same thing happens when using a pre-trained model, e.g. en_core_web_lg.
Hi @svlandeg !
Do we have any update on this?
I managed to track down the source of this problem. In the backprop in HashEmbed we use `cupyx.scatter_add`, which is non-deterministic. So this affects anything that uses a tok2vec layer.
Unfortunately there is no simple substitute for this without consequences. We could unroll the addition to control the order of operations, but that would be too slow. This is also a known issue in PyTorch (which doesn't use cupy, but has a similar implementation), but because the actual change in values is small it's not generally considered a problem (see https://github.com/pytorch/pytorch/issues/50469).
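To illustrate why a parallel scatter-add is non-deterministic at all: floating-point addition is not associative, so when the GPU accumulates colliding updates in whatever order its threads happen to run, the rounded result can differ between runs. A minimal standalone demonstration:

```python
# Floating-point addition is not associative: the order in which the same
# three values are summed changes the rounded result. This is exactly why
# a scatter-add whose accumulation order varies between runs is
# non-deterministic, even though all runs add the same values.
vals = [1e16, 1.0, -1e16]

left_to_right = (vals[0] + vals[1]) + vals[2]  # 1e16 + 1.0 rounds to 1e16, so the 1.0 is lost
reordered = (vals[0] + vals[2]) + vals[1]      # the large terms cancel first, so the 1.0 survives

print(left_to_right)  # 0.0
print(reordered)      # 1.0
```

The per-element error is tiny relative to the magnitudes involved, which is why the PyTorch issue linked above treats it as acceptable for training.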
That said, we think we can design a deterministic equivalent with a more acceptable speed penalty and will be taking a look at it. In the meantime this is something to be aware of, and this will be the main issue for it, so subscribe here if you'd like updates.
How to reproduce the behaviour
I cannot reproduce the same results when training a NER model using the GPU in Google Colab. When running the same code on CPU, results are reproducible. However, when enabling the GPU with prefer_gpu(), runs are no longer reproducible.
```
Example code
```
Your Environment