mrpeerat opened this issue 3 months ago
Hello!
This took quite a while to debug, but it seems that the combination of fp16 and softmax is a bit unstable, resulting in `nan` after a Dropout, which then results in a `nan` loss. A `nan` loss prevents any further training, which is also why the dev score never changes. See also https://discuss.pytorch.org/t/getting-nans-from-dropout-layer/70693/5
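For intuition, here is a minimal standalone illustration (my own sketch, not from the original report) of how fp16 overflow feeds `nan` into a softmax: anything above fp16's maximum of ~65504 becomes `inf`, and the softmax normalization then computes `inf - inf = nan`.

```python
import torch

# fp16 overflows above ~65504, turning large activations into inf.
x = torch.tensor([70000.0, 1.0], dtype=torch.float16)
print(x)  # tensor([inf, 1.], dtype=torch.float16)

# Softmax subtracts the max for numerical stability, so inf - inf = nan
# and the nan propagates through the whole output.
# (On older PyTorch builds, fp16 ops on CPU may require moving x to a GPU.)
print(torch.softmax(x, dim=0))  # tensor([nan, nan], dtype=torch.float16)
```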
I was able to resolve this issue by training without fp16, i.e. by setting `use_amp=False`. Hope this helps!
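For reference, a minimal sketch of where `use_amp=False` goes in a sentence-transformers v2.x SimCSE-style setup; the sentences and hyperparameters are placeholders, not the poster's actual script.

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("sentence-transformers/gtr-t5-base")

# SimCSE: each sentence is paired with itself; the two forward passes
# differ only by dropout, which acts as the positive-pair augmentation.
train_sentences = ["A first placeholder sentence.", "A second placeholder sentence."]
train_examples = [InputExample(texts=[s, s]) for s in train_sentences]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)

train_loss = losses.MultipleNegativesRankingLoss(model)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    use_amp=False,  # disable fp16/AMP to avoid the nan loss described above
)
```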
Hi! I'm using sentence-transformers/gtr-t5-base as the base encoder with SimCSE on Sentence Transformers (this example). However, when I look at the dev score on the STS-B dev set, it never changes over the course of training. Here is an example:

![Screenshot 2024-03-23 at 1 25 39 PM](https://github.com/UKPLab/sentence-transformers/assets/21156980/3988b562-61b1-4937-ad0b-bcde1756e3f0)
Here is the code that I use:
Thank you in advance.
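For anyone reproducing this, a sketch (an illustration, not the poster's missing code above) of how an STS-B dev evaluator is typically attached in sentence-transformers v2.x, which is how a dev score like the one in the screenshot gets logged during `fit`:

```python
from datasets import load_dataset
from sentence_transformers import InputExample
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

# Assumption: STS-B dev split from the GLUE benchmark on the Hugging Face Hub;
# gold scores are 0-5, and the evaluator expects labels in [0, 1].
stsb_dev = load_dataset("glue", "stsb", split="validation")
dev_examples = [
    InputExample(texts=[row["sentence1"], row["sentence2"]], label=row["label"] / 5.0)
    for row in stsb_dev
]
dev_evaluator = EmbeddingSimilarityEvaluator.from_input_examples(dev_examples, name="sts-dev")

# Passed as model.fit(..., evaluator=dev_evaluator, evaluation_steps=1000).
# With a nan loss the weights never update, so this score stays flat.
```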