UKPLab / sentence-transformers

State-of-the-Art Text Embeddings
https://www.sbert.net
Apache License 2.0

fp16 training errors for mt5 #2703

Open saurabhkumar opened 3 months ago

saurabhkumar commented 3 months ago

Training with google/mt5-base as the base model, fp16, and the triplet loss on the all-nli data (following the trainer example) fails: the loss is zero and grad_norm is nan, e.g. `{'loss': 0.0, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 0.0}`. There were similar problems with these models long ago, as mentioned here. I want to try this on a GPU without bf16 support. @tomaarsen: it works for google-t5/t5-base, so the fix has evidently been applied to that model. Would it be possible to ask someone at Hugging Face to apply the same fix to mt5 (or possibly get information from someone on the aya-101 team about whether they tried it)?
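For reference, a minimal sketch of the failing setup, assuming the v3 `SentenceTransformerTrainer` API from the all-nli example; the dataset slice, `output_dir`, and hyperparameters below are illustrative placeholders, not the exact values used:

```python
# Sketch of a repro setup (assumed, based on the all-nli trainer example).
from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
)

# Wrapping a plain Hugging Face checkpoint adds a pooling layer automatically.
model = SentenceTransformer("google/mt5-base")

# (anchor, positive, negative) triplets, as in the documentation example.
train_dataset = load_dataset(
    "sentence-transformers/all-nli", "triplet", split="train[:10000]"
)

loss = losses.TripletLoss(model)

args = SentenceTransformerTrainingArguments(
    output_dir="mt5-base-all-nli",  # placeholder
    fp16=True,  # mixed precision that triggers loss=0.0 / grad_norm=nan here
    num_train_epochs=1,
    per_device_train_batch_size=16,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()
```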

tomaarsen commented 3 months ago

Hello!

I'm glad to hear that it does work for google-t5/t5-base, so I agree with you that we're probably dealing with the issue from https://discuss.huggingface.co/t/t5-fp16-issue-is-fixed/3139.

I found out that the original fix was this PR: https://github.com/huggingface/transformers/pull/9487
These changes have also been propagated to mt5: https://github.com/huggingface/transformers/blob/main/src/transformers/models/mt5/modeling_mt5.py#L576

But there are indeed some folks who still report issues. I think it's an issue that would need to be fixed in transformers. Alternatively, you can use bf16=True if your GPU supports it, or train with full precision instead.
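A small sketch of that workaround, choosing bf16 when the GPU supports it and otherwise falling back to full fp32; the argument names come from the standard training arguments, and `output_dir` is again a placeholder:

```python
# Prefer bf16 when supported, otherwise train in full precision,
# avoiding the fp16 overflow that produces loss=0.0 / grad_norm=nan.
import torch
from sentence_transformers import SentenceTransformerTrainingArguments

use_bf16 = torch.cuda.is_available() and torch.cuda.is_bf16_supported()

args = SentenceTransformerTrainingArguments(
    output_dir="mt5-base-all-nli",  # placeholder
    bf16=use_bf16,  # numerically safer than fp16 for (m)T5
    fp16=False,     # full fp32 if bf16 is unavailable
)
```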