UKPLab / sentence-transformers

State-of-the-Art Text Embeddings
https://www.sbert.net
Apache License 2.0

albert-xlarge-v2 cosine similarity nan #897

Closed tempbrucefu closed 3 years ago

tempbrucefu commented 3 years ago

I tried to train albert-xlarge-v2 with train_batch_size 16 on STSbenchmark using training_stsbenchmark.py. During the dev evaluation, the cosine similarity is NaN; the model outputs look close to constant. Note that stsb-distilbert-base trains fine, although its test results are always a bit lower than the published results.

nreimers commented 3 years ago

This is a common issue with larger models: For some runs, the model will diverge.

Simple solution: Just restart training until you get a run that converges.
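The restart strategy can be sketched as a small retry loop: rerun training with a fresh seed until the dev score is a real number. `train_once` below is a hypothetical stand-in for a full training run (its seed-dependent behavior is simulated), not the sentence-transformers API.

```python
import math
import random

def train_once(seed):
    # Hypothetical stand-in for one full training run; assume it returns
    # the dev-set correlation, which is NaN when the run diverged.
    random.seed(seed)
    return float("nan") if random.random() < 0.3 else random.uniform(0.7, 0.9)

def train_until_converged(max_attempts=5):
    # Restart with a different seed until the dev score is not NaN.
    for attempt in range(max_attempts):
        score = train_once(seed=attempt)
        if not math.isnan(score):
            return score, attempt
    raise RuntimeError("model diverged on every attempt")

score, attempt = train_until_converged()
```

In practice each restart would also reinitialize the model and data shuffling, since divergence of large models is sensitive to the random initialization.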

More complex solution: Have a look at this paper: https://arxiv.org/abs/2004.08249

They discuss this issue with larger transformer models and propose methods to reduce the probability that it happens.