McGill-NLP / llm2vec

Code for 'LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders'
https://mcgill-nlp.github.io/llm2vec/
MIT License

Learning implications for loss_scale #110

Closed · daegonYu closed this 2 months ago

daegonYu commented 3 months ago

Hello!

The loss scale is set to 20 for SimCSE sentence-similarity training and to 50 (the default) for supervised contrastive training. Are there any benefits from this change in loss scale?

vaibhavad commented 2 months ago

Hi @daegonYu,

We did not run ablations on the choice of loss scale. We followed the existing literature on this: the Echo embeddings hyperparameters for sentence similarity, and SimCSE for unsupervised contrastive learning.
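
For anyone landing here: below is a minimal sketch of how a loss scale typically enters an InfoNCE-style contrastive loss with in-batch negatives. The function and variable names are illustrative, not llm2vec's actual code. The scale acts as an inverse temperature; SimCSE's temperature of 0.05 corresponds to a scale of 1/0.05 = 20, which matches the value discussed above.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(query_emb: torch.Tensor,
                  pos_emb: torch.Tensor,
                  scale: float = 50.0) -> torch.Tensor:
    """InfoNCE-style contrastive loss with in-batch negatives.

    `scale` multiplies the cosine similarities before the softmax,
    i.e. it is the inverse of a temperature: a larger scale sharpens
    the softmax distribution over positives vs. negatives.
    """
    # Cosine similarity = dot product of L2-normalized embeddings.
    q = F.normalize(query_emb, dim=-1)
    p = F.normalize(pos_emb, dim=-1)
    logits = scale * (q @ p.T)  # (batch, batch) similarity matrix
    # The diagonal holds each query's positive; off-diagonal entries
    # in the same row act as in-batch negatives.
    labels = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, labels)

# Example: a batch of 8 pairs with 768-dim embeddings.
q = torch.randn(8, 768)
p = torch.randn(8, 768)
print(info_nce_loss(q, p, scale=20.0))  # SimCSE-style scale (temperature 0.05)
print(info_nce_loss(q, p, scale=50.0))  # default for supervised training
```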

daegonYu commented 2 months ago

Oh, I misunderstood. Thank you for clarifying!