I trained a model with this loss here:
For comparison, a solid baseline is:
The difference is that the former was trained with Matryoshka (aka MRL) and CachedMultipleNegativesRankingLoss (CMNRL) with a batch size of 2048 and a learning rate of 8e-5, whereas the latter used MultipleNegativesRankingLoss with a batch size of 64 and a learning rate of 2e-5.
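For readers following along, here is a rough sketch of how such a combination could be set up with sentence-transformers once this PR (or a release including it) is in place. The base model name, Matryoshka dimensions, mini-batch size, and output directory below are placeholders I chose for illustration; only the batch size of 2048 and learning rate of 8e-5 come from this thread.

```python
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainingArguments
from sentence_transformers.losses import CachedMultipleNegativesRankingLoss, MatryoshkaLoss

# Placeholder base model; the actual checkpoint used in this experiment is not stated here.
model = SentenceTransformer("microsoft/mpnet-base")

# CMNRL caches gradients per mini-batch, which is what makes the large 2048 batch size feasible;
# wrapping it in MatryoshkaLoss additionally trains the embeddings at truncated dimensions.
inner_loss = CachedMultipleNegativesRankingLoss(model, mini_batch_size=64)
loss = MatryoshkaLoss(model, inner_loss, matryoshka_dims=[768, 512, 256, 128, 64])

# Hyperparameters mentioned above for the new model.
args = SentenceTransformerTrainingArguments(
    output_dir="models/mrl-cmnrl-poc",
    per_device_train_batch_size=2048,
    learning_rate=8e-5,
)
```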
The overall general-purpose performance of the new model is slightly better: 0.5163 NDCG@10 compared to 0.5043 NDCG@10 for the baseline. Good news!
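The benchmark behind these numbers isn't specified in this thread. Purely as an illustration of how an aggregate NDCG@10 like this could be obtained, something along the lines of the NanoBEIR evaluator in recent sentence-transformers releases would work; the model path below is a placeholder.

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import NanoBEIREvaluator

# Placeholder path; load whichever model you want to evaluate.
model = SentenceTransformer("path/to/trained-model")

# Reports NDCG@10 (among other retrieval metrics) averaged over the NanoBEIR datasets.
evaluator = NanoBEIREvaluator()
results = evaluator(model)
print(results)
```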
I like that this doesn't require notable changes in the Cached... losses, and well done on finding those duplication fixes.
I cleaned up the Matryoshka loss a bit more. I also see potential to further reduce code duplication in the cached loss classes, which could be done in a separate PR. Are there any other points I should address in this PR?
I think this PR is looking quite solid. All that remains, I think, is to run:
pip install pre-commit
pre-commit install  # Set up a pre-commit hook that runs formatting/linting before every commit (in this repository only)
pre-commit run --all-files  # Run the aforementioned hook separately from doing a commit
That should satisfy the code quality check.
Thank you! The formatting should be fine now.
I added some docstrings/comments, as the various decorators start to get a bit confusing otherwise. I'm a fan of this approach, well done. As a result, I will close #3065, although I do appreciate the time you invested in experimenting; this was not a straightforward problem to tackle.
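For context, the gist of the decorator-based approach is that the function producing embeddings is wrapped so its output is truncated to each Matryoshka dimension (and re-normalized) before the wrapped loss sees it. The snippet below is my own simplified sketch of that idea under those assumptions, not the code from this PR; the function name is made up.

```python
import torch.nn.functional as F


def truncate_embeddings(embed_fn, dim: int):
    """Sketch: wrap an embedding function so its output is truncated to `dim`
    dimensions and re-normalized, which is the core Matryoshka trick."""
    def wrapper(*args, **kwargs):
        embeddings = embed_fn(*args, **kwargs)  # full-dimensional embeddings
        return F.normalize(embeddings[..., :dim], p=2, dim=-1)
    return wrapper
```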
I'll merge this once the tests go green.
Proof of concept for supporting cached losses in combination with the Matryoshka loss