I trained a model with this loss here:
For comparison, a solid baseline is:
The difference is that the former was trained with Matryoshka (aka MRL) and CachedMultipleNegativesRankingLoss (CMNRL) with a batch size of 2048 and a learning rate of 8e-5, whereas the latter used MultipleNegativesRankingLoss with a batch size of 64 and a learning rate of 2e-5.
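For readers following along, here is a rough sketch of how such a combination could be set up with sentence-transformers once this PR (or a release including it) is in place. The base model name, Matryoshka dimensions, mini-batch size, and output directory below are placeholders I chose for illustration; only the batch size of 2048 and learning rate of 8e-5 come from this thread.

```python
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainingArguments
from sentence_transformers.losses import CachedMultipleNegativesRankingLoss, MatryoshkaLoss

# Placeholder base model; the actual checkpoint used in this experiment is not stated here.
model = SentenceTransformer("microsoft/mpnet-base")

# CMNRL caches gradients per mini-batch, which is what makes the large 2048 batch size feasible;
# wrapping it in MatryoshkaLoss additionally trains the embeddings at truncated dimensions.
inner_loss = CachedMultipleNegativesRankingLoss(model, mini_batch_size=64)
loss = MatryoshkaLoss(model, inner_loss, matryoshka_dims=[768, 512, 256, 128, 64])

# Hyperparameters mentioned above for the new model.
args = SentenceTransformerTrainingArguments(
    output_dir="models/mrl-cmnrl-poc",
    per_device_train_batch_size=2048,
    learning_rate=8e-5,
)
```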
The overall general-purpose performance of the new model is slightly better: 0.5163 NDCG@10 compared to 0.5043 NDCG@10 for the baseline. Good news!
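The benchmark behind these numbers isn't specified in this thread. Purely as an illustration of how an aggregate NDCG@10 like this could be obtained, something along the lines of the NanoBEIR evaluator in recent sentence-transformers releases would work; the model path below is a placeholder.

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import NanoBEIREvaluator

# Placeholder path; load whichever model you want to evaluate.
model = SentenceTransformer("path/to/trained-model")

# Reports NDCG@10 (among other retrieval metrics) averaged over the NanoBEIR datasets.
evaluator = NanoBEIREvaluator()
results = evaluator(model)
print(results)
```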
I like that this doesn't require notable changes in the Cached... losses, and well done on finding those duplication fixes.
I cleaned up the Matryoshka loss a bit more. I also see potential to further reduce code duplication in the cached loss classes, which could be done in a separate PR. Are there any other points I should address in this PR?
I think this PR is looking quite solid. All that remains, I think, is to run:
pip install pre-commit
pre-commit install  # Set up a pre-commit hook that runs formatting/linting before every commit (in this repository only)
pre-commit run --all-files  # Run the aforementioned hook separately from doing a commit
That should satisfy the code quality check.
Thank you! The formatting should be fine now.
I added some docstrings/comments, as the various decorators start to get a bit confusing otherwise. I'm a fan of this approach, well done. As a result, I will close #3065, although I do appreciate the time you invested in experimenting; this was not a straightforward problem to tackle.
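For context, the gist of the decorator-based approach is that the function producing embeddings is wrapped so its output is truncated to each Matryoshka dimension (and re-normalized) before the wrapped loss sees it. The snippet below is my own simplified sketch of that idea under those assumptions, not the code from this PR; the function name is made up.

```python
import torch.nn.functional as F


def truncate_embeddings(embed_fn, dim: int):
    """Sketch: wrap an embedding function so its output is truncated to `dim`
    dimensions and re-normalized, which is the core Matryoshka trick."""
    def wrapper(*args, **kwargs):
        embeddings = embed_fn(*args, **kwargs)  # full-dimensional embeddings
        return F.normalize(embeddings[..., :dim], p=2, dim=-1)
    return wrapper
```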
I'll merge this once the tests go green.
Proof of concept for supporting cached losses in combination with the Matryoshka loss