UKPLab / sentence-transformers

State-of-the-Art Text Embeddings
https://www.sbert.net
Apache License 2.0

Using JAX models for asymmetric search & Answer Retrieval? #1764

Status: Open. regstuff opened this issue 1 year ago.

regstuff commented 1 year ago

Hi @nreimers, I've been using the msmarco-distilbert-dot-v5 model for asymmetric search and QnA retrieval. I recently came across the models that were created during the Community JAX/Flax week. I was wondering whether these JAX models (e.g. multi_qa_distilbert) give better results for asymmetric search and answer retrieval than msmarco-distilbert, or whether the models listed on sbert's MS MARCO page are still the best for this use case. My dataset is mostly non-technical and oriented more towards philosophy, religion, etc.

nreimers commented 1 year ago

Yes, multi-qa models perform better than MS MARCO models.

regstuff commented 1 year ago


Thanks for the info. I was wondering whether there is a table comparing the performance of these models, similar to the one that exists for the MS MARCO models.

There are multiple JAX models, and I'm having a hard time figuring out which one I should use.

A related question: is it alright to use the dot-product models if my similarity metric is Euclidean distance? I'm using scikit-learn's BallTree to find the k nearest neighbors, and it does not offer dot product as a metric, so I am using Euclidean distance.
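A minimal sketch of why this matters, not taken from this thread: since ||q - d||^2 = ||q||^2 + ||d||^2 - 2 q·d, ranking by Euclidean distance matches ranking by dot product only when every document vector has the same norm, which dot-product models do not guarantee. One known workaround is the standard reduction from maximum inner product search to nearest-neighbor search: append one coordinate to each document so all norms become equal, and pad queries with a zero. The code assumes numpy and scikit-learn; random vectors stand in for real model embeddings, and the helper names `augment_for_mips` and `augment_query` are hypothetical.

```python
import numpy as np
from sklearn.neighbors import BallTree


def augment_for_mips(doc_emb):
    """Append one coordinate so every document vector has the same norm.

    With d' = [d, sqrt(M - ||d||^2)] and M = max ||d||^2, every ||d'||^2 = M,
    so the squared Euclidean distance to q' = [q, 0] is ||q||^2 + M - 2 q.d,
    which decreases exactly as the dot product q.d increases.
    """
    norms_sq = np.einsum("ij,ij->i", doc_emb, doc_emb)
    extra = np.sqrt(np.maximum(norms_sq.max() - norms_sq, 0.0))
    return np.hstack([doc_emb, extra[:, None]])


def augment_query(query_emb):
    """Pad queries with a zero so they match the augmented dimensionality."""
    return np.hstack([query_emb, np.zeros((query_emb.shape[0], 1))])


# Random stand-ins for model embeddings; norms deliberately vary,
# as they can for a dot-product model.
rng = np.random.default_rng(0)
docs = rng.normal(size=(200, 16)) * rng.uniform(0.5, 2.0, size=(200, 1))
queries = rng.normal(size=(5, 16))
k = 5

# Reference result: top-k documents by raw dot product (highest first).
dot_top = np.argsort(-(queries @ docs.T), axis=1)[:, :k]

# Euclidean kNN with a BallTree on the augmented vectors.
tree = BallTree(augment_for_mips(docs), metric="euclidean")
_, bt_ind = tree.query(augment_query(queries), k=k)

# Plain Euclidean kNN on the raw vectors, for comparison.
_, raw_ind = BallTree(docs, metric="euclidean").query(queries, k=k)

print("augmented BallTree matches dot-product top-k:", np.array_equal(bt_ind, dot_top))
print("raw Euclidean matches dot-product top-k:", np.array_equal(raw_ind, dot_top))
```

If adding a dimension is inconvenient, another option is a model trained for cosine similarity (such as the multi-qa `*-cos-v1` variants) with normalized embeddings, e.g. via `normalize_embeddings=True` in `SentenceTransformer.encode`. For unit vectors ||q - d||^2 = 2 - 2 q·d, so Euclidean order equals cosine order. Normalizing the output of a dot-product model also works mechanically, but it changes the scoring to cosine, which may not match how that model was trained.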

Thank you