UKPLab / sentence-transformers

State-of-the-Art Text Embeddings
https://www.sbert.net
Apache License 2.0

Embeddings Only? #2415

Open davies-w opened 10 months ago

davies-w commented 10 months ago

Hi,

I started using SBert for embeddings to use for similarity dot products. However, our system is very resource constrained, and I'm wondering if the embeddings part of the model can be easily extracted and instantiated separately from the DNN layers? With an emphasis on easily, as I'm not a PyTorch expert. I know from past experience that in theory one might be able to do such a thing, but I don't have a lot of time to research this.

Appreciate any help!

W

tomaarsen commented 10 months ago

Hello!

I'm afraid that the computations done by the embedding models that can be loaded using Sentence Transformers are all required for producing the embeddings. If you are struggling with latency, then you can consider:

  1. Use a smaller model, e.g. https://huggingface.co/BAAI/bge-small-en-v1.5 is a well-respected one.
  2. Look into ONNX. It should allow for some additional speedups, but it can be tricky to get working.
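For reference, a minimal sketch of trying a smaller model and scoring pairs with a dot product. The helper names (`dot_similarity`, `compare`) are just illustrative; with `normalize_embeddings=True` the embeddings are unit-length, so the dot product equals cosine similarity:

```python
import numpy as np


def dot_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Dot product; equals cosine similarity when both vectors are unit-normalized.
    return float(np.dot(a, b))


def compare(sent_a: str, sent_b: str, model_name: str = "BAAI/bge-small-en-v1.5") -> float:
    # Deferred import so the numpy helper above works without the package installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    # normalize_embeddings=True returns unit vectors, so dot == cosine.
    emb_a, emb_b = model.encode([sent_a, sent_b], normalize_embeddings=True)
    return dot_similarity(emb_a, emb_b)
```

For example, `compare("I love my dog", "you like the cat")` returns a score in [-1, 1] that you can compare across model choices.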
davies-w commented 10 months ago

Thanks Tom! I tried the 40 MB and 60 MB models mentioned in the SBERT docs, and they seemed to produce much poorer similarities for sentence pairs like ("I love my dog", "you like the cat"). I'll take a look at these two though for sure!

davies-w commented 10 months ago

BTW, it's not computational latency, it's the loading speed of the model, whether from file or from the internet. We're trying to use it in a Vercel instance, and it's timing out while loading the files, which is why I was hoping it was possible to just load the embedding part of the data. But it sounds like you need to load everything anyway?

tomaarsen commented 10 months ago

Ah, I see! Interesting, model loading should not be very intensive. It takes about 2 seconds for me normally. Is the instance particularly weak perhaps?
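If cold-start loading is the bottleneck, one common workaround (a sketch, not an official Sentence Transformers recipe) is to bundle a locally saved copy of the model with the deployment so the instance never downloads it at request time. The `local_dir` path, `resolve_model_source`, and `load_model` names below are hypothetical:

```python
from pathlib import Path


def resolve_model_source(local_dir: str, hub_id: str) -> str:
    # Prefer a bundled local copy; fall back to the Hub id (which triggers a download).
    return local_dir if Path(local_dir).is_dir() else hub_id


def load_model(local_dir: str = "./models/embedder",
               hub_id: str = "sentence-transformers/all-MiniLM-L6-v2"):
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(resolve_model_source(local_dir, hub_id))
    # Persist to disk if we had to download, so the next cold start loads locally.
    if not Path(local_dir).is_dir():
        model.save(local_dir)
    return model
```

Running `model.save(...)` once at build/deploy time and shipping that directory with the function keeps the runtime path purely local reads, which is usually much faster than fetching from the Hub.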