Open davies-w opened 10 months ago
Hi,

I started using SBERT to produce embeddings for dot-product similarity. However, our system is very resource-constrained, and I'm wondering whether the embedding part of the model can easily be extracted and instantiated separately from the DNN layers? With an emphasis on easily, as I'm not a PyTorch expert. I know from past experience that this might be possible in theory, but I don't have much time to research it.

Appreciate any help!

W

Hello!

I'm afraid that the computations done by the embedding models that can be loaded using Sentence Transformers are all required for producing the embeddings. If you are struggling with latency, then you can consider:

Thanks Tom! I tried the 40 MB and 60 MB models mentioned in SBERT, and they seemed to produce much poorer similarities for sentence pairs like ("I love my dog", "you like the cat"). I'll take a look at these two, though, for sure!

BTW, it's not computational latency; it's the loading speed of the model, whether from file or from the internet. We're trying to use it in a Vercel instance, and it times out while loading the files, which is why I was hoping it was possible to load just the embedding part of the data. But it sounds like you need to load everything anyway?

Ah, I see! Interesting, model loading should not be very intensive. It takes about 2 seconds for me normally. Is the instance particularly weak, perhaps?
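For context on why "just the embedding part" gives poorer similarities: without the transformer layers, all you can do is look up each token's static vector and pool them, which ignores word order and context. The sketch below illustrates that embedding-only approach with a toy, made-up vocabulary and random vectors (purely hypothetical; not weights from any real SBERT checkpoint):

```python
import numpy as np

# Hypothetical toy vocabulary and embedding matrix, standing in for a
# model's token-embedding table. Embedding-only similarity = look up
# each token's static vector, mean-pool, then compare with cosine.
VOCAB = {"i": 0, "love": 1, "my": 2, "dog": 3, "you": 4, "like": 5, "the": 6, "cat": 7}
rng = np.random.default_rng(0)
EMB = rng.normal(size=(len(VOCAB), 8))  # one 8-dim vector per token

def embed(sentence: str) -> np.ndarray:
    """Mean-pool the static token vectors of a sentence (no DNN layers)."""
    ids = [VOCAB[w] for w in sentence.lower().split() if w in VOCAB]
    return EMB[ids].mean(axis=0)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

sim = cosine(embed("I love my dog"), embed("you like the cat"))
```

Because the vectors are context-free, any two sentences with the same bag of tokens get identical embeddings here; the transformer layers are what make the full model's sentence embeddings sensitive to context, which is why they can't be skipped without the quality drop reported above.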