UKPLab / sentence-transformers

Multilingual Sentence & Image Embeddings with BERT
https://www.SBERT.net
Apache License 2.0

How to Deploy an SBERT model? #245

Open ar3717 opened 4 years ago

ar3717 commented 4 years ago

Hi, I am building a semantic search application and I want to deploy (put into production) my fine-tuned, domain-adapted SBERT model. Any ideas or recommendations for doing that?

nreimers commented 4 years ago

Do you need any specific help?

ar3717 commented 4 years ago

Yes, so let's say I want to have a server to which I can send an HTTP request along with a sentence (query), and get back the embedding based on your pre-trained SBERT-NLI model. Do you have any suggestions on how I can do that?

nreimers commented 4 years ago

You would need two components: the sentence embedding service and the index server.

For the sentence embedding service, I would use FastAPI. It should only take a few lines of Python code to run Sentence Transformers there and return the embedding for a submitted sentence.
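
A minimal sketch of such a service could look like this (the model name and the `/embed` route are just placeholder choices):

```python
# embedding_service.py -- minimal sketch of a sentence embedding service with FastAPI.
# The model name and the /embed route are placeholders; swap in your fine-tuned model.
from fastapi import FastAPI
from pydantic import BaseModel
from sentence_transformers import SentenceTransformer

app = FastAPI()
model = SentenceTransformer("bert-base-nli-mean-tokens")  # or the path to your own model


class Query(BaseModel):
    sentence: str


@app.post("/embed")
def embed(query: Query):
    # encode() returns a numpy array; convert it to a list so it serializes to JSON
    embedding = model.encode(query.sentence)
    return {"embedding": embedding.tolist()}

# Start with: uvicorn embedding_service:app --host 0.0.0.0 --port 8000
```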

Then for the corpus you would need to index your sentence embeddings so that you can search them. If you have a small corpus, you could use ElasticSearch.

For larger corpora, I can recommend faiss.

ar3717 commented 4 years ago

Thanks a lot. Would you recommend Flask as another alternative for the sentence embedding service?

nreimers commented 4 years ago

Flask also works

FantasyCheese commented 4 years ago

Hi @nreimers, could you kindly elaborate on the Faiss vs. ElasticSearch question? Like, how large are we talking about before Faiss is recommended, and what problem might ElasticSearch have? Our backend team has a strong AWS background, and it would be hard to convince them not to use the AWS ElasticSearch Service.

nreimers commented 4 years ago

Hi @FantasyCheese The issue is that ES performs a full (exact) nearest neighbor search. If you have 1 million documents (vectors) in your index, the query vector is compared against all 1 million docs. Hence, the runtime is linear in the number of docs in your index.

In my experiments, the latency up to around 100k docs was OK. But this of course depends on your setup and how time-critical your task is.

Faiss, on the other hand, uses approximate nearest neighbor (ANN) search and is able to index the embeddings. There, you can retrieve the results within milliseconds, independent of how many vectors you have indexed. Even when you have millions or billions of docs (vectors), you can find the nearest neighbors efficiently.

ANN is something ES has been working on for about a year, but I don't think it is done yet: https://github.com/elastic/elasticsearch/issues/42326
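
To make the difference concrete, here is a rough sketch of an exact index vs. an ANN index in Faiss (the dimension, corpus size and parameters are just example values):

```python
# Sketch: exact vs. approximate nearest neighbor search with Faiss.
# Embeddings here are random placeholders; in practice they come from SentenceTransformer.encode().
import numpy as np
import faiss

d = 768                                                   # SBERT embedding dimension
corpus_embeddings = np.random.rand(100000, d).astype("float32")
query = np.random.rand(1, d).astype("float32")

# Exact search (what ES does): the query is compared against every vector.
flat_index = faiss.IndexFlatIP(d)                         # inner product similarity
flat_index.add(corpus_embeddings)
scores, ids = flat_index.search(query, 5)

# Approximate search: cluster the vectors, then only scan a few clusters per query.
nlist = 1024                                              # number of clusters (example value)
quantizer = faiss.IndexFlatIP(d)
ann_index = faiss.IndexIVFFlat(quantizer, d, nlist, faiss.METRIC_INNER_PRODUCT)
ann_index.train(corpus_embeddings)                        # learn the clustering
ann_index.add(corpus_embeddings)
ann_index.nprobe = 10                                     # clusters searched per query; speed vs. recall trade-off
scores, ids = ann_index.search(query, 5)
```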

FantasyCheese commented 4 years ago

@nreimers Wow that was fast and clear! Thanks a lot for your clarification!

pistocop commented 4 years ago

Hi @ar3717, I'm currently working on a very similar task and I have a question for you:

When you say

my fine-tuned domain-adapted SBERT model

what do you mean?

Did you fine-tune a BERT-type model on a domain-specific dataset in an unsupervised manner, and then use this domain-specific BERT model to train an SBERT version with the same datasets used by @nreimers?

Or have you taken another approach?

Thanks in advance for sharing any information.

ar3717 commented 4 years ago

@GuardatiSimone Hi, yeah, that is basically what I have done so far: fine-tune BERT on my domain, then feed that fine-tuned BERT into SBERT and use one of the training datasets (e.g. NLI or STS) to retrain the SBERT model. But I am planning to use my own corpus to retrain my SBERT based on the Wikipedia task in @nreimers' paper. What is your approach?
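
For reference, plugging a fine-tuned BERT checkpoint into sentence-transformers and training it NLI-style looks roughly like this (the checkpoint path and the two example pairs are placeholders):

```python
# Sketch: wrap a domain-fine-tuned BERT in a SentenceTransformer and train it on NLI-style pairs.
# "path/to/finetuned-bert" and the two example pairs are placeholders.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, models, InputExample, losses

word_embedding_model = models.Transformer("path/to/finetuned-bert", max_seq_length=128)
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension())
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])

# NLI-style labels: 0 = contradiction, 1 = entailment, 2 = neutral
train_examples = [
    InputExample(texts=["A man is eating food.", "A man is eating a meal."], label=1),
    InputExample(texts=["A man is eating food.", "Nobody is eating."], label=0),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
train_loss = losses.SoftmaxLoss(
    model=model,
    sentence_embedding_dimension=model.get_sentence_embedding_dimension(),
    num_labels=3,
)
model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=100)
model.save("output/domain-sbert-nli")
```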

pistocop commented 4 years ago

Hi @ar3717 and thanks for the reply.

At the moment, I'm using pretrained SBERT to compute the sentence embeddings, then feeding them into a Faiss index and building a backend (FastAPI) to get the top-k neighbors from it.

Although pretrained SBERT is working well, I need to build an embedding system that can adapt to a domain-specific context - in an unsupervised manner.

Does the approach you are using [1] produce better embeddings? And is the overall effort for [1] high in computational terms?

[1] BERT fine-tuning on a specific corpus + SBERT training on NLI/STS
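
For context, the pipeline I described roughly looks like this (the model name and the corpus sentences are placeholders):

```python
# Sketch of the pipeline above: SBERT embeddings -> Faiss index -> top-k neighbors.
# The corpus sentences and the model name are placeholders.
import faiss
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("bert-base-nli-mean-tokens")
corpus = ["The cat sits on the mat.", "A dog runs in the park.", "I love pizza."]

corpus_embeddings = model.encode(corpus, convert_to_numpy=True).astype("float32")
faiss.normalize_L2(corpus_embeddings)                 # normalize so inner product = cosine
index = faiss.IndexFlatIP(corpus_embeddings.shape[1])
index.add(corpus_embeddings)

query_embeddings = model.encode(["Where is the cat?"], convert_to_numpy=True).astype("float32")
faiss.normalize_L2(query_embeddings)
scores, ids = index.search(query_embeddings, 2)       # top-2 neighbors
for score, idx in zip(scores[0], ids[0]):
    print(corpus[idx], float(score))
```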

ar3717 commented 4 years ago

It really depends on your data size. So far, I have tried fine-tuning BERT on a small domain-specific corpus and I have seen some improvements, but I think if I increase my corpus size (specific to my domain), I will get much better results. The corpus I used for fine-tuning is around 6 MB, and fine-tuning took 15 minutes on an AWS GPU ml.p3.2xlarge instance. SBERT NLI training on the fine-tuned model took about 1.5 hours on NLI data on the same AWS GPU instance. Let me know if that helps.

pistocop commented 4 years ago

Hi, many thanks for sharing, it is very useful to me.

I was trying to gather as much information as possible before starting my work, mainly to get a rough sense of whether the goal (better embeddings) can be reached, and of the possible price range of the machines for training.

So many thanks @ar3717 for the info and nreimers for the amazing repository.

cabhijith commented 4 years ago

@ar3717 How did you fine-tune BERT on your domain-specific dataset and then feed it to SBERT? I guess you used this script?

ar3717 commented 4 years ago

@cabhijith For SBERT, I used that script. For fine-tuning BERT, I used this one: https://github.com/huggingface/transformers/blob/v2.9.1/examples/language-modeling/run_language_modeling.py. Are you doing the same thing? If so, could you please let me know what your approach is in case it differs from what I am doing?

threefoldo commented 4 years ago

@cabhijith You can try Milvus (https://github.com/milvus-io/milvus) or Annoy (https://github.com/spotify/annoy).
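
If you go the Annoy route, a quick sketch of indexing SBERT embeddings with it (dimension and values are example placeholders):

```python
# Sketch: indexing precomputed SBERT embeddings with Annoy (values are placeholders).
import numpy as np
from annoy import AnnoyIndex

d = 768                                   # SBERT embedding dimension
index = AnnoyIndex(d, "angular")          # "angular" roughly corresponds to cosine distance

corpus_embeddings = np.random.rand(10000, d)   # placeholder for SentenceTransformer.encode() output
for i, vector in enumerate(corpus_embeddings):
    index.add_item(i, vector)

index.build(10)                           # 10 trees; more trees -> better recall, slower build
index.save("corpus.ann")

query_embedding = np.random.rand(d)       # placeholder query embedding
neighbor_ids = index.get_nns_by_vector(query_embedding, 5)
print(neighbor_ids)
```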

jobergum commented 3 years ago

https://vespa.ai/ (https://github.com/vespa-engine/vespa) supports fast ANN tensor search / embedding retrieval using HNSW, and one can combine regular sparse retrieval with embedding-based retrieval in the same query. Our cord19.vespa.ai app uses Sentence-BERT embeddings for "Related articles". Example: https://cord19.vespa.ai/article/58938.

Resources: