ggerganov / llama.cpp

LLM inference in C/C++
MIT License

Sentence-BERT support for sentence similarity embeddings #2863

Closed dranger003 closed 1 year ago

dranger003 commented 1 year ago

I have been using bert.cpp for some time and I must admit the cosine similarity results are quite good. How difficult would it be to integrate the code into llama.cpp now that we have GGUF?
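For reference, the similarity metric in question is plain cosine similarity over two embedding vectors. A minimal C++ sketch (written for illustration here, not taken from bert.cpp's actual helpers) might look like:

```cpp
#include <cmath>
#include <vector>

// Cosine similarity between two embedding vectors of equal length.
// Returns a value in [-1, 1]; higher means more similar.
static float cosine_similarity(const std::vector<float> & a, const std::vector<float> & b) {
    float dot = 0.0f, norm_a = 0.0f, norm_b = 0.0f;
    for (size_t i = 0; i < a.size(); ++i) {
        dot    += a[i] * b[i];
        norm_a += a[i] * a[i];
        norm_b += b[i] * b[i];
    }
    if (norm_a == 0.0f || norm_b == 0.0f) {
        return 0.0f; // undefined for zero vectors; treat as no similarity
    }
    return dot / (std::sqrt(norm_a) * std::sqrt(norm_b));
}
```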

I can surely invest some time to make this happen, but the code is unfortunately somewhat beyond my skills. I was able to fix the code from skeskinen so that it works with the latest GGML, but I have no idea how to update the GGML conversion script, nor how to add BERT support to llama.cpp. Is anyone else interested in helping?

https://github.com/skeskinen/bert.cpp/blob/master/bert.cpp

ggerganov commented 1 year ago

If there is large interest, we can add it to the roadmap and hopefully get some community support. I don't think it would be very difficult to integrate it into llama.cpp, especially since there is already a working version in bert.cpp.

monatis commented 1 year ago

@ggerganov where do you think it would fit in the repository? Maybe an implementation in common and a usage example in server?

ggerganov commented 1 year ago

If I understand correctly, BERT is a model for just computing embeddings - i.e. you cannot use it for text generation, correct?

Why not add it directly to llama.cpp? We already have an API for extracting embeddings, and the embedding example can be updated to support this if necessary.
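For context, the API referred to here is the embeddings path exposed in llama.h and used by the embedding example. Roughly (treat this as an illustrative sketch; the exact function names and signatures have shifted across llama.cpp versions):

```cpp
// Illustrative sketch only: assumes a context created with the embedding
// flag enabled in llama_context_params and a prompt that has already been
// evaluated. Exact signatures vary between llama.cpp versions.
#include <vector>
#include "llama.h"

std::vector<float> get_prompt_embedding(llama_context * ctx) {
    const int n_embd = llama_n_embd(ctx);          // embedding dimension of the loaded model
    const float * emb = llama_get_embeddings(ctx); // embedding of the last evaluated tokens
    return std::vector<float>(emb, emb + n_embd);
}
```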

monatis commented 1 year ago

> BERT is a model for just computing embeddings - i.e. you cannot use it for text generation, correct?

Yes, that's true: it's an encoder-only model.

> Why not add it directly to llama.cpp? We already have an API for extracting embeddings, and the embedding example can be updated to support this if necessary.

Great. I was thinking of an embeddings.cpp to support BERT and other embedding models such as E5, XLM-RoBERTa, etc., but if it's fine to include it directly in llama.cpp, I'd be happy to do so.

One common use case with LLMs is to index a collection of documents and retrieve them by embedding similarity in order to feed relevant documents as context to the LLM. So supporting common embedding models will help with such usage scenarios in the community.
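As a concrete illustration of that retrieval step, here is a minimal sketch (the doc_entry type and top_k_documents helper are hypothetical names introduced for this example, and it assumes documents have already been embedded, e.g. scored with the cosine_similarity helper sketched earlier in this thread):

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Forward declaration of the helper sketched earlier in this thread.
float cosine_similarity(const std::vector<float> & a, const std::vector<float> & b);

// Hypothetical document record: the text plus its precomputed embedding.
struct doc_entry {
    std::string text;
    std::vector<float> embedding;
};

// Return the indices of the top_k documents most similar to the query embedding.
std::vector<size_t> top_k_documents(const std::vector<doc_entry> & docs,
                                    const std::vector<float> & query_emb,
                                    size_t top_k) {
    std::vector<std::pair<float, size_t>> scored;
    scored.reserve(docs.size());
    for (size_t i = 0; i < docs.size(); ++i) {
        scored.emplace_back(cosine_similarity(docs[i].embedding, query_emb), i);
    }
    const size_t k = std::min(top_k, scored.size());
    // Sort only the k best-scoring entries to the front, highest similarity first.
    std::partial_sort(scored.begin(), scored.begin() + k, scored.end(),
                      [](const auto & a, const auto & b) { return a.first > b.first; });
    std::vector<size_t> result;
    for (size_t i = 0; i < k; ++i) {
        result.push_back(scored[i].second);
    }
    return result;
}
```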

So you can add it to the roadmap and assign it to me.

ggerganov commented 1 year ago

@monatis

Created a new issue and will close this one: https://github.com/ggerganov/llama.cpp/issues/2872