Closed dranger003 closed 1 year ago
If there is large interest, we can add it to the roadmap and hopefully get some community support.
I don't think it would be very difficult to integrate it into llama.cpp, especially since there is a working version in bert.cpp.
@ggerganov where do you think it would fit in the repository? Maybe the implementation in common and a usage example in server?
If I understand correctly, BERT is a model for just computing embeddings - i.e. you cannot use it for text generation, correct?
Why not add it directly in llama.cpp? We already have API for extracting the embeddings and the embedding example can be updated to support this if necessary
BERT is a model for just computing embeddings - i.e. you cannot use it for text generation, correct?
Yes, that's true; it's an encoder-only model.
Why not add it directly in llama.cpp? We already have API for extracting the embeddings and the embedding example can be updated to support this if necessary
Great. I was thinking of an embeddings.cpp to support BERT and other embedding models such as E5, XLMRoberta, etc., but if it's fine to include it directly in llama.cpp, I'd be happy to do so.
One common use case with LLMs is to index a collection of documents and retrieve them by embedding similarity in order to feed the relevant documents as context to LLMs. So supporting common embedding models will help with such usage scenarios in the community.
So you can add it to the roadmap and assign it to me.
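For readers unfamiliar with the retrieval use case mentioned above, here is a minimal sketch of ranking documents by cosine similarity. It does not call llama.cpp or bert.cpp; the function names (`cosine_similarity`, `retrieve`) and the toy 3-dimensional vectors are assumptions made purely for illustration, standing in for embeddings that a real model would produce.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vec, index, top_k=2):
    """Return the ids of the top_k documents most similar to query_vec.

    `index` is a hypothetical in-memory mapping of document id -> embedding.
    """
    scored = [(cosine_similarity(query_vec, vec), doc_id) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:top_k]]


# Toy vectors standing in for real model output (illustrative values only).
index = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [0.9, 0.1, 0.0],
}
query = [1.0, 0.0, 0.0]
top = retrieve(query, index, top_k=2)
```

In a real setup the retrieved document ids would be mapped back to their text and prepended to the LLM prompt as context; a linear scan like this is only reasonable for small collections.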
@monatis Created a new issue and will close this one: https://github.com/ggerganov/llama.cpp/issues/2872
I have been using bert.cpp for some time and I must admit the cosine similarity results are quite good. How difficult would it be to integrate the code into llama.cpp now that we have gguf?
I can surely invest some time to make this happen, but the code is unfortunately somewhat beyond my skills. I was able to fix the code from skeskinen so that it works with the latest GGML, but I have no idea how to update the GGML conversion script or how to add BERT support to llama.cpp. Is anyone else interested in helping?
https://github.com/skeskinen/bert.cpp/blob/master/bert.cpp