abetlen / llama-cpp-python

Python bindings for llama.cpp
https://llama-cpp-python.readthedocs.io
MIT License

Add reranking support #1794

Open donguyen32 opened 1 month ago

donguyen32 commented 1 month ago

According to https://github.com/ggerganov/llama.cpp/pull/9510, llama.cpp now supports the reranking model https://huggingface.co/BAAI/bge-reranker-v2-m3. Please add support for reranking in llama-cpp-python.

donguyen32 commented 1 month ago

@abetlen Sorry but do you have any plans to implement this?

yutyan0119 commented 2 weeks ago

Hi @donguyen32, I was thinking the same thing and have just submitted a PR that adds a rank method to the high-level API. I don't know whether it will be merged, but it may be useful as a reference for how to do ranking with llama-cpp-python.

donguyen32 commented 2 weeks ago

@yutyan0119 According to the original repo, the format of the rerank input is [BOS]query[EOS][SEP]doc[EOS] (https://github.com/ggerganov/llama.cpp/blob/9f409893519b4a6def46ef80cd6f5d05ac0fb157/examples/server/utils.hpp#L185-L196), but your inputs are [f"{query}</s><s>{doc}" for doc in documents]. Please check it.
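To make the difference concrete, here is a minimal sketch that builds both input layouts as plain strings. It assumes the literal token strings `<s>` for BOS and `</s>` for both EOS and SEP, which is an assumption about this model's vocabulary and not something confirmed in this thread; the function names are hypothetical and are not part of llama-cpp-python.

```python
from typing import List


def server_style_input(query: str, doc: str,
                       bos: str = "<s>", eos: str = "</s>", sep: str = "</s>") -> str:
    """Layout described for the server: [BOS]query[EOS][SEP]doc[EOS].

    The default token strings are assumptions, not values read from the model.
    """
    return f"{bos}{query}{eos}{sep}{doc}{eos}"


def pr_style_inputs(query: str, documents: List[str]) -> List[str]:
    """Layout used in the PR: one "query</s><s>doc" string per document.

    Any leading BOS or trailing EOS would have to be added by the tokenizer.
    """
    return [f"{query}</s><s>{doc}" for doc in documents]


if __name__ == "__main__":
    print(server_style_input("what is panda?", "it's a bear"))
    print(pr_style_inputs("what is panda?", ["hi", "it's a bear"]))
```

Under these assumptions the two layouts differ in the separator between query and document: `</s></s>` in the server layout versus `</s><s>` in the PR inputs.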

yutyan0119 commented 2 weeks ago

@donguyen32 Thanks for your comment! Actually, I was looking at examples/embedding/embedding.cpp, so I think there are some differences from the server implementation.

I have verified that my output matches the original implementation when it is run with the following command:

./llama-embedding \
    -m models/bge-reranker-v2-m3/ggml-model-f16.gguf \
    -p "what is panda?</s><s>hi\nwhat is panda?</s><s>it's a bear\nwhat is panda?</s><s>The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China." \
    --pooling rank --embd-normalize -1 --verbose-prompt 

The same command also seems to be used for testing on CI. https://github.com/ggerganov/llama.cpp/blob/a9e8a9a0306a8093eef93b0022d9f45510490072/ci/run.sh#L755
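For anyone who wants to reproduce this with other queries, here is a small sketch that assembles the same argument list from a query and a list of documents. It only builds and prints the command and does not run the binary. The helper name `build_rank_command` is hypothetical, and the flags are copied from the command above, not checked against any other llama.cpp version.

```python
import shlex
from typing import List


def build_rank_command(model_path: str, query: str, documents: List[str],
                       binary: str = "./llama-embedding") -> List[str]:
    """Build the argv for the llama-embedding invocation shown above."""
    # One "query</s><s>doc" pair per document, joined by a literal
    # backslash + "n", which is what the shell passes through for "\n"
    # inside double quotes in the original command.
    prompt = "\\n".join(f"{query}</s><s>{doc}" for doc in documents)
    return [
        binary,
        "-m", model_path,
        "-p", prompt,
        "--pooling", "rank",
        "--embd-normalize", "-1",
        "--verbose-prompt",
    ]


if __name__ == "__main__":
    argv = build_rank_command(
        "models/bge-reranker-v2-m3/ggml-model-f16.gguf",
        "what is panda?",
        ["hi", "it's a bear"],
    )
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(argv))
```

Documents that themselves contain newlines would need extra handling, which this sketch does not attempt.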

To be honest, I do not know how these special tokens affect reranking accuracy. If you know, please let me know.

Also, if we want the return value to match the server's response format, I think it would be better to add a separate method such as create_rank, in the same way that create_embedding exists alongside embed.
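A rough sketch of what such a wrapper could look like, written as a standalone function so it runs without a model. Everything here is hypothetical: `create_rank` is not an existing llama-cpp-python method, the `score_fn` callable stands in for whatever the rank method ends up returning, and the response field names (`results`, `index`, `relevance_score`) are assumptions modeled on common rerank APIs rather than copied from the server code.

```python
from typing import Any, Callable, Dict, List


def create_rank(query: str, documents: List[str],
                score_fn: Callable[[str, str], float]) -> Dict[str, Any]:
    """Score each document against the query and wrap the scores in a dict.

    score_fn(query, doc) is a placeholder for the real model call.
    """
    results = [
        {"index": i, "relevance_score": float(score_fn(query, doc))}
        for i, doc in enumerate(documents)
    ]
    # Highest score first; "index" still refers to the position in the
    # original documents list.
    results.sort(key=lambda r: r["relevance_score"], reverse=True)
    return {"object": "list", "results": results}


if __name__ == "__main__":
    # Dummy scorer for illustration only: counts shared lowercase words.
    def overlap(q: str, d: str) -> float:
        return float(len(set(q.lower().split()) & set(d.lower().split())))

    print(create_rank("what is panda?", ["hi", "a panda is a bear"], overlap))
```

Keeping the scoring step separate from the response formatting mirrors the split between embed and create_embedding, so callers who only want raw scores are not forced through the server-style structure.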