Open donguyen32 opened 1 month ago
@abetlen Sorry but do you have any plans to implement this?
Hi @donguyen32 I am thinking the same thing and have just submitted a PR to add the rank method to High-Level API. I don't know if it will be merged or not, but it would be helpful to know how to do the ranking using llama-cpp-python.
@yutyan0119 According to the original repo, I see that the format of the rerank task is [BOS]query[EOS][SEP]doc[EOS]
https://github.com/ggerganov/llama.cpp/blob/9f409893519b4a6def46ef80cd6f5d05ac0fb157/examples/server/utils.hpp#L185-L196
so your inputs would be `[f"{query}</s><s>{doc}" for doc in documents]`.
Please check it.
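For illustration, the pair construction above can be written as a small helper. This is only a sketch: the `</s><s>` separator stands in for the EOS/BOS token strings of bge-reranker-v2-m3, and other models may use different special tokens.

```python
# Build rerank inputs in the [BOS]query[EOS][SEP]doc[EOS] layout used by
# the llama.cpp server. Note: the literal "</s><s>" assumes the
# bge-reranker-v2-m3 tokenizer; other models may use different tokens.
def build_rerank_inputs(query: str, documents: list[str]) -> list[str]:
    return [f"{query}</s><s>{doc}" for doc in documents]

pairs = build_rerank_inputs("what is panda?", ["hi", "it's a bear"])
```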
@donguyen32 Thanks for your comment! Actually, I was looking at examples/embedding/embedding.cpp, so I think there are some differences from the server implementation.
I have verified that the output matches the original implementation with the following command:
./llama-embedding \
-m models/bge-reranker-v2-m3/ggml-model-f16.gguf \
-p "what is panda?</s><s>hi\nwhat is panda?</s><s>it's a bear\nwhat is panda?</s><s>The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China." \
--pooling rank --embd-normalize -1 --verbose-prompt
The same command also seems to be used for testing on CI. https://github.com/ggerganov/llama.cpp/blob/a9e8a9a0306a8093eef93b0022d9f45510490072/ci/run.sh#L755
In fact, I do not know how these special tokens affect reranking accuracy. If you know, please let me know.
And if we want a return value in the form of the server's response, I think it would be better to add a separate method such as create_rank, analogous to the create_embedding method for embeddings.
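As a rough sketch of what such a create_rank-style return value could look like, here is a pure-Python mock that mirrors the llama.cpp server's rerank response shape. The field names and the `make_rank_response` helper are assumptions for illustration, not an existing llama-cpp-python API.

```python
# Hypothetical response builder mirroring the llama.cpp server's rerank
# output: one entry per document, each carrying its index and score.
# All names here are illustrative assumptions, not a real API.
def make_rank_response(model: str, scores: list[float]) -> dict:
    return {
        "model": model,
        "object": "list",
        "results": [
            {"index": i, "relevance_score": s}
            for i, s in enumerate(scores)
        ],
    }

resp = make_rank_response("bge-reranker-v2-m3", [-1.2, 7.5])
```

A caller could then sort `resp["results"]` by `relevance_score` to get the final document ranking.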
According to https://github.com/ggerganov/llama.cpp/pull/9510, llama.cpp added support for the reranking model https://huggingface.co/BAAI/bge-reranker-v2-m3. Please provide support for this in llama-cpp-python.