Open Bobolx00 opened 1 year ago
It seems that ChromaDB does not have separate functions for the query and the corpus. I think the simplest way is to add a prefix to the query. You can also use bge-*-v1.5, which can retrieve passages without a query instruction (you can still add a prefix to get better performance).
Dumb question, but just to confirm: only the query needs to be prefixed with the instruction, right? I do not have to prefix the document texts, yes?
Yes. Only add the instruction "Represent this sentence for searching relevant passages: " to the query.
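For anyone landing here later, a minimal sketch of what "prefix only the query" could look like with ChromaDB. The helper names `with_query_instruction` and `search` are made up for illustration, and `collection` is assumed to be a ChromaDB collection whose documents were added without any prefix; the only library call relied on is `collection.query(query_texts=..., n_results=...)`.

```python
# Instruction string quoted in this thread for BGE query embeddings.
BGE_QUERY_INSTRUCTION = "Represent this sentence for searching relevant passages: "


def with_query_instruction(query: str, instruction: str = BGE_QUERY_INSTRUCTION) -> str:
    """Return the query with the retrieval instruction prepended (hypothetical helper)."""
    return instruction + query.strip()


def search(collection, query: str, n_results: int = 5):
    """Query a ChromaDB-style collection using the prefixed query text.

    Assumption: `collection` exposes `query(query_texts=..., n_results=...)`,
    as chromadb's Collection does, and its documents were stored unprefixed.
    """
    return collection.query(
        query_texts=[with_query_instruction(query)],
        n_results=n_results,
    )
```

The documents go in as plain text; only the string passed at query time changes.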
I have 2 options at query time in RAG, right:
Is this correct?
I think you’re meant to do 1 and 2 together.
How?
I thought that for RAG you embed the documents using bge-large-en-v1.5 without any instruction.
Then, at query time, do you retrieve using bge-large-en-v1.5 with the instruction "Represent this sentence for searching relevant passages: " prefixed to the query, or use bge-reranker-large to do the same?
Read this article. I believe it’s something like this. https://medium.com/@shaelanderchauhan/unleashing-the-power-of-llm-enhancing-document-retrieval-and-reranking-like-never-before-d05df0af350e
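To make "1 and 2 together" concrete, here is a rough sketch of a two-stage flow: retrieve candidates with the embedding model first, then rescore only those candidates with the reranker. Everything here is an assumption for illustration: `retrieve_then_rerank`, `embed_score` and `rerank_score` are hypothetical names, and the two scoring callables stand in for bge-large-en-v1.5 similarity and bge-reranker-large respectively. The sketch also assumes the instruction is added only for the embedding stage, with the reranker receiving the raw query.

```python
from typing import Callable, List, Sequence, Tuple

BGE_QUERY_INSTRUCTION = "Represent this sentence for searching relevant passages: "


def retrieve_then_rerank(
    query: str,
    docs: Sequence[str],
    embed_score: Callable[[str, str], float],
    rerank_score: Callable[[str, str], float],
    top_k: int = 20,
    final_k: int = 3,
) -> List[Tuple[str, float]]:
    """Two-stage retrieval: embedding-based shortlist, then reranking."""
    # Stage 1: score every document with the (cheap) embedding similarity.
    # The instruction is prepended to the query only; documents stay as-is.
    prefixed = BGE_QUERY_INSTRUCTION + query
    candidates = sorted(
        docs, key=lambda d: embed_score(prefixed, d), reverse=True
    )[:top_k]

    # Stage 2: rescore only the shortlist with the (expensive) reranker,
    # which is assumed here to take the raw query without the instruction.
    rescored = [(d, rerank_score(query, d)) for d in candidates]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored[:final_k]
```

In a real setup the first stage would normally be the vector store lookup rather than a full sort over all documents; the sort is only there to keep the sketch self-contained.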
I also read a LlamaIndex article where reranking is used during the chunking step.
"reranking is used during the chunking step"
What do you mean by that?
How can I add the query instruction in ChromaDB (assuming I use your model the way the ChromaDB docs explain)? Should I simply add it as a prefix to the actual query? I ask because models like 'instructor-xl' have an instruction argument in their class, but generic 'sentence_transformer' models do not...
Thanks in advance!
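One possible way to get something like an instruction argument for a generic sentence-transformers model is a small wrapper that prefixes queries but not documents. This is only a sketch: `InstructedEncoder` is a made-up class, and `encode` is assumed to be any callable that takes a list of strings and returns their embeddings (for example the `encode` method of a `SentenceTransformer` model).

```python
from typing import Callable, List, Sequence

BGE_QUERY_INSTRUCTION = "Represent this sentence for searching relevant passages: "


class InstructedEncoder:
    """Hypothetical wrapper that adds an instruction to queries only."""

    def __init__(
        self,
        encode: Callable[[List[str]], object],
        query_instruction: str = BGE_QUERY_INSTRUCTION,
    ) -> None:
        # `encode` is assumed to map a list of strings to their embeddings.
        self._encode = encode
        self.query_instruction = query_instruction

    def encode_documents(self, texts: Sequence[str]):
        # Documents are embedded exactly as written, with no prefix.
        return self._encode(list(texts))

    def encode_queries(self, queries: Sequence[str]):
        # Queries get the instruction prepended before embedding.
        return self._encode([self.query_instruction + q for q in queries])
```

With this approach the vectors would be computed outside ChromaDB and passed in explicitly, e.g. via `collection.add(embeddings=...)` for documents and `collection.query(query_embeddings=...)` for queries, instead of relying on a single embedding function for both.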