FlagOpen / FlagEmbedding

Retrieval and Retrieval-augmented LLMs
MIT License
7.29k stars, 529 forks

query instruction on ChromaDB #148

Open Bobolx00 opened 1 year ago

Bobolx00 commented 1 year ago

How can I add the query instruction in ChromaDB (assuming I use your model the way the ChromaDB docs explain)? Should I simply add it as a prefix to the actual query? I ask because models like 'instructor-xl' have an instruction argument in their class, but generic 'sentence_transformer' models do not.

Thanks in advance!

staoxiao commented 1 year ago

It seems that ChromaDB does not provide separate functions for embedding queries and embedding the corpus. I think the simplest way is to add the instruction as a prefix to the query. Alternatively, you can use bge-*-v1.5, which can retrieve passages without a query instruction (you also can add a prefix to achieve a better performance).

winstxnhdw commented 1 year ago

(you also can add a prefix to achieve a better performance).

Dumb question but just to confirm, only the query needs to be prefixed with the instructions right? I do not have to prefix the document texts, yes?

staoxiao commented 1 year ago

Yes. Only add the instruction "Represent this sentence for searching relevant passages: " to the query. The document texts are embedded without it.
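A minimal sketch of that asymmetry, in case it helps: the instruction is prepended to queries only, and documents are embedded as-is. The helper names (`embed_queries`, `embed_documents`, `fake_embed`) are hypothetical and not part of ChromaDB or FlagEmbedding; `fake_embed` is a stand-in so the snippet runs without downloading a model.

```python
from typing import Callable, List, Sequence

# Instruction quoted in this thread; it is prepended to queries only.
QUERY_INSTRUCTION = "Represent this sentence for searching relevant passages: "

# Any function mapping a list of strings to a list of vectors (assumed interface).
Embedder = Callable[[List[str]], List[List[float]]]


def embed_queries(queries: Sequence[str], embed_fn: Embedder,
                  instruction: str = QUERY_INSTRUCTION) -> List[List[float]]:
    """Embed queries with the instruction prefixed to each one."""
    return embed_fn([instruction + q for q in queries])


def embed_documents(docs: Sequence[str], embed_fn: Embedder) -> List[List[float]]:
    """Embed documents unchanged (no instruction)."""
    return embed_fn(list(docs))


# Stand-in embedder for demonstration only: it records the texts it receives
# and returns a dummy one-dimensional vector per text.
seen: List[str] = []


def fake_embed(texts: List[str]) -> List[List[float]]:
    seen.extend(texts)
    return [[float(len(t))] for t in texts]


doc_vecs = embed_documents(["BGE is an embedding model."], fake_embed)
query_vecs = embed_queries(["what is bge?"], fake_embed)
print(seen)
```

With a real model, `embed_fn` would wrap the model's encode call, and the resulting vectors could be handed to ChromaDB explicitly through the `embeddings=` argument of `collection.add` and the `query_embeddings=` argument of `collection.query`, instead of relying on a single embedding function for both sides.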

manmax31 commented 1 year ago

So I have two options at query time in RAG, right?

  1. Use bge-large-en-v1.5 with instruction: "Represent this sentence for searching relevant passages:"
  2. Use bge-reranker-large

Is this correct?

winstxnhdw commented 1 year ago

I think you’re meant to do 1 and 2 together.
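Roughly, a sketch of doing both, under the assumption that the embedding model fetches a broad candidate set and the reranker then rescores each (query, passage) pair. `retrieve_then_rerank`, `toy_retrieve` and `toy_score` are hypothetical names; the toy functions stand in for a vector-store lookup and a cross-encoder so the snippet runs on its own.

```python
from typing import Callable, List, Tuple

# Assumed interfaces: a retriever returns up to k passages for a query,
# and a scorer returns one relevance score per (query, passage) pair.
Retriever = Callable[[str, int], List[str]]
PairScorer = Callable[[List[Tuple[str, str]]], List[float]]


def retrieve_then_rerank(query: str, retrieve: Retriever, score_pairs: PairScorer,
                         fetch_k: int = 20, top_n: int = 3) -> List[Tuple[str, float]]:
    """Stage 1: embedding retrieval. Stage 2: rerank the candidates."""
    candidates = retrieve(query, fetch_k)
    if not candidates:
        return []
    scores = score_pairs([(query, passage) for passage in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


# Toy stand-ins for demonstration only.
corpus = [
    "Chroma stores embeddings.",
    "BGE models embed text.",
    "Rerankers score query passage pairs.",
]


def toy_retrieve(query: str, k: int) -> List[str]:
    # A real retriever would embed the (prefixed) query and search the store.
    return corpus[:k]


def toy_score(pairs: List[Tuple[str, str]]) -> List[float]:
    # A real reranker is a cross-encoder; this just counts shared words.
    return [float(len(set(q.lower().split()) & set(p.lower().split())))
            for q, p in pairs]


results = retrieve_then_rerank("how do rerankers score pairs",
                               toy_retrieve, toy_score, fetch_k=3, top_n=2)
print(results)
```

In practice the first stage would be the bge-large-en-v1.5 lookup with the instruction prefixed to the query, and the second stage would be bge-reranker-large scoring the raw query against each retrieved passage.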

manmax31 commented 1 year ago

How?

I thought that for RAG you embed the documents using bge-large-en-v1.5 without any instruction.

Then, at retrieval time, do I either use bge-large-en-v1.5 with the instruction "Represent this sentence for searching relevant passages:" prefixed to the query, or use bge-reranker-large to do the same?

winstxnhdw commented 1 year ago

Read this article. I believe it’s something like this. https://medium.com/@shaelanderchauhan/unleashing-the-power-of-llm-enhancing-document-retrieval-and-reranking-like-never-before-d05df0af350e

I also read a LlamaIndex article where reranking is used during the chunking step.

Bobolx00 commented 1 year ago

reranking is used during the chunking step

What do you mean by that?