FlagOpen / FlagEmbedding

Retrieval and Retrieval-augmented LLMs
MIT License

finetune instruction for bge rerank #693

Open QuangTQV opened 5 months ago

QuangTQV commented 5 months ago

Can the BGE reranker be fine-tuned in an instruction style? For example: "tool": { "query": "Transform this user request for fetching helpful tool descriptions: ", "key": "Transform this tool description for retrieval: " }

staoxiao commented 5 months ago

Yes. You need to add this instruction to your data before fine-tuning.

QuangTQV commented 5 months ago

I don't know how to add instructions, and I've also searched for documentation on finetuning BGE rerank but couldn't find any cases with instructions. Can you guide me? I want to rerank for retrieving tools.

staoxiao commented 5 months ago

Hi @Lahaina936, if you just want to rerank tools, you can fine-tune on your data without adding an instruction. An instruction is only useful when the model needs to perform multiple different tasks. If you do want one, add it to the query when generating the fine-tuning data: {"query": "Find a helpful tool to address the user's issue: xxxxxxx", "pos": List[str], "neg": List[str]}
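
For illustration, here is a minimal sketch of building one such training example in Python. It assumes only the `query` / `pos` / `neg` layout quoted in this comment, with one JSON object per line; the helper name `make_record` and the example tool descriptions are hypothetical and not part of FlagEmbedding.

```python
import json

# Instruction text taken from the comment above; any fixed prefix would work the same way.
INSTRUCTION = "Find a helpful tool to address the user's issue: "


def make_record(query, pos, neg, instruction=INSTRUCTION):
    """Return one training example with the instruction prepended to the query.

    `make_record` is a hypothetical helper, not a FlagEmbedding function.
    """
    return {"query": instruction + query, "pos": list(pos), "neg": list(neg)}


# Made-up example data: one relevant tool description and one irrelevant one.
record = make_record(
    "convert a PDF to plain text",
    pos=["pdf_to_text: extracts text from PDF files"],
    neg=["image_resize: resizes PNG and JPEG images"],
)

# Serialize as a single JSON line, ready to append to a training file.
line = json.dumps(record, ensure_ascii=False)
print(line)
```

Only the query changes; the positive and negative passages are left as they are.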

QuangTQV commented 5 months ago

So if I want to fine-tune the BGE reranker for multiple different tasks, as llm-embedder does, how should I fine-tune with instructions? Thank you, and I look forward to your response.

staoxiao commented 5 months ago

Add a different instruction to the queries of each task. Organize your data like this: {"query": "task-specific instruction: task query", "pos": List[str], "neg": List[str]}
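
A rough sketch of that multi-task layout, assuming each raw example carries a `task` label alongside its query and passages. The task names, the `qa` instruction string, the `task` field, and the helpers `build_records` and `to_jsonl` are all hypothetical and chosen only for illustration; the resulting `query` / `pos` / `neg` shape is the one described in this thread.

```python
import json

# Hypothetical mapping from task name to instruction prefix.
TASK_INSTRUCTIONS = {
    "tool": "Find a helpful tool to address the user's issue: ",
    "qa": "Find a passage that answers the question: ",
}


def build_records(examples):
    """Prefix each query with the instruction for its task.

    Each input example is assumed to be a dict with the keys
    "task", "query", "pos", and "neg". Raises KeyError for unknown tasks.
    """
    records = []
    for ex in examples:
        instruction = TASK_INSTRUCTIONS[ex["task"]]
        records.append(
            {
                "query": instruction + ex["query"],
                "pos": list(ex["pos"]),
                "neg": list(ex["neg"]),
            }
        )
    return records


def to_jsonl(records):
    """Serialize records as JSON Lines text, one object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


# Made-up raw examples from two different tasks.
raw_examples = [
    {
        "task": "tool",
        "query": "send an email with an attachment",
        "pos": ["send_mail: sends an email, optionally with files attached"],
        "neg": ["get_weather: returns the forecast for a city"],
    },
    {
        "task": "qa",
        "query": "what is the capital of France?",
        "pos": ["Paris is the capital and largest city of France."],
        "neg": ["Berlin is the capital of Germany."],
    },
]

records = build_records(raw_examples)
jsonl_text = to_jsonl(records)
print(jsonl_text)
```

The same instruction strings would presumably also need to be prepended to queries at inference time, so that the reranker sees inputs in the format it was fine-tuned on.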