Hannibal046 / xRAG

Source code for xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token

How to prepare instruction-tuning dataset? #12

Closed: CoolColoury closed this issue 2 months ago

CoolColoury commented 2 months ago

I noticed that the datasets built in the prepare_dataset.ipynb file do not all have the same fields: some have a background field and some do not.

At the same time, I found that background is a required field in encode_with_chat_format_finetune when instruction-tuning.

How do you handle the datasets that lack a background field? Thanks!

Hannibal046 commented 2 months ago

Hi, for the datasets without a natural context, we run retrieval with each example as the query and use the top-1 retrieved document as the context.
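
For readers who want to see the shape of that fallback, here is a minimal sketch. It is not the repository's code: the helper names `retrieve_top1` and `fill_background`, the `question` key used as the retrieval query, the toy corpus, and the token-overlap scoring are all assumptions made for illustration. A real setup would presumably swap in a proper retriever over a large corpus.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve_top1(query, corpus):
    """Return the corpus document with the largest token overlap with the query.

    Token overlap is only a stand-in scorer (an assumption for this sketch);
    a real pipeline would use a dense or sparse retriever.
    """
    query_tokens = Counter(tokenize(query))

    def overlap(doc):
        doc_tokens = Counter(tokenize(doc))
        # Multiset intersection counts the tokens shared by query and doc.
        return sum((query_tokens & doc_tokens).values())

    return max(corpus, key=overlap)


def fill_background(examples, corpus, query_key="question"):
    """Ensure every example has a `background` field.

    Examples that already carry a background are kept as they are;
    the rest receive the top-1 retrieved document.
    """
    filled = []
    for example in examples:
        example = dict(example)  # copy so the input list is not mutated
        if not example.get("background"):
            example["background"] = retrieve_top1(example[query_key], corpus)
        filled.append(example)
    return filled


# Toy data, hypothetical and for illustration only.
corpus = [
    "Paris is the capital of France.",
    "The Pacific Ocean is the largest ocean on Earth.",
]
examples = [
    {"question": "What is the capital of France?"},
    {"question": "Summarize the passage.", "background": "A given passage."},
]
filled = fill_background(examples, corpus)
```

After this step every example has a background entry, so a function that requires the field, such as encode_with_chat_format_finetune, can be applied uniformly to all datasets.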