Hannibal046 / xRAG

Source code for xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token

How to divide enwiki-dec2021 into train and dev #11

BeastyZ closed this issue 2 months ago

BeastyZ commented 2 months ago

Great work!

In the README, you mentioned that enwiki-dec2021 can be used as pretraining data. How did you split enwiki-dec2021 into train and dev sets?

Thank you for your time.

Hannibal046 commented 2 months ago

Hi, the train and dev sets are selected at random.
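
The reply does not give the dev-set size, the seed, or the code used, so the following is only a minimal sketch of one way to make a reproducible random split. The function name `split_train_dev`, the `dev_size` default, and the `seed` value are all hypothetical and are not taken from the xRAG repository.

```python
import random


def split_train_dev(records, dev_size=1000, seed=42):
    """Randomly partition `records` into disjoint (train, dev) lists.

    `dev_size` and `seed` are illustrative assumptions, not values
    confirmed by the xRAG authors.
    """
    if not 0 <= dev_size <= len(records):
        raise ValueError("dev_size must be between 0 and len(records)")

    # A dedicated Random instance keeps the split reproducible without
    # touching the global random state.
    rng = random.Random(seed)
    indices = list(range(len(records)))
    rng.shuffle(indices)

    # The first `dev_size` shuffled indices form the dev set.
    dev_idx = set(indices[:dev_size])

    # Preserve the original order of records within each split.
    train = [r for i, r in enumerate(records) if i not in dev_idx]
    dev = [r for i, r in enumerate(records) if i in dev_idx]
    return train, dev
```

For example, `train, dev = split_train_dev(passages, dev_size=1000)` would hold out 1000 randomly chosen passages, where `passages` is whatever list of enwiki-dec2021 records has been loaded. Fixing the seed means the same call returns the same split each time.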