Question about the missing data kilt_wikipedia.csv in the project.

OpenMatch / Augmentation-Adapted-Retriever

[ACL 2023] This is the code repo for our ACL'23 paper "Augmentation-Adapted Retriever Improves Generalization of Language Models as Generic Plug-In".

MIT License


Question about the missing data kilt_wikipedia.csv in the project. #4

Closed EachSheep closed 3 months ago

EachSheep commented 8 months ago

Hello, the kilt_wikipedia corpus seems to be missing from the preprocessed data you linked. post_pipeline.sh references this file, but I couldn't find instructions for creating it in the documentation. Could you please provide the missing data?

yuzc19 commented 3 months ago

Hi! Thanks for pointing it out. The file is too large to upload to the cloud efficiently, so I have added the processing script instead: https://github.com/OpenMatch/Augmentation-Adapted-Retriever/blob/main/tools/process_kilt_wikipedia.py
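
For readers who want a rough idea of what such a processing step involves, here is a minimal sketch. It is not the repository's script, and the linked process_kilt_wikipedia.py remains the authoritative version. It assumes the input is the KILT knowledge source in JSON Lines form, where each page has `wikipedia_id`, `wikipedia_title`, and a `text` list of paragraphs. The function name `kilt_jsonl_to_csv`, the `id,title,text` column layout, and the `<page id>_<paragraph index>` id scheme are hypothetical and may not match what post_pipeline.sh expects.

```python
import csv
import io
import json


def kilt_jsonl_to_csv(src, dst):
    """Flatten KILT-style JSON Lines pages into one CSV row per paragraph.

    `src` and `dst` are open text file objects. Returns the number of
    data rows written (the header row is not counted).
    """
    writer = csv.writer(dst)
    # Hypothetical column layout; the real kilt_wikipedia.csv may differ.
    writer.writerow(["id", "title", "text"])
    rows = 0
    for line in src:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        page = json.loads(line)
        for idx, paragraph in enumerate(page["text"]):
            paragraph = paragraph.strip()
            if not paragraph:
                continue  # skip empty paragraphs
            # Hypothetical id scheme: "<page id>_<paragraph index>".
            writer.writerow(
                [f'{page["wikipedia_id"]}_{idx}', page["wikipedia_title"], paragraph]
            )
            rows += 1
    return rows


if __name__ == "__main__":
    # Tiny in-memory demo; a real run would pass files opened with open().
    demo_src = io.StringIO(
        '{"wikipedia_id": "1", "wikipedia_title": "Demo", "text": ["Demo\\n", "Body.\\n"]}\n'
    )
    demo_dst = io.StringIO()
    kilt_jsonl_to_csv(demo_src, demo_dst)
    print(demo_dst.getvalue())
```

Because the function streams the input line by line, it never needs to hold the whole dump in memory, which matters for a corpus of this size. When writing to a real file, open the output with `newline=""` as the `csv` module documentation recommends.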