Retrieval corpus as reference documents

Hannibal046 / xRAG

Source code for xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token


Retrieval corpus as reference documents #7

Closed Bala93 closed 4 months ago

Bala93 commented 4 months ago

Congratulations on the great work.

I have the following question: are the retrieved reference documents (and their representations) drawn only from Wikipedia chunks, or do they also include the reference documents that ship with each dataset? For example, HotpotQA might benefit more from retrieving the relevant documents from its own dataset than from the Wikipedia dump. Am I missing something?

Thanks for your time.

Hannibal046 commented 4 months ago

Hi, thanks for the great question, and for raising the point that retrieving the relevant documents from the corresponding dataset would further improve performance! Currently, for xRAGv1, the retrieved documents come solely from the Wikipedia dump.
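
To make the distinction in this thread concrete, here is a minimal sketch of retrieving from a Wikipedia-only corpus versus a corpus that also contains dataset-provided passages. It is not taken from the xRAG codebase: the bag-of-words scorer stands in for a real dense retriever, and the names `embed`, `cosine`, `retrieve_top1`, `wikipedia_chunks`, and `dataset_passages`, along with the example texts, are all hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a dense retriever)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_top1(query, corpus):
    """Return the single highest-scoring document for the query."""
    q = embed(query)
    return max(corpus, key=lambda doc: cosine(q, embed(doc)))


# Hypothetical corpora, for illustration only.
wikipedia_chunks = [
    "Paris is the capital and largest city of France.",
    "The Eiffel Tower is a wrought-iron lattice tower in Paris.",
]
dataset_passages = [
    "Gustave Eiffel's company designed and built the Eiffel Tower.",
]

query = "Which company built the Eiffel Tower?"

# Setting described in the reply above: Wikipedia dump only.
wiki_only = retrieve_top1(query, wikipedia_chunks)

# Alternative raised in the question: Wikipedia plus dataset documents.
combined = retrieve_top1(query, wikipedia_chunks + dataset_passages)

print("Wikipedia only:", wiki_only)
print("Combined corpus:", combined)
```

In this toy example, the combined corpus surfaces the dataset passage because it overlaps more with the query. Whether the same holds for a real retriever and real datasets would need to be measured.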

Bala93 commented 4 months ago

Thanks for the clarification.