Why is retrieval dependent on only the previous span?

nightlessbaron commented 9 months ago

Hi, why is the input to the retrieval (x + y_{t-1}) and not (x + y_{<t})? Also, a minor correction in the file name: requirementd.txt -> requirements.txt

AkariAsai commented 9 months ago

We use the original input x together with the previous generation y_{t-1} as the retrieval query (the input to the retrieval model). If we used all previously generated sentences y_{<t}, the retrieved results would be biased towards earlier generations, e.g., y_1, which might not be closely related to y_t, especially as the generation gets longer. Figure 6 in In-Context Retrieval-Augmented Language Models also reports that model performance starts deteriorating once the retrieval queries get longer. Note that our decision on whether we should retrieve is based on y_{<t}, as indicated in the Table!
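
To make the difference between the two query choices concrete, here is a minimal sketch. It is not code from the self-rag repository: the function name `build_retrieval_query`, the `use_full_history` flag, and the example strings are all hypothetical, and it assumes queries are formed by plain string concatenation.

```python
from typing import List


def build_retrieval_query(x: str, segments: List[str],
                          use_full_history: bool = False) -> str:
    """Build the text passed to the retriever before generating segment y_t.

    x                -- the original input
    segments         -- previously generated segments [y_1, ..., y_{t-1}]
    use_full_history -- hypothetical switch: False gives x + y_{t-1},
                        True gives x + y_{<t}
    """
    if not segments:
        # Nothing has been generated yet, so the query is just the input.
        return x
    # Keep either only the most recent segment or the whole history.
    context = segments if use_full_history else segments[-1:]
    return " ".join([x, *context])


# Illustrative data only.
x = "Describe the history of the transistor."
segments = [
    "The transistor was invented at Bell Labs in 1947.",
    "It replaced vacuum tubes in most electronics.",
    "Integrated circuits later packed many transistors on one chip.",
]

short_query = build_retrieval_query(x, segments)
long_query = build_retrieval_query(x, segments, use_full_history=True)
print(short_query)
print(long_query)
```

With `use_full_history=True` the query grows with every generated segment and always contains y_1, which is the bias towards earlier generations described above; the default keeps the query length roughly constant.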

nightlessbaron commented 9 months ago

I see, thanks a lot for sharing this :D

AkariAsai / self-rag, issue #7