Closed nightlessbaron closed 9 months ago
We use the original input together with the previous generation as the retrieval query (the input to the retrieval model). If we used all previously generated sentences y_{<t}, the retrieved results would be biased towards earlier generations, e.g., y_1, which might not be closely related to y_t, especially as the generation gets longer. Figure 6 in In-Context Retrieval-Augmented Language Models also reports that model performance starts deteriorating once the retrieval queries get longer.
Note that our decision on whether to retrieve is based on y_{<t}, as indicated in the Table!
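To make the difference between the two query choices concrete, here is a minimal sketch. It is not the repository's actual code: the function name `build_retrieval_query`, its parameters, and the plain space-joined concatenation are all assumptions made for illustration.

```python
from typing import List


def build_retrieval_query(
    x: str,
    generated: List[str],
    use_full_history: bool = False,
) -> str:
    """Build a retrieval query from the input x and earlier segments.

    Hypothetical helper for illustration only.
    - use_full_history=False: query is x + y_{t-1} (the choice described above)
    - use_full_history=True:  query is x + y_{<t} (all earlier segments)
    """
    if not generated:
        # Nothing has been generated yet, so the query is just the input.
        return x
    context = generated if use_full_history else generated[-1:]
    return " ".join([x, *context])


segments = ["y_1 text", "y_2 text", "y_3 text"]
short_query = build_retrieval_query("question", segments)
long_query = build_retrieval_query("question", segments, use_full_history=True)
```

With the default setting the query length stays bounded as generation proceeds, whereas the full-history variant grows with every segment and keeps y_1 in the query for the whole generation.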
I see, thanks a lot for sharing this :D
Hi, why is the input to the retrieval (x + y_{t-1}) and not (x + y_{<t})? Also, a minor correction in the file name:
requirementd.txt -> requirements.txt