matthmeyer opened 9 months ago
In data_preprocess(file), both passages and queries get a new item at the same moment, namely when the input query differs from the query on the previous line, so the two lists should end up with the same length. The role of tmp_psgs is to collect all passages retrieved for the same single query; it should only be appended to passages when the current query changes (q != queries[-1]).
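The intended behavior amounts to grouping consecutive lines by their query. A minimal sketch of that idea, using itertools.groupby (a swapped-in technique, not the CRAG_Inference code), is shown here. It assumes each line of the preprocessed file has already been parsed into a hypothetical (query, passage) tuple; group_by_query is likewise a hypothetical name.

```python
from itertools import groupby
from operator import itemgetter


def group_by_query(rows):
    """Group consecutive (query, passage) rows by query.

    Hypothetical sketch: `rows` stands in for the parsed lines of the
    preprocessed file. Like the q != queries[-1] check, groupby only merges
    *consecutive* rows that share a query.
    """
    queries, passages = [], []
    for q, group in groupby(rows, key=itemgetter(0)):
        queries.append(q)  # one entry per run of identical queries
        passages.append([psg for _, psg in group])  # all passages of that run
    return queries, passages


rows = [("q1", "p1"), ("q1", "p2"), ("q2", "p3")]
queries, passages = group_by_query(rows)
print(queries)   # ['q1', 'q2']
print(passages)  # [['p1', 'p2'], ['p3']]
```

Because each query produces exactly one queries entry and one passages entry, the two lists have the same length by construction.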
The function data_preprocess(file) in CRAG_Inference should produce a passages array of the same length as the queries array. However, while testing with the PopQA dataset, I noticed that the passages array is much longer than the queries array. The cause is incorrect indentation: tmp_psgs is appended to passages after every line of the preprocessed file. Instead, tmp_psgs should only be appended when the query differs from the previous line's query, and once more after the loop over the lines has finished. Changing the indentation accordingly restores the intended behavior.
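To illustrate how the indentation changes the result, here is a hypothetical reconstruction of the loop, not the actual CRAG_Inference source. It assumes the file lines are already parsed into (query, passage) tuples, and the names preprocess_buggy and preprocess_fixed are made up for this sketch. The only difference is where passages.append(tmp_psgs) sits.

```python
def preprocess_buggy(rows):
    """Hypothetical buggy variant: the flush is indented at loop-body level."""
    queries, passages, tmp_psgs = [], [], []
    for q, psg in rows:
        if not queries or q != queries[-1]:
            queries.append(q)
            tmp_psgs = []
        tmp_psgs.append(psg)
        passages.append(tmp_psgs)  # BUG: runs for every line, not once per query
    return queries, passages


def preprocess_fixed(rows):
    """Hypothetical fixed variant: flush only on a query change and at the end."""
    queries, passages, tmp_psgs = [], [], []
    for q, psg in rows:
        if not queries or q != queries[-1]:
            if queries:
                passages.append(tmp_psgs)  # previous query is complete
            queries.append(q)
            tmp_psgs = []
        tmp_psgs.append(psg)
    if queries:
        passages.append(tmp_psgs)  # flush the last query after the loop
    return queries, passages


rows = [("q1", "p1"), ("q1", "p2"), ("q1", "p3"), ("q2", "p4")]
q, p = preprocess_buggy(rows)
print(len(q), len(p))  # 2 4
q, p = preprocess_fixed(rows)
print(len(q), len(p))  # 2 2
```

In the buggy variant, passages grows by one entry per line (and holds repeated references to the same tmp_psgs list), which matches the symptom of a passages array much longer than queries.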