Open salbatarni opened 5 months ago
Hey! Do you mean during training (e.g. you have multiple positives per query and want to use all of them for training)?
If so, RAGatouille follows the most common IR pattern, in which each (query, relevant document) pair is independent. So if you have, say, 5 documents as positives for the same query, you'd create 5 pairs (or triplets), each containing the query and just one of the positives.
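To make that concrete, here's a minimal sketch in plain Python of expanding multiple positives into independent pairs. The `expand_to_pairs` helper and the example data are made up for illustration and are not part of RAGatouille; the resulting list just has the same (query, positive) shape as the pairs described above.

```python
from typing import Dict, List, Tuple


def expand_to_pairs(positives_by_query: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Turn {query: [positive, ...]} into one (query, positive) pair per positive.

    Hypothetical helper for illustration only; not a RAGatouille function.
    """
    pairs = []
    for query, positives in positives_by_query.items():
        for doc in positives:
            # Each positive gets its own independent pair with the same query.
            pairs.append((query, doc))
    return pairs


# Made-up example data: the first query has two positives, the second has one.
positives_by_query = {
    "What is ColBERT?": [
        "ColBERT is a late-interaction retrieval model.",
        "ColBERT scores queries against documents token by token.",
    ],
    "What is BM25?": [
        "BM25 is a bag-of-words ranking function.",
    ],
}

pairs = expand_to_pairs(positives_by_query)
for query, doc in pairs:
    print(query, "->", doc)
```

The same query string simply appears in several pairs, once per positive.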
Okay great! I was wondering how this is handled in prepare_training_data?
So far I am passing the pairs like in the second tutorial.
In the tutorial, the pairs do not contain query IDs, so how is this handled? I am worried that when sampling negative documents, positive documents for the same query will be sampled. Is there anything I am missing?
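For reference, this is roughly the safeguard I have in mind, as a plain-Python sketch. I am not claiming this is what prepare_training_data does internally; `sample_negatives`, the random sampling strategy, and the data are all hypothetical. It groups positives by the query text itself (since there are no query IDs) and excludes every known positive for that query from the negative candidates.

```python
import random
from collections import defaultdict
from typing import List, Tuple


def sample_negatives(
    pairs: List[Tuple[str, str]],
    corpus: List[str],
    num_negatives: int = 2,
    seed: int = 0,
) -> List[Tuple[str, str, str]]:
    """Build (query, positive, negative) triplets without false negatives.

    Hypothetical helper for illustration only; not a RAGatouille function.
    """
    rng = random.Random(seed)

    # Collect ALL positives per query, keyed by the query string.
    positives_by_query = defaultdict(set)
    for query, positive in pairs:
        positives_by_query[query].add(positive)

    triplets = []
    for query, positive in pairs:
        # Exclude every known positive of this query, not just the current one.
        candidates = [d for d in corpus if d not in positives_by_query[query]]
        k = min(num_negatives, len(candidates))
        for negative in rng.sample(candidates, k):
            triplets.append((query, positive, negative))
    return triplets


# Made-up data: one query with two positives, plus unrelated documents.
pairs = [
    ("What is ColBERT?", "ColBERT is a late-interaction retrieval model."),
    ("What is ColBERT?", "ColBERT scores queries against documents token by token."),
]
corpus = [
    "ColBERT is a late-interaction retrieval model.",
    "ColBERT scores queries against documents token by token.",
    "BM25 is a bag-of-words ranking function.",
    "Paris is the capital of France.",
    "The mitochondrion is an organelle.",
]

triplets = sample_negatives(pairs, corpus, num_negatives=2)
```

With exact string matching like this, two pairs only share positives if their query strings are identical, so queries would need to be normalized consistently beforehand.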
@bclavie 👀
Hey, I see RAGatouille handles many different forms of pairs, but I do not see an example for a query with multiple positive documents. Is it like:
(query, [list of relevant documents])
?