facebookresearch / dpr-scale

Scalable training for dense retrieval models.

Some questions about 'generating training queries' #2

Closed (jjonhwa closed this issue 2 years ago)

jjonhwa commented 2 years ago

We want to create training queries so that we can generate our own training data for Lambda. I have a question about the code in 'dpr_scale/utils/prep_wiki_exp.py'.

[image: code screenshot] In the code above, the query is obtained from passage_sents.

[image: code screenshot] However, the source of passage_sents shows that it is taken from the document itself. Is my understanding correct?

[image: code screenshot] In the end, positive_ctxs seems to store the same passage that passage_sents was built from, so the query is a sentence from its own positive context. Is this the intended use of query and context?

If you have time, please reply. Thank you.

ccsasuke commented 2 years ago

Hi @jjonhwa, thanks for your interest in our work.

As stated in the paper, the training data for the Lexical Model is constructed as follows:

Query: a random sentence from Wikipedia.

Positive Context: the passage that the query comes from.

Therefore, both the queries and contexts come from the "documents" in this case. Does that answer your question?
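A minimal sketch of that construction may make it concrete. This is an illustration only, not the code in prep_wiki_exp.py: the regex sentence splitter, the function names, and the "question"/"text" field names are assumptions; only passage_sents and positive_ctxs are names taken from this thread.

```python
import random
import re
from typing import Dict, List


def split_sentences(passage: str) -> List[str]:
    """Naive splitter on ., !, ? followed by whitespace (an assumption;
    the real script may use a proper sentence tokenizer)."""
    parts = re.split(r"(?<=[.!?])\s+", passage.strip())
    return [p for p in parts if p]


def make_training_example(passage: str, rng: random.Random) -> Dict:
    """Build one (query, positive context) pair from a single passage."""
    passage_sents = split_sentences(passage)
    if not passage_sents:
        raise ValueError("passage contains no sentences")
    # Query: a random sentence drawn from the passage.
    query = rng.choice(passage_sents)
    # Positive context: the passage the query was drawn from.
    # Field names here are hypothetical, chosen for illustration.
    return {"question": query, "positive_ctxs": [{"text": passage}]}


passage = (
    "Dense retrieval encodes queries and passages as vectors. "
    "Lexical models match on exact terms. "
    "Both can be trained from Wikipedia text."
)
example = make_training_example(passage, random.Random(0))
print(example["question"])
```

The point of the sketch is that the query is always a substring of its own positive context, which is why both sides originate from the same document.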

jjonhwa commented 2 years ago

I see. I'm sorry that I missed that part. Thanks to your detailed explanation, I now understand exactly.

Thank you for your reply.