facebookresearch / DPR

Dense Passage Retriever - is a set of tools and models for open domain Q&A task.
Other
1.73k stars 304 forks source link

Hard Negative Indices Issue #212

Open khuangaf opened 2 years ago

khuangaf commented 2 years ago

In https://github.com/facebookresearch/DPR/blob/main/dpr/models/biencoder.py#L194-L196, hard_negatives_start_idx is set to 1, which seems to assume that len(neg_ctxs) == 0? In order for hard_neg_ctx_indices to index the hard negative passages in all_ctxs ( https://github.com/facebookresearch/DPR/blob/main/dpr/models/biencoder.py#L207-L215), shouldn't we do

hard_negatives_start_idx = len(neg_ctxs) + 1
hard_negatives_end_idx = len(neg_ctxs)+ len(hard_neg_ctxs) + 1

? Or perhaps I misunderstood the code here?

vladk232 commented 2 years ago

Hi @khuangaf , this codebase is no longer supported, sorry.

zmzhang2000 commented 2 years ago

In https://github.com/facebookresearch/DPR/blob/main/dpr/models/biencoder.py#L194-L196, hard_negatives_start_idx is set to 1, which seems to assume that len(neg_ctxs) == 0? In order for hard_neg_ctx_indices to index the hard negative passages in all_ctxs ( https://github.com/facebookresearch/DPR/blob/main/dpr/models/biencoder.py#L207-L215), shouldn't we do

hard_negatives_start_idx = len(neg_ctxs) + 1
hard_negatives_end_idx = len(neg_ctxs)+ len(hard_neg_ctxs) + 1

? Or perhaps I misunderstood the code here?

Hi, I'm just puzzled by the same problem and I think you are right.

See #181, #176 and #92 for reference.

vladk232 commented 2 years ago

The logic with hard negatives indexes is indeed incorrect in the mentioned lines. BUT, there is no usage of them in any downstream code. It was added for an experiment which was later deleted from the final code while hard_negatives indexes are still there but play no role.