facebookresearch / DPR

Dense Passage Retriever - is a set of tools and models for open domain Q&A task.

A possible bug: Wrong hard_neg_ctx_indices #248

Closed by zhiyuanpeng 1 year ago

zhiyuanpeng commented 1 year ago

Hi,

When I set

batch_size: 4
hard_negatives: 1
other_negatives: 1

the first batch returned by create_biencoder_input contains:

positive_ctx_indices
[0, 3, 6, 9]
hard_neg_ctx_indices
[[1], [4], [7], [10]]

However, given how the contexts are concatenated in the line below, hard_neg_ctx_indices should be [[2], [5], [8], [11]], while [[1], [4], [7], [10]] are the indices of neg_ctxs:

all_ctxs = [positive_ctx] + neg_ctxs + hard_neg_ctxs

The reason is that the range for hard_neg_ctx_indices is offset from the start of each question's contexts without skipping past neg_ctxs:

hard_neg_ctx_indices.append(
    [
        i
        for i in range(
            current_ctxs_len + hard_negatives_start_idx,
            current_ctxs_len + hard_negatives_end_idx,
        )
    ]
)
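
To make the arithmetic concrete, here is a minimal sketch that re-creates only the index bookkeeping for the settings above. It is not the DPR implementation: build_indices and skip_other_negatives are hypothetical names, and the start offset of 1 is an assumption inferred from the output shown earlier.

```python
def build_indices(batch_size, num_other_neg, num_hard_neg, skip_other_negatives):
    """Toy re-creation of the index bookkeeping (hypothetical helper, not DPR code)."""
    positive_ctx_indices = []
    hard_neg_ctx_indices = []
    current_ctxs_len = 0  # contexts already placed in the flattened batch

    for _ in range(batch_size):
        # Per-sample layout: [positive_ctx] + neg_ctxs + hard_neg_ctxs
        # Assumption: the observed behaviour corresponds to a start offset of 1.
        start = 1 + (num_other_neg if skip_other_negatives else 0)
        end = start + num_hard_neg

        positive_ctx_indices.append(current_ctxs_len)
        hard_neg_ctx_indices.append(
            list(range(current_ctxs_len + start, current_ctxs_len + end))
        )
        current_ctxs_len += 1 + num_other_neg + num_hard_neg

    return positive_ctx_indices, hard_neg_ctx_indices


# batch_size: 4, other_negatives: 1, hard_negatives: 1
observed = build_indices(4, 1, 1, skip_other_negatives=False)
expected = build_indices(4, 1, 1, skip_other_negatives=True)
print(observed)  # ([0, 3, 6, 9], [[1], [4], [7], [10]])
print(expected)  # ([0, 3, 6, 9], [[2], [5], [8], [11]])
```

With the offset of 1, the hard-negative indices land on the slots that hold neg_ctxs. Skipping past neg_ctxs gives the indices I expected.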

Is this a bug? Thanks.

zhiyuanpeng commented 1 year ago

I found the answer in other issues.