facebookresearch / DPR

Dense Passage Retriever (DPR) is a set of tools and models for the open-domain Q&A task.

Questions for the score #70

Closed szhang42 closed 3 years ago

szhang42 commented 3 years ago

Hello:

Thanks for this nice work! I have two questions:

  1. When testing the reader, it writes the prediction_results_file as a JSON file, as shown below. Each prediction has a score and a relevance score. As I understand it, the score is the answer span score and the relevance score is the passage score. I assume the relevance score is defined by the code below from reader.py. If so, why is the true switch label set to 0?

    # compute switch loss
    relevance_logits = relevance_logits.view(N, M)
    switch_labels = torch.zeros(N, dtype=torch.long).cuda()
    switch_loss = torch.sum(loss_fct(relevance_logits, switch_labels))

[Image: reader]

  2. In the provided NQ test retrieval results, such as nq-test-dense-results.json shown below, how does DPR define the highlighted score? Thanks very much!

[Image: retriever_new]

vlad-karpukhin commented 3 years ago

Hi,

  1. Yes, your understanding of the output scores is correct. As for why we use 0 for the switch label: look closely at the batch dimensions. We are not setting a "switch label" of 0; we are setting the correct class index to 0 (out of M possible classes), because the positive passage is always placed first for each question.

  2. That depends on which search index you use. With a FlatIP FAISS index, the score is the dot product between the query and passage embeddings. With an HNSW FAISS index, the score is the L2 (Euclidean) distance in the transformed dot product -> L2 space.
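
To make point 1 concrete, here is a minimal sketch of the label convention, written in plain Python rather than PyTorch so it runs without dependencies. It assumes that loss_fct is an unreduced cross-entropy over the M passage logits of each question; the function name switch_loss, the positive_index parameter, and the toy logits are illustrative and not taken from reader.py.

```python
import math


def switch_loss(relevance_logits, positive_index=0):
    """Sum over questions of the cross-entropy over M passage logits.

    relevance_logits: N rows (one per question), each with M logits.
    positive_index: column of the positive passage. It is 0 here because,
    per the reply above, the positive passage is placed first.
    """
    total = 0.0
    for row in relevance_logits:
        # Log-sum-exp with max subtraction for numerical stability.
        row_max = max(row)
        log_z = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        # Cross-entropy for one question: -log softmax(row)[positive_index].
        total += log_z - row[positive_index]
    return total


# N = 2 questions, M = 3 passages each; column 0 holds the positive passage.
logits_good = [[5.0, 0.0, 0.0], [4.0, 1.0, 0.0]]  # positive ranked highest
logits_bad = [[0.0, 5.0, 0.0], [0.0, 1.0, 4.0]]   # a negative ranked highest

# One class index per question, the analogue of torch.zeros(N, dtype=torch.long).
labels = [0] * len(logits_good)

loss_good = switch_loss(logits_good)
loss_bad = switch_loss(logits_bad)
```

The labels list has length N, not N x M: each entry is the index of the correct passage among that question's M candidates, which is why a vector of zeros is a class index and not a "negative" label.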

Hope this clarifies your understanding.
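
For point 2, the sketch below illustrates one common way to reduce maximum inner product search to L2 nearest-neighbor search, which is the kind of "dot product -> L2" transformation the reply refers to. It is an assumption that the HNSW indexer uses an augmentation of exactly this form; the helper names, the toy vectors, and the use of squared distances are illustrative, and no FAISS calls are made.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def augment_passages(passages):
    """Append one extra coordinate so every passage has squared norm phi."""
    phi = max(dot(p, p) for p in passages)  # largest squared norm
    augmented = [p + [math.sqrt(phi - dot(p, p))] for p in passages]
    return augmented, phi


def augment_query(query):
    """Queries get a zero in the extra coordinate."""
    return query + [0.0]


passages = [[1.0, 2.0], [3.0, 0.5], [0.2, 0.1]]
query = [0.5, 1.0]

# Flat inner-product style score: higher is better.
ip_scores = [dot(query, p) for p in passages]

# Transformed-space score: squared L2 distance, lower is better.
aug_passages, phi = augment_passages(passages)
aug_query = augment_query(query)
l2_scores = [squared_l2(aug_query, p) for p in aug_passages]

ip_ranking = sorted(range(len(passages)), key=lambda i: -ip_scores[i])
l2_ranking = sorted(range(len(passages)), key=lambda i: l2_scores[i])
```

Under this augmentation the squared distance equals ||q||^2 + phi - 2 * (q . p), so sorting by ascending distance gives the same order as sorting by descending dot product. It also means scores from the two index types are on different scales and are not directly comparable.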

szhang42 commented 3 years ago

Hello:

Thanks very much! It is very clear now!