Questions for the score

szhang42 commented 3 years ago

Hello:

Thanks for this nice work! I have two questions:

When testing the reader, it writes the prediction_results_file as JSON, as shown below. Each entry has a score and a relevance score. From my understanding, the score is the answer span score and the relevance score is the passage score. I assume the passage score is defined by the code below from reader.py. My question is: why is the true switch label set to 0 here?

    # compute switch loss
    relevance_logits = relevance_logits.view(N, M)
    switch_labels = torch.zeros(N, dtype=torch.long).cuda()
    switch_loss = torch.sum(loss_fct(relevance_logits, switch_labels))

[screenshot: reader]

In the provided NQ test retrieval results, such as nq-test-dense-results.json shown below, how does DPR define the highlighted score? Thanks very much!

[screenshot: retriever_new]

vlad-karpukhin commented 3 years ago

Hi,

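Yes, your understanding of the output scores is correct. As for why the switch label is 0: look closely at the batch dimensions. relevance_logits is reshaped to (N, M), so each row holds the M candidate passages for one question. switch_labels does not mean "the switch label is 0". It holds the index of the correct class (out of M possible classes) for each question, and that index is 0 because our positive passage is always placed first for each question.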
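
To make the batch-dimension point concrete, here is a minimal sketch in plain Python rather than PyTorch. It assumes loss_fct is a per-question cross-entropy over the M passage logits; the helper cross_entropy and the example logits are hypothetical and are not taken from reader.py.

```python
import math

def cross_entropy(logits, target):
    """Per-example cross-entropy: -log softmax(logits)[target]."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]

# Hypothetical relevance logits: N = 2 questions, M = 3 passages each.
# Column 0 holds the positive passage for every question.
relevance_logits = [
    [4.0, 0.5, -1.0],  # positive passage scored highest: small loss
    [0.2, 3.0, 0.1],   # a negative passage scored highest: large loss
]
N = len(relevance_logits)

# One class index per question. All zeros means "class 0 is correct",
# i.e. the passage in the first position, not "the label value is 0".
switch_labels = [0] * N

losses = [cross_entropy(row, label)
          for row, label in zip(relevance_logits, switch_labels)]
switch_loss = sum(losses)
```

The loss is small when the first passage gets the largest logit and large otherwise, which is what pushes the reader to rank the positive passage on top.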
The retriever score depends on which search index you use. With a FlatIP FAISS index, the score is the dot product between the query and passage embeddings. With an HNSW FAISS index, the score is an L2 (Euclidean) distance in the space obtained by transforming the dot-product search into an L2 search.
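
As a rough illustration of the dot product to L2 transformation, the sketch below uses the standard reduction: append one extra coordinate to every passage vector so that all passages have the same norm, and append 0 to the query. It is plain Python with made-up vectors; the names phi, aug_passages and aug_query are hypothetical, and the use of squared L2 distance is an assumption of this sketch, not a statement about what the index reports.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sq_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Made-up embeddings for illustration only.
passages = [[1.0, 2.0], [3.0, 0.5], [0.5, 0.5]]
query = [1.0, 1.0]

# phi is the largest squared norm among the passage vectors.
phi = max(dot(p, p) for p in passages)

# Extra coordinate sqrt(phi - ||p||^2) gives every passage squared norm phi.
aug_passages = [p + [math.sqrt(phi - dot(p, p))] for p in passages]
aug_query = query + [0.0]

ip_scores = [dot(query, p) for p in passages]           # higher is better
l2_scores = [sq_l2(aug_query, p) for p in aug_passages]  # lower is better

# ||q' - p'||^2 = ||q||^2 + phi - 2 * (q . p), so both orderings agree.
ip_ranking = sorted(range(len(passages)), key=lambda i: -ip_scores[i])
l2_ranking = sorted(range(len(passages)), key=lambda i: l2_scores[i])
```

Because the transformed distance differs from the dot product only by a per-query constant and a factor of -2, the two index types return the same ranking, but the raw score values are not comparable across them.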

Hope this clarifies your understanding.

szhang42 commented 3 years ago

Hello:

Thanks very much! It is very clear now!

facebookresearch / DPR, issue #70