recall/precision computation

Implementation of evidence-set-aware recall/precision. A model should produce a ranked list of page ids. The core idea is to group page ids that belong to the same evidence set and count the group as a single position in the rank. Pages that are not evidence also count as one position each.
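
The grouping idea can be illustrated with a minimal sketch. This is not the code in kilt/eval_retrieval.py; the names collapse_rank and precision_recall_at_k are hypothetical, and the sketch makes several simplifying assumptions: evidence sets are disjoint, an evidence set takes its rank position at the point where its last missing page is retrieved, partially retrieved sets take no position, and precision is divided by k.

```python
from typing import List, Sequence, Set, Tuple


def collapse_rank(predicted: Sequence[str], evidence_sets: List[Set[str]]) -> List[bool]:
    """Collapse a ranked list of page ids into rank positions.

    Each position is True for a fully retrieved evidence set and
    False for a non-evidence page. (Hypothetical helper, see assumptions above.)
    """
    # Assumption: evidence sets are disjoint, so each page maps to one set.
    page_to_set = {page: i for i, ev in enumerate(evidence_sets) for page in ev}
    seen: List[Set[str]] = [set() for _ in evidence_sets]
    rank: List[bool] = []
    for page in predicted:
        set_idx = page_to_set.get(page)
        if set_idx is None:
            rank.append(False)  # non-evidence page: counts as one position
            continue
        if page in seen[set_idx]:
            continue  # repeated evidence page: ignored
        seen[set_idx].add(page)
        if seen[set_idx] == evidence_sets[set_idx]:
            rank.append(True)  # whole evidence set retrieved: one position
    return rank


def precision_recall_at_k(
    predicted: Sequence[str], evidence_sets: List[Set[str]], k: int
) -> Tuple[float, float]:
    """Precision and recall over the first k collapsed rank positions."""
    hits = sum(collapse_rank(predicted, evidence_sets)[:k])
    precision = hits / k if k > 0 else 0.0
    recall = hits / len(evidence_sets) if evidence_sets else 0.0
    return precision, recall


if __name__ == "__main__":
    predicted = ["A", "X", "B", "C"]
    evidence_sets = [{"A", "B"}, {"C"}]
    print(collapse_rank(predicted, evidence_sets))
    print(precision_recall_at_k(predicted, evidence_sets, k=2))
```

In this toy input, pages A and B form one evidence set, so together they occupy a single rank position, while the non-evidence page X occupies a position of its own.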

test:

python kilt/eval_retrieval.py /checkpoint/fabiopetroni/KILT/predictions/DPR_nicola/nq-dev-kilt.jsonl /checkpoint/fabiopetroni/KILT/datasets/nq-dev-kilt.jsonl