allenai / sherlock

Code, data, models for the Sherlock corpus
Apache License 2.0

How to perform the retrieval and comparison tasks on the validation set #2

zfchenUnique closed this issue 2 years ago

zfchenUnique commented 2 years ago

Dear authors,

Thanks for providing this amazing dataset. I would like to test the performance of my model on your reasoning benchmark. Could you please provide more details on how to evaluate models on it? In particular, how do we extract the distractors for the retrieval and comparison tasks from https://storage.googleapis.com/ai2-mosaic-public/projects/sherlock/data/sherlock_val_with_split_idxs_v1_1.json.zip? I would like to know how to get the split index for the retrieval task and the distractor inferences for the comparison task. Thanks in advance.

Regards, Zhenfang

jmhessel commented 2 years ago

Hi Zhenfang,

Thanks for your interest in our work! There are details, examples, and scripts for computing the official leaderboard metrics on the validation set here:

https://github.com/allenai/sherlock/tree/main/leaderboard_eval

The leaderboard-formatted data is split on that particular index: instances that share a split index form the candidate set for one retrieval problem, so grouping the validation file by that index recovers the retrieval setup. A sketch of that grouping is below.
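As a minimal sketch, assuming the unzipped validation file is a JSON list of per-instance dicts and that the split index lives under a key like `"split_idx"` (a hypothetical name guessed from the filename, not a confirmed schema):

```python
import json
from collections import defaultdict

# Load the validation annotations (download and unzip the file linked
# in the question first).
with open("sherlock_val_with_split_idxs_v1_1.json") as f:
    val_instances = json.load(f)

# Group instances by their split index; each group would be one retrieval
# candidate set. NOTE: "split_idx" is a guessed key name -- inspect one
# instance to find the real field before relying on this.
groups = defaultdict(list)
for inst in val_instances:
    groups[inst["split_idx"]].append(inst)

print(f"{len(groups)} candidate retrieval sets")
```

For something a bit more lightweight, here's an example of how to compute the evaluation metrics, e.g., mean rank, during training: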

https://github.com/allenai/sherlock/blob/main/training_code/train_retrieval_clip.py#L526-L536
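For reference, here is a minimal sketch of that kind of mean-rank computation, assuming L2-normalized, CLIP-style embeddings where row i of the image matrix pairs with row i of the text matrix. This is a generic paraphrase of the metric, not the repo's exact code; the link above is authoritative.

```python
import torch
import torch.nn.functional as F

def mean_rank(image_feats: torch.Tensor, text_feats: torch.Tensor) -> float:
    """Mean rank (1 = best) of each image's true text among all candidates.

    Assumes both inputs are L2-normalized and that row i of image_feats
    pairs with row i of text_feats. A generic sketch, not the repo's code.
    """
    sims = image_feats @ text_feats.T        # (N, N) cosine similarities
    true_scores = sims.diag().unsqueeze(1)   # (N, 1) score of each true pair
    # Rank = number of candidates scoring at least as high as the true pair
    # (the true pair counts itself, so the best possible rank is 1).
    ranks = (sims >= true_scores).sum(dim=1)
    return ranks.float().mean().item()

# Toy usage with random unit vectors (expect a mean rank near N/2):
torch.manual_seed(0)
im = F.normalize(torch.randn(8, 512), dim=-1)
tx = F.normalize(torch.randn(8, 512), dim=-1)
print(mean_rank(im, tx))
```

Note that ties are counted against the query here; other conventions (e.g., averaging tied ranks) give slightly different numbers.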

Jack

zfchenUnique commented 2 years ago

Solved.