allenai / sherlock


Sherlock

This repo contains code, data, and models for the Sherlock corpus. If you find the paper, corpus, or models interesting or helpful for your own work, please consider citing:

@inproceedings{hesselhwang2022abduction,
  title={{The Abduction of Sherlock Holmes: A Dataset for Visual Abductive Reasoning}},
  author={*Hessel, Jack and *Hwang, Jena D and Park, Jae Sung and Zellers, Rowan and Bhagavatula, Chandra and Rohrbach, Anna and Saenko, Kate and Choi, Yejin},
  booktitle={ECCV},
  year={2022}
}

Dataset Download

We do not publicly release the test set labels, but do have a leaderboard. See the leaderboard section for more detail. In our experience, results on the validation/test sets are quite similar.

What is Sherlock?

We collected a large corpus of abductive inferences over images. Abductive reasoning is the act of reasoning about plausible inferences in the case of uncertainty. Our corpus consists of 363K inferences across 103K images. Each inference is grounded in its image via a bounding box. Our model predicts an abductive inference given an image and a bounding box. Example predictions from one of our best-performing models, alongside the human annotations, are given here:

Images Download

The images for Sherlock are sourced from VisualGenome and VCR; if you find the Sherlock corpus useful, please cite those works as well! To train a new model or get predictions on the validation/test sets, you will have to download these images locally. Please do not download the images from the URLs contained in the data we release; instead, use:

In addition, we release:

Code

We release several pieces of code:

Pretrained models

We release four pretrained versions of CLIP, fit to the Sherlock corpus. As detailed in the paper, the models are trained using InfoNCE and take bounding boxes as input by drawing each box directly on the image in pixel space. The most performant model is RN50x64-multitask; the fastest model is ViT-B/16.
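To make the two ideas above concrete, here is a minimal, dependency-free sketch. It is not the training code from this repo. `draw_box` outlines a region on a toy nested-list "image" (the outline color and 1-pixel thickness are assumptions), and `info_nce` computes a symmetric, CLIP-style InfoNCE loss over matched image/text embeddings (the symmetric form and the `temperature=0.07` default are also assumptions). All function names are hypothetical.

```python
import math


def draw_box(pixels, left, top, right, bottom, color=(255, 0, 0)):
    """Return a copy of `pixels` (rows of RGB tuples) with a 1-pixel box outline.

    Coordinates are inclusive pixel indices; the color is an arbitrary choice.
    """
    out = [list(row) for row in pixels]
    for x in range(left, right + 1):
        out[top][x] = color
        out[bottom][x] = color
    for y in range(top, bottom + 1):
        out[y][left] = color
        out[y][right] = color
    return out


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _normalize(v):
    norm = math.sqrt(_dot(v, v))
    return [a / norm for a in v]


def _cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct column for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def info_nce(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE over a batch of matched (image, text) embedding pairs."""
    images = [_normalize(v) for v in image_embs]
    texts = [_normalize(v) for v in text_embs]
    # Cosine similarities scaled by temperature; entry [i][j] pairs image i with text j.
    logits = [[_dot(i, t) / temperature for t in texts] for i in images]
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (_cross_entropy_diagonal(logits) + _cross_entropy_diagonal(transposed))


# Toy usage: a black 8x6 image with a red box, and two tiny embedding batches.
blank = [[(0, 0, 0)] * 8 for _ in range(6)]
boxed = draw_box(blank, left=2, top=1, right=5, bottom=4)
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In this toy batch the loss is lower when each image embedding lines up with its own text (`aligned`) than when the pairs are swapped (`shuffled`), which is the signal InfoNCE training pushes on. Because the box is written into the pixels, the image encoder itself needs no extra input channel for the region.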

The checkpoints we release are:

See the demo Jupyter notebook for usage, and the leaderboard evaluation code for the official evaluation.

Older version of the dataset

Currently, the Sherlock corpus is at version 1.1. Version 1.0 of the train/validation sets can be downloaded here. The models in the paper were trained mostly on the v1.0 corpora, but we observe very little difference in practice. We recommend using version 1.1 in all cases, unless you specifically want to replicate the exact corpora the model checkpoints were trained on.

License

Sherlock (codebase) is licensed under the Apache License 2.0 (see CODE_LICENSE). Sherlock (dataset) is licensed under CC-BY (see DATASET_LICENSE).