Eye Tracking for Everyone
http://gazecapture.csail.mit.edu

Eye Tracking for Everyone Code, Dataset and Models

Introduction

This is the README file for the official code, dataset and model release associated with the 2016 CVPR paper, "Eye Tracking for Everyone".

The dataset release is broken up into three parts:

- Data: the recorded frames and associated per-session metadata
- Models: our trained iTracker models
- Code: sample code for working with the dataset

Continue reading for more information on each part.

History

Any necessary changes to the dataset will be documented here.

Usage

Usage of this dataset (including all data, models, and code) is subject to the associated license, found in LICENSE.md. The license permits the use of released code, dataset and models for research purposes only.

We also ask that you cite the associated paper if you make use of this dataset; following is the BibTeX entry:

@inproceedings{cvpr2016_gazecapture,
Author = {Kyle Krafka and Aditya Khosla and Petr Kellnhofer and Harini Kannan and Suchendra Bhandarkar and Wojciech Matusik and Antonio Torralba},
Title = {Eye Tracking for Everyone},
Year = {2016},
Booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)}
}

Data

The dataset can be downloaded at the project website. In the dataset, we include data for 1474 unique subjects. Each numbered directory represents a recording session from one of those subjects. Numbers were assigned sequentially, although some numbers are missing for various reasons (e.g., test recordings, duplicate subjects, or incomplete uploads).

Inside each directory is a collection of sequentially-numbered images (in the frames subdirectory) and JSON files for different pieces of metadata, described below. Many of the variables in the JSON files are arrays, where each element is associated with the frame numbered the same as the index.
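As a minimal sketch of the per-session layout described above (in Python, even though the official sample code below is MATLAB), a session's metadata files can be loaded into one dictionary; the set of JSON base names matches the files documented in the sections that follow:

```python
import json
import os

def load_session(session_dir):
    """Load all per-session JSON metadata files into one dict.

    Keys are the JSON base names (e.g. 'frames', 'dotInfo'). Most values
    contain arrays indexed by frame number, as described in the README.
    """
    metadata = {}
    for name in ("frames", "appleFace", "appleLeftEye", "appleRightEye",
                 "dotInfo", "faceGrid", "info", "motion", "screen"):
        with open(os.path.join(session_dir, name + ".json")) as f:
            metadata[name] = json.load(f)
    return metadata
```
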

In training our iTracker model, we only made use of frames where the subject's device was able to detect both the user's face and eyes using Apple's built-in libraries; some subjects had no frames with face and eye detections at all. Of the 2,445,504 total frames, 1,490,959 have complete Apple detections. Because generated data depends on these detections, some frames will be "missing" generated data.

The dataset is split into three pieces, by subject (i.e., recording number): training, validation, and test.

Following is a description of each variable:

appleFace.json, appleLeftEye.json, appleRightEye.json

These files describe bounding boxes around the detected face and eyes, logged at recording time using Apple libraries. "Left eye" refers to the subject's physical left eye, which appears on the right side of the image.
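A hedged sketch of using these bounding boxes to crop a region out of a frame. The field names X, Y, W, H (top-left corner plus size, one array element per frame) are an assumption about the JSON layout, not confirmed by this README; check the files themselves, and note that a box may be invalid for frames without a detection:

```python
def crop_box(image, box, frame_index):
    """Crop one detection box out of a frame.

    `image` is a 2-D array-like (rows of pixels); `box` is a parsed
    appleFace/appleLeftEye/appleRightEye dict. The X/Y/W/H field names
    and their per-frame array layout are assumptions -- verify them
    against the actual JSON before relying on this.
    """
    x = int(box["X"][frame_index])
    y = int(box["Y"][frame_index])
    w = int(box["W"][frame_index])
    h = int(box["H"][frame_index])
    return [row[x:x + w] for row in image[y:y + h]]
```
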

dotInfo.json

Describes the on-screen dot (the gaze target) associated with each frame.

faceGrid.json

These values describe the "face grid" input features, which were generated from the Apple face detections. Within a 25 x 25 grid of 0 values, these parameters describe where to draw in a box of 1 values to represent the position and size of the face within the frame.
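The rasterization step can be sketched as follows; the exact parameter names stored in faceGrid.json are an assumption, but the grid construction itself follows the description above (a 25 x 25 grid of 0s with a box of 1s marking the face):

```python
def make_face_grid(x, y, w, h, grid_size=25):
    """Rasterize a face box into the binary face-grid input feature.

    (x, y) is the top-left cell of the box and (w, h) its size, all in
    grid cells. The clamping guards against boxes that extend past the
    grid edge.
    """
    grid = [[0] * grid_size for _ in range(grid_size)]
    for row in range(max(0, y), min(grid_size, y + h)):
        for col in range(max(0, x), min(grid_size, x + w)):
            grid[row][col] = 1
    return grid
```
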

frames.json

The filenames of the frames in the frames directory. This information may also be generated from a sequence number counting from 0 to TotalFrames - 1 (see info.json).
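Regenerating the listing from a sequence number can be sketched as below; the five-digit zero padding and the .jpg extension are assumptions about the naming scheme, so verify them against frames.json itself:

```python
def frame_filename(i):
    # Five-digit zero padding and the .jpg extension are assumptions
    # about the naming scheme; check frames.json to confirm.
    return f"{i:05d}.jpg"

def frame_filenames(total_frames):
    """Regenerate the frame listing from TotalFrames (see info.json)."""
    return [frame_filename(i) for i in range(total_frames)]
```
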

info.json

Per-session information, including TotalFrames, the total number of frames recorded in the session.

motion.json

A stream of motion data (accelerometer, gyroscope, and magnetometer) recorded at 60 Hz, only while frames were being recorded. See Apple's CMDeviceMotion class for a description of the values. DotNum (counting from 0) and Time (in seconds, from the beginning of that dot's recording) are recorded as well.
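Since the motion stream is sampled at 60 Hz rather than per frame, a common need is to find the motion sample nearest to a given time within one dot's recording. The sketch below assumes motion.json parses to a list of dicts each carrying at least 'DotNum' and 'Time' alongside the CMDeviceMotion values; that record layout is an assumption, not confirmed by this README:

```python
from bisect import bisect_left

def nearest_motion_sample(samples, dot_num, t):
    """Return the motion sample closest in time to t (seconds from the
    start of the given dot's recording), or None if the dot has no samples.

    Assumes each sample is a dict with 'DotNum' and 'Time' keys and that
    samples for a dot appear in increasing time order.
    """
    subset = [s for s in samples if s["DotNum"] == dot_num]
    if not subset:
        return None
    times = [s["Time"] for s in subset]
    i = bisect_left(times, t)
    if i == 0:
        return subset[0]
    if i == len(subset):
        return subset[-1]
    before, after = subset[i - 1], subset[i]
    return before if t - before["Time"] <= after["Time"] - t else after
```
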

screen.json

Models

In the models directory, we provide files compatible with Caffe, the deep learning framework. Following are descriptions of the included files:

Code

We provide some sample code to help you get started using the dataset. Below is a high-level overview, but see individual files for more documentation. Most files are MATLAB scripts/functions.

Please feel free to contact us if you find any issues with these scripts or would like to request any additional code.

Contact

Please email any questions or comments to gazecapture@gmail.com.