This repository hosts the code related to the following papers:
Antonino Furnari and Giovanni Maria Farinella, Rolling-Unrolling LSTMs for Action Anticipation from First-Person Video. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI). 2020.
Antonino Furnari and Giovanni Maria Farinella, What Would You Expect? Anticipating Egocentric Actions with Rolling-Unrolling LSTMs and Modality Attention. International Conference on Computer Vision, 2019.
Please also see the project web page at http://iplab.dmi.unict.it/rulstm.
If you use the code/models hosted in this repository, please cite the following papers:
@article{furnari2020rulstm,
author = {Antonino Furnari and Giovanni Maria Farinella},
journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI)},
title = {Rolling-Unrolling LSTMs for Action Anticipation from First-Person Video},
year = {2020}
}
@inproceedings{furnari2019rulstm,
title = {What Would You Expect? Anticipating Egocentric Actions with Rolling-Unrolling LSTMs and Modality Attention},
author = {Antonino Furnari and Giovanni Maria Farinella},
year = {2019},
booktitle = {International Conference on Computer Vision (ICCV)}
}
download_data_full.sh
script rather than download_data.sh
;FEATEXT
and are documented in this README. This repository provides the following components:
Please refer to the paper for more technical details. The following sections document the released material.
The provided implementation and the training/validation/test program can be found in the RULSTM directory. Before training, it is necessary to retrieve the pre-extracted features from our website. To save space and bandwidth, we provide features extracted only on the subset of frames used for the experiments (frames were sampled at about 4 fps; please see the paper). These features are sufficient to train/validate/test the methods on the whole EPIC-KITCHENS-55 dataset following the settings reported in the paper.
To run the code, you will need a Python3 interpreter and some libraries (including PyTorch).
An Anaconda environment file with a minimal set of requirements is provided in environment.yml. If you are using Anaconda, you can create a suitable environment with:
conda env create -f environment.yml
To activate the environment, type:
conda activate rulstm
If you are not using Anaconda, we provide a list of libraries in requirements.txt. You can install these libraries with:
pip install -r requirements.txt
We provide CSVs for training, validation, and testing on EPIC-KITCHENS-55 in the data/ek55 directory. A brief description of each CSV follows:
actions.csv: maps action ids to (verb, noun) pairs;
EPIC_many_shot_nouns.csv: contains the list of many-shot nouns for class-aware metrics (please refer to the EPIC-KITCHENS-55 paper for more details);
EPIC_many_shot_verbs.csv: similar to the previous one, but related to verbs;
test_seen.csv: contains the timestamps (expressed in number of frames) of the "seen" test set (S1);
test_unseen.csv: contains the timestamps (expressed in number of frames) of the "unseen" test set (S2);
training.csv: contains annotations for the training set in our training/validation split;
validation.csv: contains annotations for the validation set in our training/validation split;
training_videos.csv: contains the list of training videos in our training/validation split;
validation_videos.csv: contains the list of validation videos in our training/validation split.
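As an illustration of how the action mapping might be consumed, the sketch below parses a CSV that maps action ids to (verb, noun) pairs using only the standard library. The column names id, verb and noun, the helper name load_action_map, and the sample rows are assumptions made for this example; check the header of the actual actions.csv before relying on them.

```python
import csv
import io


def load_action_map(handle):
    """Return {action_id: (verb_id, noun_id)} from an open CSV file.

    Assumes a header row with the (hypothetical) columns id, verb, noun.
    """
    reader = csv.DictReader(handle)
    return {int(row["id"]): (int(row["verb"]), int(row["noun"])) for row in reader}


# Made-up rows standing in for the content of data/ek55/actions.csv
sample = io.StringIO("id,verb,noun\n0,2,10\n1,0,3\n")
action_map = load_action_map(sample)
print(action_map[1])
```

With a real file, the same function could be called on open("data/ek55/actions.csv", newline=""), provided the column names match.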
We also provide CSVs for training/validation/testing on EPIC-KITCHENS-100 in the data/ek100 directory. Training and validation CSVs report the following columns:
The test CSVs do not report the last three columns since test annotations are not public. These CSVs are provided to allow producing predictions in JSON format to be submitted to the leaderboard.
Please note that timestamps in the CSVs are reported as frame numbers, assuming a fixed frame rate of 30 fps. Since the original videos were collected at different frame rates, we first converted all videos to 30 fps using ffmpeg.
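The conversion between time and frame numbers can be sketched as follows. This is only an illustration under stated assumptions: the 'HH:MM:SS.ss' input format, rounding down, and the helper name timestamp_to_frame are not taken from the repository, and whether the CSVs count frames from 0 or 1 should be checked against the data.

```python
import math

FPS = 30  # fixed frame rate assumed by the CSVs


def timestamp_to_frame(timestamp, fps=FPS):
    """Convert an 'HH:MM:SS.ss' timestamp to a frame number (rounded down)."""
    hours, minutes, seconds = timestamp.split(":")
    total_seconds = int(hours) * 3600 + int(minutes) * 60 + float(seconds)
    return math.floor(total_seconds * fps)


print(timestamp_to_frame("00:01:02.50"))
```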
We provide pre-extracted features. The features are stored in LMDB datasets. To download them, run the following commands:
./scripts/download_data_ek55.sh
Alternatively, you can download features extracted from each frame by using the scripts:
./scripts/download_data_ek55_full.sh
./scripts/download_data_ek100_full.sh
Please note that this download is significantly heavier and that it is not required to run the training with default parameters on EPIC-KITCHENS-55.
This should populate three directories, data/ek{55|100}/rgb, data/ek{55|100}/flow, and data/ek{55|100}/obj, with the LMDB datasets.
Models can be trained using the main.py program. For instance, to train the RGB branch for the action anticipation task, use the following commands:
mkdir models/
python main.py train data/ek55 models/ek55 --modality rgb --task anticipation --sequence_completion
python main.py train data/ek55 models/ek55 --modality rgb --task anticipation
mkdir models/
python main.py train data/ek100 models/ek100 --modality rgb --task anticipation --sequence_completion --num_class 3806 --mt5r
python main.py train data/ek100 models/ek100 --modality rgb --task anticipation --num_class 3806 --mt5r
This will first pre-train using sequence completion, then fine-tune to the main anticipation task. All models will be stored in the models/ek{55|100} directory.
Optionally, a --visdom flag can be passed to the training program in order to enable logging using visdom. To allow this, it is necessary to install visdom with:
pip install visdom
And run it with:
python -m visdom.server
Similar commands can be used to train all models. The following scripts contain all commands required to train the models for egocentric action anticipation and early action recognition:
scripts/train_anticipation_ek{55|100}.sh
scripts/train_recognition_ek55.sh
The anticipation models can be validated using the following commands:
python main.py validate data/ek55 models/ek55 --modality rgb --task anticipation
python main.py validate data/ek55 models/ek55 --modality flow --task anticipation
python main.py validate data/ek55 models/ek55 --modality obj --task anticipation --feat_in 352
python main.py validate data/ek55 models/ek55 --modality fusion --task anticipation
python main.py validate data/ek100 models/ek100 --modality rgb --task anticipation --num_class 3806 --mt5r --ek100
python main.py validate data/ek100 models/ek100 --modality flow --task anticipation --num_class 3806 --mt5r --ek100
python main.py validate data/ek100 models/ek100 --modality obj --task anticipation --feat_in 352 --num_class 3806 --mt5r --ek100
python main.py validate data/ek100 models/ek100 --modality fusion --task anticipation --num_class 3806 --mt5r --ek100
These instructions will evaluate the models using the official measures of the EPIC-KITCHENS-100 dataset for the action anticipation challenge.
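Since the validation calls differ only in a few flags, they can be generated programmatically. The sketch below is a hypothetical helper (build_validate_command is not part of the repository) that assembles the argument list from the flags used in this README; the --ek100 spelling follows the validate_json and test commands. It only builds the command and does not run main.py.

```python
def build_validate_command(dataset, modality, task="anticipation"):
    """Assemble a `main.py validate` call from the flags used in this README."""
    args = [
        "python", "main.py", "validate",
        f"data/{dataset}", f"models/{dataset}",
        "--modality", modality, "--task", task,
    ]
    if modality == "obj":
        # The obj commands in this README pass --feat_in 352
        args += ["--feat_in", "352"]
    if dataset == "ek100":
        # Extra flags used for EPIC-KITCHENS-100 in this README
        args += ["--num_class", "3806", "--mt5r", "--ek100"]
    return args


for modality in ("rgb", "flow", "obj", "fusion"):
    print(" ".join(build_validate_command("ek100", modality)))
```

Returning a list rather than a string makes it straightforward to pass the result to subprocess.run if the commands should be executed from Python.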
You can produce validation jsons as follows:
mkdir -p jsons/ek100
python main.py validate_json data/ek100 models/ek100 --modality fusion --task anticipation --json_directory jsons/ek100 --ek100 --num_class 3806 --mt5r
python main.py validate_json data/ek100 models/ek100 --modality fusion --task early_recognition --json_directory jsons/ek100 --ek100 --num_class 3806 --mt5r
Similarly, for early action recognition:
python main.py validate data models --modality rgb --task early_recognition
python main.py validate data models --modality flow --task early_recognition
python main.py validate data models --modality obj --task early_recognition --feat_in 352
python main.py validate data models --modality fusion --task early_recognition
The main.py program can also run the models on the EPIC-KITCHENS-55 and EPIC-KITCHENS-100 test sets and produce the JSON files to be sent to the leaderboard (see http://epic-kitchens.github.io/). To test models, you can use the following commands:
mkdir -p jsons/ek55
python main.py test data/ek55 models/ek55 --modality fusion --task anticipation --json_directory jsons/ek55
python main.py test data/ek55 models/ek55 --modality fusion --task early_recognition --json_directory jsons/ek55
mkdir -p jsons/ek100
python main.py test data/ek100 models/ek100 --modality fusion --task anticipation --json_directory jsons/ek100 --ek100 --num_class 3806 --mt5r
python main.py test data/ek100 models/ek100 --modality fusion --task early_recognition --json_directory jsons/ek100 --ek100 --num_class 3806 --mt5r
We provide the official checkpoints used to report the results on EPIC-KITCHENS-55 in our ICCV paper. These can be downloaded using the script:
./scripts/download_models_ek55.sh
The models will be downloaded in models/ek55. You can test the model and obtain the results reported in the paper using the same main.py program. For instance:
python main.py test data/ek55 models/ek55 --modality fusion --task anticipation --json_directory jsons
We provide the checkpoints used to report the results in the EPIC-KITCHENS-100 paper (https://arxiv.org/abs/2006.13256). These can be downloaded using the script:
./scripts/download_models_ek100.sh
The models will be downloaded in models/ek100. You can produce the validation and test jsons replicating the results of the paper as follows:
python main.py test data/ek100 models/ek100 --modality fusion --task anticipation --json_directory jsons --ek100 --mt5r
python main.py validate_json data/ek100 models/ek100 --modality fusion --task anticipation --json_directory jsons --ek100 --mt5r
The TSN models can be downloaded from the following URLs:
http://iplab.dmi.unict.it/sharing/rulstm/TSN-rgb.pth.tar
http://iplab.dmi.unict.it/sharing/rulstm/TSN-flow.pth.tar
http://iplab.dmi.unict.it/sharing/rulstm/TSN-rgb-ek100.pth.tar
http://iplab.dmi.unict.it/sharing/rulstm/TSN-flow-ek100.pth.tar
We release the Faster-RCNN object detector trained on EPIC-KITCHENS-55 that we used for our experiments. The detector has been trained using the detectron library. The yaml configuration file used to train the model is available in the FasterRCNN directory of this repository. The weights can be downloaded from http://iplab.dmi.unict.it/rulstm/downloads/ek18-2gpu-e2e-faster-rcnn-R-101-FPN_1x.pkl.
Make sure the detectron library is installed and available in the system path. A good idea might be to use a docker container. Please refer to https://github.com/facebookresearch/Detectron/blob/master/INSTALL.md for more details.
Sample usage:
git clone https://github.com/antoninofurnari/rulstm.git
cd rulstm/FasterRCNN/
curl -o weights/ek18-2gpu-e2e-faster-rcnn-R-101-FPN_1x.pkl http://iplab.dmi.unict.it/rulstm/downloads/ek18-2gpu-e2e-faster-rcnn-R-101-FPN_1x.pkl
python tools/detect_video.py --cfg config/ek18-2gpu-e2e-faster-rcnn-R-101-FPN_1x.yaml --wts weights/ek18-2gpu-e2e-faster-rcnn-R-101-FPN_1x.pkl path/to/video.mp4
A new file path/to/video.mp4_detections.npy will be created. The file will contain a list of arrays reporting the coordinates of the objects detected in each frame of the video. Specifically, the detections of a given frame will be contained in a tensor of shape N x 6, where:
N is the number of objects detected in the frame;
box coordinates are expressed as [xmin, ymin, xmax, ymax].
Please refer to https://github.com/epic-kitchens/annotations/blob/master/EPIC_noun_classes.csv for the list of object ids.
A few example scripts showing how we performed feature extraction from video can be found in the FEATEXT directory.
To extract features using the TSN models, it is necessary to install the pretrainedmodels package through pip install pretrainedmodels.
To run the examples, follow these steps:
cd FEATEXT
./scripts/download_models.sh
tar xvf data.tar
This will extract a few files in the data folder, including data related to ffmpeg and to the detect_video.py script in FasterRCNN/tools.
mkdir features
python extract_sample_rgb.py
python extract_sample_flow.py
python extract_sample_obj.py
The sample features are written to the features directory.
We provide the EGTEA Gaze+ features used for the experiments (see paper for the details) at https://iplab.dmi.unict.it/sharing/rulstm/features/egtea.zip. The features have been extracted using three different TSN models trained following the official splits proposed by the authors of EGTEA Gaze+ (see http://cbs.ic.gatech.edu/fpv/). The annotations, formatted to be directly usable with this repository, can be found in RULSTM/data/egtea.
Note: a previous version of the zip file contained the following LMDB databases:
TSN-C_3_egtea_action_CE_flow_model_best_fcfull_hd
TSN-C_3_egtea_action_CE_rgb_model_best_fcfull_hd
TSN-C_3_egtea_action_CE_s1_rgb_model_best_fcfull_hd
TSN-C_3_egtea_action_CE_s1_flow_model_best_fcfull_hd
TSN-C_3_egtea_action_CE_s2_rgb_model_best_fcfull_hd
TSN-C_3_egtea_action_CE_s2_flow_model_best_fcfull_hd
TSN-C_3_egtea_action_CE_s3_rgb_model_best_fcfull_hd
TSN-C_3_egtea_action_CE_s3_flow_model_best_fcfull_hd
The first two databases had been included by mistake and should be ignored. The remaining six databases should be used for the experiments when the standard evaluation protocol based on three splits is adopted. The following list explains in detail how they have been created:
TSN-C_3_egtea_action_CE_s1_rgb_model_best_fcfull_hd: features extracted using an RGB TSN model trained using s2 and s3 as training set;
TSN-C_3_egtea_action_CE_s1_flow_model_best_fcfull_hd: features extracted using a Flow TSN model trained using s2 and s3 as training set;
TSN-C_3_egtea_action_CE_s2_rgb_model_best_fcfull_hd: features extracted using an RGB TSN model trained using s1 and s3 as training set;
TSN-C_3_egtea_action_CE_s2_flow_model_best_fcfull_hd: features extracted using a Flow TSN model trained using s1 and s3 as training set;
TSN-C_3_egtea_action_CE_s3_rgb_model_best_fcfull_hd: features extracted using an RGB TSN model trained using s1 and s2 as training set;
TSN-C_3_egtea_action_CE_s3_flow_model_best_fcfull_hd: features extracted using a Flow TSN model trained using s1 and s2 as training set.
An updated version of the zip file including only the correct databases is available at https://iplab.dmi.unict.it/sharing/rulstm/features/egtea.zip.
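The naming scheme of the six databases follows a regular pattern, which can be sketched as below. The helper names database_name and training_splits are hypothetical and introduced only for this illustration; the database names and the split assignments are the ones listed above.

```python
SPLITS = ("s1", "s2", "s3")
MODALITIES = ("rgb", "flow")


def database_name(held_out, modality):
    """Name of the LMDB database associated with a given split and modality."""
    return f"TSN-C_3_egtea_action_CE_{held_out}_{modality}_model_best_fcfull_hd"


def training_splits(held_out):
    """Splits used to train the TSN model behind the database of `held_out`."""
    return tuple(split for split in SPLITS if split != held_out)


for held_out in SPLITS:
    for modality in MODALITIES:
        print(database_name(held_out, modality), "trained on", training_splits(held_out))
```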
We provide object detections obtained on each frame of EPIC-KITCHENS-100. The detections have been obtained by running the Faster RCNN model trained on EPIC-KITCHENS-55 described above and included in this repository. You can download a zip file containing all detections through this link: https://iplab.dmi.unict.it/sharing/rulstm/detected_objects.zip.
Note: these detections are a superset of the ones used for the original experiments on EPIC-KITCHENS-55. If you are experimenting with EK-55, you can just discard the extra videos not belonging to EK-55.
The zip file contains a npy file for each video in EPIC-KITCHENS-100. For example:
P01_01.MP4_detections.npy
P01_02.MP4_detections.npy
P01_03.MP4_detections.npy
P01_04.MP4_detections.npy
P01_05.MP4_detections.npy
P01_06.MP4_detections.npy
...
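Given this naming convention, the video identifier can be recovered from a detection file name. The following is a minimal sketch; the helper name video_id_from_detection_file is an assumption made for the example, and the suffix is taken from the file names listed above.

```python
from pathlib import Path

SUFFIX = ".MP4_detections.npy"


def video_id_from_detection_file(path):
    """Return the video id (e.g. 'P01_01') encoded in a detection file name."""
    name = Path(path).name
    if not name.endswith(SUFFIX):
        raise ValueError(f"unexpected detection file name: {name}")
    return name[: -len(SUFFIX)]


print(video_id_from_detection_file("detected_objects/P01_01.MP4_detections.npy"))
```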
Each file contains all object detections obtained in the video referenced in the filename. You can load these npy files as in this example code:
import numpy as np
data=np.load('P04_101.MP4_detections.npy', allow_pickle=True, encoding='latin1')
data will be a 1-dimensional numpy ndarray containing n entries, where n is the number of frames in the video. The i-th entry of the array will be an array of shape m x 6, where m is the number of objects detected in the corresponding frame. The six columns contain, respectively:
the class ID of the detected object. The stored IDs are shifted by one with respect to the noun class IDs, which start from 0, so it is necessary to subtract 1 in order to match the noun class IDs reported in https://github.com/epic-kitchens/epic-kitchens-55-annotations/blob/master/EPIC_noun_classes.csv;
the x1, y1, x2, y2 bounding box coordinates;
the detection confidence score.
The following example code separates class ids, box coordinates and confidence scores:
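detections = data[0]  # detections of the first frame, shape (m, 6)
object_classes = detections[:, 0] - 1  # shift to match the noun class IDs
object_boxes = detections[:, 1:5]  # x1, y1, x2, y2
detection_scores = detections[:, -1]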
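To process a whole video, the same column split can be applied frame by frame. The sketch below is an illustration under stated assumptions: confident_detections is a hypothetical helper, the 0.5 score threshold is arbitrary, and the data array is synthetic and merely mimics the layout described above instead of being loaded from a real file.

```python
import numpy as np


def confident_detections(frame_detections, min_score=0.5):
    """Split one frame's (m, 6) detections, keeping those with score >= min_score."""
    frame_detections = np.asarray(frame_detections, dtype=float).reshape(-1, 6)
    kept = frame_detections[frame_detections[:, -1] >= min_score]
    classes = kept[:, 0].astype(int) - 1  # shift to match the noun class IDs
    boxes = kept[:, 1:5]                  # x1, y1, x2, y2
    scores = kept[:, -1]
    return classes, boxes, scores


# Synthetic stand-in for the array returned by np.load: one entry per frame
data = np.empty(2, dtype=object)
data[0] = np.array([[5, 10, 20, 110, 220, 0.9], [12, 0, 0, 50, 50, 0.2]])
data[1] = np.zeros((0, 6))  # a frame without detections

for index, frame in enumerate(data):
    classes, boxes, scores = confident_detections(frame)
    print(index, classes.tolist(), scores.tolist())
```

The reshape to (-1, 6) is a defensive choice so that frames without detections yield empty outputs instead of raising an indexing error.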