Ego4D Goal-Step: Toward Hierarchical Understanding of Procedural Activities (NeurIPS 2023)
Yale Song, Eugene Byrne, Tushar Nagarajan, Huiyu Wang, Miguel Martin, Lorenzo Torresani
[OpenReview] [Data visualization] [EvalAI test server for step grounding]
You can get the data directly from this repository or download it with the Ego4D CLI:
# download goalstep videos & annotations
ego4d --datasets full_scale annotations --benchmark goalstep -o <out-dir>
# download goalstep annotations
ego4d --datasets annotations --benchmark goalstep -o <out-dir>
# download goalstep videos
ego4d --datasets full_scale --benchmark goalstep -o <out-dir>
We provide a visualization of the goal-step annotations via the Ego4D Visualizer. Select goalstep under Annotations to see a timeline view of goals, steps, and substeps, with a time marker synchronized with the video player.
Goal-Step provides hierarchical annotations of procedural human activities at three distinct levels: goals, steps, and substeps. The annotations are nested, as shown below:
{
    "video_uid": "9b58e3ab-7b6d-4e79-9eea-c21420b0eedc",
    "start_time": 0.0210286458333333,
    "end_time": 510.1876953125,
    "goal_category": "COOKING:MAKE_OMELET",
    "goal_description": "Make omelette",
    "goal_wikihow_url": "https://www.wikihow.com/Cook-a-Basic-Omelette",
    "summary": [
        "Toasting bread on a pan",
        "Making omelet",
        "Serving omelet with ketchup"
    ],
    "is_procedural": true,
    "segments": [
        {
            "start_time": 0,
            "end_time": 56.99209,
            "step_category": "General cooking activity: Toast bread",
            "step_description": "Toast bread",
            "is_continued": false,
            "is_procedural": true,
            "is_relevant": "essential",
            "summary": [
                "heat skillet",
                "toast bread",
                "trash kitchen waste"
            ],
            "segments": [
                {
                    "start_time": 0,
                    "end_time": 13.135,
                    "step_category": "Cook on a stovetop: Turn on the stovetop",
                    "step_description": "preheat the stove-top",
                    "is_continued": false,
                    "is_procedural": true,
                    "is_relevant": "essential",
                    "summary": [
                        "turn on stove",
                        "preheat the stove-top"
                    ]
                },
                ...
            ]
        },
        ...
    ]
}
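As a rough illustration of how the nesting can be consumed, the sketch below walks one goal record and yields every goal, step, and substep with its time span. It is not part of this repository: the helper name `iter_segments` and the trimmed `EXAMPLE` record (the example above with the `...` placeholders dropped) are assumptions made for illustration, and it relies only on the fields shown in the example.

```python
import json
from typing import Any, Dict, Iterator, Tuple

# Trimmed copy of the example record above (illustrative only; the "..."
# placeholders and fields not needed for traversal are omitted).
EXAMPLE: Dict[str, Any] = json.loads("""
{
  "video_uid": "9b58e3ab-7b6d-4e79-9eea-c21420b0eedc",
  "start_time": 0.0210286458333333,
  "end_time": 510.1876953125,
  "goal_description": "Make omelette",
  "segments": [
    {
      "start_time": 0,
      "end_time": 56.99209,
      "step_description": "Toast bread",
      "segments": [
        {
          "start_time": 0,
          "end_time": 13.135,
          "step_description": "preheat the stove-top"
        }
      ]
    }
  ]
}
""")

LEVEL_NAMES = ("goal", "step", "substep")


def iter_segments(
    node: Dict[str, Any], level: int = 0
) -> Iterator[Tuple[int, float, float, str]]:
    """Yield (level, start_time, end_time, description) depth-first.

    Hypothetical helper: level 0 is the goal, 1 a step, 2 a substep.
    """
    # Goals carry "goal_description"; steps and substeps carry "step_description".
    description = node.get("goal_description") or node.get("step_description", "")
    yield level, node["start_time"], node["end_time"], description
    # Substeps in the example have no "segments" key, so default to an empty list.
    for child in node.get("segments", []):
        yield from iter_segments(child, level + 1)


for level, start, end, description in iter_segments(EXAMPLE):
    name = LEVEL_NAMES[min(level, len(LEVEL_NAMES) - 1)]
    print(f"{'  ' * level}{name}: {description} [{start:.2f}s - {end:.2f}s]")
```

The same function should work on a record loaded from a downloaded annotation file, provided it follows the layout shown above.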
We provide instructions for running the baselines from the paper to reproduce the main results in Table 2. Specifically, step_grounding/README.md explains how to set up the VSLNet baseline for the step grounding task using the Narrations-as-Queries (NaQ) codebase.
Ego4D Goal-Step is licensed under the MIT License.