facebookresearch / ego4d-goalstep

Ego4D Goal-Step: Toward Hierarchical Understanding of Procedural Activities (NeurIPS 2023)
MIT License

Ego4D Goal-Step

Yale Song, Eugene Byrne, Tushar Nagarajan, Huiyu Wang, Miguel Martin, Lorenzo Torresani
[OpenReview] [Data visualization] [EvalAI test server for step grounding]

Download

You can get the data directly from this repository, or download it via the Ego4D CLI:

# download goalstep videos & annotations
ego4d --datasets full_scale annotations --benchmark goalstep -o <out-dir>
# download goalstep annotations
ego4d --datasets annotations --benchmark goalstep -o <out-dir>
# download goalstep videos
ego4d --datasets full_scale --benchmark goalstep -o <out-dir>

Visualization

We provide a visualization of goal-step annotations via the Ego4D Visualizer. Select goalstep under Annotations to see a timeline view of goals, steps, and substeps, with a time marker synchronized with the video player.

Annotation Format

Goal-Step provides hierarchical annotations of procedural human activities at three distinct levels: goals, steps, and substeps. The annotations are nested, as shown below:

{
  "video_uid": "9b58e3ab-7b6d-4e79-9eea-c21420b0eedc",
  "start_time": 0.0210286458333333,
  "end_time": 510.1876953125,
  "goal_category": "COOKING:MAKE_OMELET",
  "goal_description": "Make omelette",
  "goal_wikihow_url": "https://www.wikihow.com/Cook-a-Basic-Omelette",
  "summary": [
    "Toasting bread on a pan",
    "Making omelet",
    "Serving omelet with ketchup"
  ],
  "is_procedural": true,
  "segments": [
    {
      "start_time": 0,
      "end_time": 56.99209,
      "step_category": "General cooking activity: Toast bread",
      "step_description": "Toast bread",
      "is_continued": false,
      "is_procedural": true,
      "is_relevant": "essential",
      "summary": [
        "heat skillet",
        "toast bread",
        "trash kitchen waste"
      ],
      "segments": [
        {
          "start_time": 0,
          "end_time": 13.135,
          "step_category": "Cook on a stovetop: Turn on the stovetop",
          "step_description": "preheat the stove-top",
          "is_continued": false,
          "is_procedural": true,
          "is_relevant": "essential",
          "summary": [
            "turn on stove",
            "preheat the stove-top"
          ]
        },
        ...
      ]
    },
    ...
  ]
}
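
The nesting above can be walked recursively. The following is a minimal sketch, not code from this repository: the helper `iter_segments` and the `LEVELS` labels are hypothetical names, and the `annotation` dict is a trimmed copy of the example above with the elided entries left out. It assumes that depth 0 is the goal, depth 1 a step, and depth 2 a substep, and that every node carries `start_time` and `end_time`.

```python
from typing import Any, Dict, Iterator, Tuple

# Assumed mapping from nesting depth to annotation level.
LEVELS = ("goal", "step", "substep")


def iter_segments(
    node: Dict[str, Any], depth: int = 0
) -> Iterator[Tuple[str, float, float, str]]:
    """Yield (level, start_time, end_time, description) depth-first."""
    level = LEVELS[min(depth, len(LEVELS) - 1)]
    # Goals carry "goal_description"; steps and substeps carry "step_description".
    description = node.get("goal_description") or node.get("step_description", "")
    yield level, float(node["start_time"]), float(node["end_time"]), description
    # Leaf nodes have no "segments" key, so the recursion stops there.
    for child in node.get("segments", []):
        yield from iter_segments(child, depth + 1)


# Trimmed copy of the example annotation (elided entries omitted).
annotation = {
    "video_uid": "9b58e3ab-7b6d-4e79-9eea-c21420b0eedc",
    "start_time": 0.0210286458333333,
    "end_time": 510.1876953125,
    "goal_description": "Make omelette",
    "segments": [
        {
            "start_time": 0,
            "end_time": 56.99209,
            "step_description": "Toast bread",
            "segments": [
                {
                    "start_time": 0,
                    "end_time": 13.135,
                    "step_description": "preheat the stove-top",
                },
            ],
        },
    ],
}

rows = list(iter_segments(annotation))
for level, start, end, description in rows:
    print(f"{level:8s} {start:8.2f} {end:8.2f}  {description}")
```

Flattening to rows like this makes it easy to filter by level or to build (description, time window) pairs, at the cost of dropping the parent-child links; keep the nested form if you need them.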

Baseline starter code

We provide instructions for running the baselines from the paper and reproducing the main results in Table 2. Specifically, step_grounding/README.md explains how to set up the VSLNet baseline for the step grounding task using the Narrations-as-Queries (NaQ) codebase.
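
When inspecting grounding predictions by hand, a common way to compare a predicted time window with an annotated one is temporal intersection-over-union (IoU). The sketch below is an illustration only: it is not the evaluation code used by the NaQ codebase or the EvalAI server, `temporal_iou` is a hypothetical helper, and the `prediction` window is made up. The ground-truth window is the "Toast bread" step from the annotation example.

```python
from typing import Tuple

Segment = Tuple[float, float]  # (start_time, end_time) in seconds


def temporal_iou(pred: Segment, gt: Segment) -> float:
    """Intersection-over-union of two time windows; 0.0 if they do not overlap."""
    intersection = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - intersection
    return intersection / union if union > 0 else 0.0


gt_step = (0.0, 56.99209)   # "Toast bread" step from the example annotation
prediction = (10.0, 60.0)   # hypothetical model output, for illustration only
score = temporal_iou(prediction, gt_step)
print(f"temporal IoU: {score:.3f}")
```

A prediction is typically counted as correct when its IoU exceeds a chosen threshold; refer to the paper and step_grounding/README.md for the exact metrics and thresholds behind the reported numbers.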

License

Ego4D Goal-Step is licensed under the MIT License.