Luodian / Otter

🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
https://otter-ntu.github.io/
MIT License
3.55k stars, 242 forks

will MIMIC-IT Eggs provide frames for Ego4d? #227

Closed: Maxlinn closed this 1 year ago

Maxlinn commented 1 year ago

hi to the team, thanks for open-sourcing MIMIC-IT!

when i was inspecting MIMIC-IT Eggs, i noticed there is no Egg (that is, pre-extracted frames) for the Ego4D instructions. Ego4D is a rather huge dataset that may take 7 terabytes of storage and several days to download, so i'm wondering:

  1. will MIMIC-IT Eggs provide an Egg for the Ego4D instructions?
  2. if not, are the frames extracted at one frame per second? (i may manually extract them from a subset of Ego4D)

thanks for your kind help!

Luodian commented 1 year ago

I think we have provided E4D_instructions.json in our released OneDrive folder.

As for E4D.json, it's 300GB+, so we don't have a good way to release it through OneDrive. We may release it on Hugging Face (HF) later.

Luodian commented 1 year ago

are the frames extracted at one frame per second?

Yes, the frames are extracted at 1 FPS from the E4D videos.
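To make the 1 FPS sampling concrete, here is a minimal sketch of how one might pick which native frames to keep when extracting manually from a subset of Ego4D. It is not taken from the Otter code: the function name `one_fps_frame_indices` is hypothetical, and picking the frame nearest to each whole second is an assumption about how "1 FPS" is applied.

```python
from typing import List


def one_fps_frame_indices(total_frames: int, native_fps: float) -> List[int]:
    """Return the native frame indices nearest to each whole second.

    Hypothetical helper: assumes "1 FPS" means one frame per second of
    video, starting at second 0.
    """
    if total_frames <= 0 or native_fps <= 0:
        return []
    duration_seconds = total_frames / native_fps
    indices = []
    second = 0
    while second < duration_seconds:
        # Clamp so rounding never points past the last frame.
        index = min(round(second * native_fps), total_frames - 1)
        indices.append(index)
        second += 1
    return indices
```

For a 3-second clip at 30 FPS (90 frames), this keeps frames 0, 30 and 60. The actual decoding of those frames (for example with ffmpeg or OpenCV) is left out of the sketch.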

Luodian commented 1 year ago

If you downloaded the E4D dataset from the official release, you should see a similar structure:

ego4d_data/v2/full_scale/

Inside the folder there are plenty of videos. We extract frames from them at 1 FPS and store them as key-value pairs in E4D.json. The key is E4D_IMG_{video_name}_{frame_number}, for example E4D_IMG_01111831-9107-43c4-bf0e-6b26e9b32a2b_00000000. The value is the corresponding image, encoded as base64.
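The key-value layout described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the repo: the helper names are hypothetical, the 8-digit zero padding is inferred from the single example key, and treating frame_number as the 0-based index of the sampled frame is a guess. The frames are taken as already-encoded image bytes (for example JPEG), which is also an assumption.

```python
import base64
from typing import Dict, Iterable


def make_frame_key(video_name: str, frame_number: int) -> str:
    # 8-digit zero padding is inferred from the example key (assumption).
    return f"E4D_IMG_{video_name}_{frame_number:08d}"


def encode_frame(image_bytes: bytes) -> str:
    # base64-encode raw image bytes and return them as a JSON-safe string.
    return base64.b64encode(image_bytes).decode("utf-8")


def build_entries(video_name: str, frames: Iterable[bytes]) -> Dict[str, str]:
    # Assumes frame_number counts the sampled (1 FPS) frames from 0.
    return {
        make_frame_key(video_name, i): encode_frame(frame)
        for i, frame in enumerate(frames)
    }
```

The resulting dictionary can be written out with `json.dump`. For a full-size dataset it is probably better to process one video at a time rather than hold everything in memory.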

Luodian commented 1 year ago

@pufanyi Fanyi has made it clear in https://github.com/Luodian/Otter/blob/6109be305f275e3aa3f1098b4078ab0d7e095aac/mimic-it/convert-it/datasets/fpv.py#L12

Please refer to the code to convert your E4D.json. Since the file is huge, local processing may be more convenient.

Maxlinn commented 1 year ago

thanks for the quick response, looking forward to E4D.json if you can release it!