facebookresearch / r3m

Pre-training Reusable Representations for Robotic Manipulation Using Diverse Human Video Data
https://sites.google.com/view/robot-r3m/
MIT License

Questions about clip preprocessing #26

Open BetterZH opened 1 year ago

BetterZH commented 1 year ago

The 'narrations.json' file published by EGO4D does not provide a start time and end time for each clip, only a 'timestamp_frame'.

First, for each narration_pass, I sort all clips by 'timestamp_frame'. Then, for the current clip, its 'timestamp_frame' is the start frame and the 'timestamp_frame' of the next clip is the end frame. However, this produces clips longer than 5 minutes in the preprocessed data, which differs from the explanation in #13 that "each clip was about 200 frames, so about 10 seconds".
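To make the procedure concrete, here is a minimal sketch of the segmentation described above. It is not R3M's actual preprocessing code: the function name `segment_clips` and the 'narration_text' field are assumptions for illustration, and the input is assumed to be the list of narration entries from a single narration_pass.

```python
from typing import Dict, List


def segment_clips(narrations: List[Dict]) -> List[Dict]:
    """Build clips from one narration pass (hypothetical helper).

    Each narration's 'timestamp_frame' is used as the clip start, and the
    'timestamp_frame' of the next narration (after sorting) as the clip end.
    """
    # Sort narrations chronologically by their frame timestamp.
    ordered = sorted(narrations, key=lambda n: n["timestamp_frame"])

    clips = []
    # Pair every narration with its successor; the last narration has no
    # successor, so this sketch simply drops it.
    for current, following in zip(ordered, ordered[1:]):
        start = current["timestamp_frame"]
        end = following["timestamp_frame"]
        clips.append(
            {
                # 'narration_text' is an assumed field name.
                "text": current.get("narration_text", ""),
                "start_frame": start,
                "end_frame": end,
                "num_frames": end - start,
            }
        )
    return clips


# Made-up example: a long gap between two narrations yields a very long clip.
sample = [
    {"timestamp_frame": 200, "narration_text": "second narration"},
    {"timestamp_frame": 0, "narration_text": "first narration"},
    {"timestamp_frame": 9500, "narration_text": "third narration"},
]
for clip in segment_clips(sample):
    print(clip["start_frame"], clip["end_frame"], clip["num_frames"])
```

With this scheme, any long gap between consecutive narrations becomes one long clip, which would explain the multi-minute clips unless some length cap is applied afterwards.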

Is this clip segmentation appropriate? When filtering out clips that are too long or too short, what maximum and minimum numbers of frames did you set?

rbler1234 commented 11 months ago

Have you found an answer? I'm facing the same problem.