Open hubenjm opened 6 months ago
Yes, exactly! You can use the annotation file and ffmpeg to cut each video into smaller clips.
Does VILA randomly sample frames and send them to the ViT?
Or does it use all 631 frames directly for training?
Hi, we uniformly sample 8 frames from each video clip.
@XueFuzhao, in the example above, are the 8 frames sampled evenly from the 631? And how are the multiple images sent into s2-siglip? Thanks for the pointers.
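For illustration only, here is a minimal sketch of what "uniformly sampling 8 frames" from a 631-frame clip could look like. This is not VILA's actual code: the function name `uniform_frame_indices` is hypothetical, and whether the real loader includes both endpoints or samples segment midpoints is an assumption I have not checked.

```python
def uniform_frame_indices(num_frames: int, num_samples: int = 8) -> list[int]:
    """Return `num_samples` frame indices spread evenly over [0, num_frames - 1].

    Assumption: both the first and last frame are included. Other loaders
    may sample the midpoint of each of `num_samples` equal segments instead.
    """
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    if num_samples == 1:
        # A single sample: take the middle frame.
        return [(num_frames - 1) // 2]
    # Fractional spacing between consecutive sampled frames.
    step = (num_frames - 1) / (num_samples - 1)
    return [round(i * step) for i in range(num_samples)]


if __name__ == "__main__":
    print(uniform_frame_indices(631, 8))
```

With 631 frames the spacing is exactly 90 frames. If a clip has fewer frames than samples, this sketch simply repeats indices.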
The youcook2 data repository (http://youcook2.eecs.umich.edu/download) only provides a script to download the raw videos into a folder `.../youcook2/raw_videos/`. However, the `youcook_filtered_v3.json` file has entries like …, and in `data_mixtures.py` the definition of the youcook2 mixture references video files from the directory `video_data_clipped`.
Could you provide details on how you generated the clipped videos, or provide the script used to do it? I'm guessing it was done by reading the `youcookii_annotations_trainval.json` file and using ffmpeg to split each raw video into the corresponding clips, but any confirmation/details would be helpful.
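For anyone else landing here, a rough sketch of the clipping step as guessed at in this thread. It is not the script the VILA authors used. It assumes the annotation JSON has a top-level `"database"` mapping from video id to an `"annotations"` list whose items carry a `"segment": [start_sec, end_sec]` pair and an `"id"`. It also assumes the raw videos sit flat in one folder as `<video_id>.mp4`. The output naming `<video_id>_<id>.mp4` and the helper names `build_clip_commands` and `clip_all` are hypothetical, so adjust them to match whatever `youcook_filtered_v3.json` expects.

```python
import json
import subprocess
from pathlib import Path


def build_clip_commands(annotations: dict, raw_dir: str, out_dir: str) -> list[list[str]]:
    """Build one ffmpeg command per annotated segment (nothing is executed here)."""
    commands = []
    for video_id, info in annotations["database"].items():
        # Assumed layout: raw videos stored flat as <video_id>.mp4.
        src = Path(raw_dir) / f"{video_id}.mp4"
        for ann in info["annotations"]:
            start, end = ann["segment"]  # assumed to be seconds
            # Hypothetical output name; match it to your JSON entries.
            dst = Path(out_dir) / f"{video_id}_{ann['id']}.mp4"
            commands.append([
                "ffmpeg", "-y",
                "-ss", str(start),       # seek to the segment start
                "-i", str(src),
                "-t", str(end - start),  # keep only the segment duration
                "-c:v", "libx264", "-c:a", "aac",  # re-encode for accurate cuts
                str(dst),
            ])
    return commands


def clip_all(annotation_file: str, raw_dir: str, out_dir: str) -> None:
    """Load the annotation file and run ffmpeg for every segment."""
    with open(annotation_file) as f:
        annotations = json.load(f)
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for cmd in build_clip_commands(annotations, raw_dir, out_dir):
        subprocess.run(cmd, check=True)
```

Re-encoding is slower than `-c copy` but avoids cuts snapping to the nearest keyframe. Usage would be something like `clip_all("youcookii_annotations_trainval.json", ".../youcook2/raw_videos/", "video_data_clipped")`.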