MichiganCOG / Video-Grounding-from-Text

Source code for "Weakly-Supervised Video Object Grounding from Text by Loss Weighting and Object Interaction"

where can I get '--image_root', default='./data/yc2/video_segments_25fps'? #4

Closed Tomwmg closed 5 years ago

Tomwmg commented 5 years ago

I am sorry to trouble you again. When I sampled the videos at 1 fps, I found that some frames did not align well with the box annotations. I tried several methods, but none of them fit the box annotations perfectly. If it is convenient, could you provide the code you used to sample the videos into frames, or the 'video_segments_25fps' files used in your paper? Thank you for your help.

natlouis commented 5 years ago

Older versions of ffmpeg use different methods for seeking to the starting frame. We used ffmpeg version 2.8.15 on Ubuntu.

And the exact code we used to sample the frames is:

```python
# start_t:      segment start time (from the JSON annotation file)
# vid_path:     source video
# seg_dur:      segment duration (calculated from the JSON annotation file)
# fps:          25
# segment_path: save location
os.system(' '.join(('ffmpeg', '-ss', str(start_t), '-i', vid_path,
                    '-t', str(seg_dur),
                    '-vf', '"fps=' + str(fps), ', scale=720:-1"',
                    os.path.join(segment_path, '%02d.jpg'))))
```
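For anyone adapting this, an equivalent invocation built as an argument list (and run with `subprocess.run` instead of `os.system`) avoids the shell-quoting pitfalls of joining strings by hand. This is only a sketch; the function name and layout are illustrative, not from the repo:

```python
import os

def build_ffmpeg_cmd(vid_path, segment_path, start_t, seg_dur, fps=25):
    """Build the ffmpeg argv list for extracting one segment's frames.

    Pass the result to subprocess.run(cmd, check=True); no shell quoting
    is needed because each argument is its own list element.
    """
    return ['ffmpeg',
            '-ss', str(start_t),                       # seek to segment start
            '-i', vid_path,                            # source video
            '-t', str(seg_dur),                        # segment duration
            '-vf', 'fps={},scale=720:-1'.format(fps),  # resample + resize to 720px wide
            os.path.join(segment_path, '%02d.jpg')]    # zero-padded frame names
```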

[EDIT]: The frames you are referencing are sampled at 25 fps. The box annotations are only supplied at 1 fps, but vis.py takes care to ground only the annotated frames.
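Since frames are saved at 25 fps while boxes exist at 1 fps, matching an annotated second to a saved frame reduces to a simple index mapping along these lines (a sketch of the idea only; vis.py's actual logic may differ, and the function name here is hypothetical):

```python
def annotated_frame_index(t_sec, fps=25, first_index=1):
    # Frame files from the ffmpeg command are numbered starting at 1
    # ('%02d.jpg'), so the frame at annotated second t_sec (0-based,
    # relative to the segment start) is roughly frame t_sec * fps + 1.
    return t_sec * fps + first_index
```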