bytedance / Shot2Story

A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions.
https://mingfei.info/shot2story

Inaccurate shot detection #10

Closed: hkunzhe closed this issue 4 months ago

hkunzhe commented 4 months ago

At some transitions, the current shot includes a few frames from both the previous shot and the next shot, for example in MVq8Q9tKdfU.23_79_485.mp4.

youthHan commented 4 months ago

Do you mean this video?

https://huggingface.co/datasets/mhan/Shot2Story-134K/viewer/multi-shot/43k_human_train?q=MVq8Q9tKdfU&row=1

youthHan commented 4 months ago

It seems correct. If you find any mistakes, please let us know and we will fix them. Thanks.

hkunzhe commented 4 months ago

@youthHan, thanks for your reply! I downloaded the original videos from #5. For MVq8Q9tKdfU.23.mp4, I used ffmpeg to cut MVq8Q9tKdfU.23_79_485.mp4 and MVq8Q9tKdfU.23_486_531.mp4 according to the start and end frame indices.

ffmpeg -i MVq8Q9tKdfU.23.mp4 -vf "select='between(n\,79\,485)',setpts=PTS-STARTPTS" -af "aselect='between(n\,79\,485)',asetpts=PTS-STARTPTS" -vsync cfr MVq8Q9tKdfU.23_79_485.mp4
ffmpeg -i MVq8Q9tKdfU.23.mp4 -vf "select='between(n\,486\,531)',setpts=PTS-STARTPTS" -af "aselect='between(n\,486\,531)',asetpts=PTS-STARTPTS" -vsync cfr MVq8Q9tKdfU.23_486_531.mp4
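For anyone reproducing this over many shots, here is a minimal Python sketch that derives the same video filter from a shot file name. It assumes the naming pattern seen above (`<clip>_<start>_<end>.mp4`, with inclusive frame indices); `parse_shot_name` and `build_cut_command` are hypothetical helper names, not part of the Shot2Story code. One deliberate difference from the commands above: the sketch drops audio with `-an`, because as far as I understand `n` in `aselect` counts audio frames rather than video frames, so the same index range would not select the matching audio.

```python
from pathlib import Path


def parse_shot_name(shot_file):
    """Split e.g. 'MVq8Q9tKdfU.23_79_485.mp4' into (source file, start, end)."""
    path = Path(shot_file)
    # Split from the right, since YouTube IDs may themselves contain underscores.
    clip, start, end = path.stem.rsplit("_", 2)
    return clip + path.suffix, int(start), int(end)


def build_cut_command(shot_file):
    """Build an ffmpeg argument list that keeps video frames start..end (inclusive)."""
    source, start, end = parse_shot_name(shot_file)
    # Same video filter as in the commands above, with the indices filled in.
    vf = rf"select='between(n\,{start}\,{end})',setpts=PTS-STARTPTS"
    # -an drops the audio track (assumption: audio is not needed for this check).
    return ["ffmpeg", "-i", source, "-vf", vf, "-an", "-vsync", "cfr", shot_file]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works without ffmpeg.
    print(" ".join(build_cut_command("MVq8Q9tKdfU.23_79_485.mp4")))
```

The list returned by `build_cut_command` can be passed to `subprocess.run` as-is; no extra shell quoting is needed because no shell is involved.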

As the two shot videos below show, each shot includes a few frames from both the previous shot and the next shot.

https://github.com/bytedance/Shot2Story/assets/30763967/c1ebfcc3-5117-4fb8-9784-16ee0e89c1d8

https://github.com/bytedance/Shot2Story/assets/30763967/aa5cc8e0-b2a3-4781-943d-4612d8b15469

youthHan commented 4 months ago

This sometimes occurs. These frames come from the original videos and fall within the transition between shots.