OpenGVLab / InternVideo

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
Apache License 2.0

Question about events in the proposed dataset #214

Open lzc2017 opened 20 hours ago

lzc2017 commented 20 hours ago

Thank you for your meaningful work. I would like to ask whether the filtered segments represent the independent events in a video. In other words, if an original video yields five filtered segments, can we say that those five are the main events of the original video?

yinanhe commented 19 hours ago

From my personal point of view, I don't think so. InternVid is built by shot segmentation, static screening, caption generation, and filtering. Some of the resulting segments may have inaccurate captions or low motion scores, so they cannot accurately express the event information in the video. Shot segmentation may also split a single event into several shots, so separate segments do not necessarily correspond to independent events.
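
A toy illustration of that last point may help. It is only a sketch, not the InternVid pipeline: the `Shot` class, the `event` labels, and the `kept` flag are all hypothetical (InternVid does not ship event annotations), and the filtering criterion is deliberately left abstract.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    start: float  # shot start time in seconds
    end: float    # shot end time in seconds
    event: str    # hypothetical event label, for illustration only
    kept: bool    # hypothetical flag: did this shot survive filtering?


def summarize(shots):
    """Return (number of kept segments, events seen in kept segments, events missed)."""
    kept = [s for s in shots if s.kept]
    events_in_video = {s.event for s in shots}
    events_in_kept = {s.event for s in kept}
    return len(kept), events_in_kept, events_in_video - events_in_kept


# A made-up video: one event spans two shots, and one event is filtered out.
shots = [
    Shot(0.0, 4.0, "cooking", True),
    Shot(4.0, 9.0, "cooking", True),      # same event, different camera shot
    Shot(9.0, 15.0, "interview", False),  # removed by some filtering step
    Shot(15.0, 20.0, "plating", True),
]

num_segments, covered, missed = summarize(shots)
print(num_segments, sorted(covered), sorted(missed))
```

In this made-up case three segments are kept, but they cover only two distinct events, and a third event has no segment at all. So the segment count can both overcount and undercount the events in the original video.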