Open lzc2017 opened 20 hours ago
Personally, I don't think so. InternVid is built by shot segmentation, static-clip screening, caption generation, and filtering. Segments may have been filtered out because their captions were inaccurate or their motion scores were low, so the segments that remain cannot be assumed to accurately capture the event information in the video. Shot segmentation may also split a single event into several shots, and separate shots are not necessarily independent events.
Thank you for your meaningful work. I would like to ask whether the filtered segments can represent the independent events in the video. In other words, if the original video yields five filtered segments, can we say that these five are the main events in the original video?