NJU-PCALab / OpenVid-1M


Video pre-processing pipeline? #1

Open hkunzhe opened 1 month ago

hkunzhe commented 1 month ago

Hi, great work! I can't find a specific data collection and pre-processing pipeline. Could you elaborate on video sources and text annotations?

nankepan commented 1 month ago

Thank you for your attention. Please refer to the data collection and preprocessing procedures in our paper.

hkunzhe commented 1 month ago

Hi @nankepan, would you mind sharing the data-processing scripts?

hkunzhe commented 1 month ago

@nankepan In the third section of the paper, why do you compute the semantic similarity between adjacent frames to differentiate between static and flicker videos, instead of directly computing a motion score?

Consider the following two scenarios: 1) a person speaking to the camera with slight bodily movements, and 2) camera movements such as dolly or pan shots. In the former case, since the background remains stationary, the similarity of CLIP features should be relatively high. Conversely, in the latter scenario, due to the camera's motion, the semantic similarity between sampled adjacent frames might actually be quite low. However, neither of these belongs to the static or flicker categories.
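To make the concern concrete, here is a toy sketch (not the authors' pipeline): flattened pixel vectors stand in for CLIP image features, and mean absolute frame difference stands in for an optical-flow motion score. The pan scenario shows a large motion score, even though a human would call its content semantically unchanged:

```python
import numpy as np

def adjacent_cosine_similarity(feats):
    """Mean cosine similarity between consecutive feature vectors.
    In the paper's setting these would be CLIP features of sampled
    frames; here flattened pixels stand in for them."""
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    return float(np.sum(feats[:-1] * feats[1:], axis=1).mean())

def motion_score(frames):
    """Mean absolute pixel difference between consecutive frames,
    a crude stand-in for an optical-flow-based motion score."""
    return float(np.abs(np.diff(frames.astype(np.float32), axis=0)).mean())

rng = np.random.default_rng(0)
scene = rng.random((64, 128))

# Scenario 2: a panning camera -- each frame is a window of the same
# scene shifted by 8 pixels, so pixel-level change is large.
pan = np.stack([scene[:, i:i + 64] for i in range(0, 40, 8)])

# Scenario 1 (static limit): an essentially still shot -- identical
# frames plus small sensor noise, so pixel-level change is tiny.
static = np.stack([scene[:, :64] + 0.01 * rng.standard_normal((64, 64))
                   for _ in range(5)])

print("pan motion score:   ", motion_score(pan))
print("static motion score:", motion_score(static))
print("pan feature sim:    ", adjacent_cosine_similarity(pan.reshape(5, -1)))
print("static feature sim: ", adjacent_cosine_similarity(static.reshape(5, -1)))
```

On this toy data the pan clip gets both a high motion score and a lower adjacent-frame similarity than the static clip, which is exactly the case where the two criteria would disagree about filtering.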

Could you provide a few examples to illustrate the necessity of this operation?