DAMO-NLP-SG / VideoLLaMA2

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
Apache License 2.0

Why isn't the SceneDetect code used at all? #23

Closed: lucasjinreal closed this issue 2 months ago

lucasjinreal commented 3 months ago

Is it because sampling training data with SceneDetect would take too much time?

lixin4ever commented 3 months ago

Thanks for your interest.

You are correct: scene detection is already very slow, and encoding multiple segmented scenes (instead of one single video) makes the whole process much more time-consuming.

Note that this is just an experimental feature and NOT used in the current release.
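
To make the encoding cost concrete, here is a minimal sketch in plain Python. It is not taken from the VideoLLaMA2 code base: the helper names (`uniform_indices`, `sample_whole_video`, `sample_per_scene`), the 900-frame clip, the three scene boundaries, and the budget of 8 frames per sample are all assumptions chosen for illustration. It only counts how many frames would have to be encoded when each detected scene is sampled separately instead of sampling the whole video once; the time spent on scene detection itself would come on top of that.

```python
def uniform_indices(start, end, num_frames):
    """Pick up to num_frames evenly spaced frame indices in [start, end)."""
    length = end - start
    if length <= 0 or num_frames <= 0:
        return []
    num_frames = min(num_frames, length)
    step = length / num_frames
    # Take the midpoint of each of the num_frames equal-width bins.
    return [start + int(step * i + step / 2) for i in range(num_frames)]


def sample_whole_video(total_frames, num_frames):
    """Treat the video as one segment and sample it once."""
    return uniform_indices(0, total_frames, num_frames)


def sample_per_scene(scene_bounds, num_frames):
    """Sample every scene separately; each scene gets its own frame budget."""
    return [uniform_indices(s, e, num_frames) for s, e in scene_bounds]


# Hypothetical 900-frame clip that a scene detector split into 3 scenes.
scenes = [(0, 300), (300, 600), (600, 900)]

whole = sample_whole_video(900, 8)
per_scene = sample_per_scene(scenes, 8)

frames_whole = len(whole)                         # frames to encode: one pass
frames_scenes = sum(len(s) for s in per_scene)    # frames to encode: per scene

print(frames_whole, frames_scenes)
```

Under these assumptions the per-scene variant encodes one full frame budget per scene, so the encoder workload grows linearly with the number of detected scenes.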