Does video feature extraction support a variable number of frames? Would the results degrade compared with 8 frames?

PKU-YuanGroup / LanguageBind

【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment

https://arxiv.org/abs/2310.01852

MIT License

549 stars, 44 forks

Closed: 1093842024 closed this issue 4 months ago

LinB203 commented 4 months ago

Thank you for your interest. Inputs with a frame count other than 8 are not supported at this time, and we have not run ablation experiments on the number of frames.
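Since only 8-frame input is supported, one possible workaround for clips of arbitrary length is to pick 8 frame indices uniformly before feature extraction. The sketch below is only an illustration under that assumption: `sample_frame_indices` is a hypothetical helper, not part of the LanguageBind API, and the effect of this sampling on feature quality has not been evaluated here.

```python
def sample_frame_indices(total_frames: int, num_frames: int = 8) -> list[int]:
    """Pick `num_frames` indices spread uniformly over a clip.

    Hypothetical helper (not a LanguageBind function). The clip is split
    into `num_frames` equal segments and the midpoint of each segment is
    taken. Clips shorter than `num_frames` repeat some indices.
    """
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")

    segment = total_frames / num_frames  # segment length, may be fractional
    return [
        # Clamp to the last valid index to guard against rounding.
        min(total_frames - 1, int(segment * i + segment / 2))
        for i in range(num_frames)
    ]


if __name__ == "__main__":
    print(sample_frame_indices(80))  # 80-frame clip reduced to 8 indices
    print(sample_frame_indices(4))   # short clip: indices are repeated
```

The selected indices can then be used to read the corresponding frames with whatever video decoder is already in use, so the model always receives exactly 8 frames.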