boheumd / MA-LMM

(2024CVPR) MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
https://boheumd.github.io/MA-LMM/
MIT License
221 stars, 26 forks

Do you have an idea why different runs result in different accuracy for LVU? #13

Closed. joslefaure closed this issue 3 months ago

joslefaure commented 4 months ago

Hi authors. Thanks again for open-sourcing your work. Do you have any idea or hypothesis as to why different runs can produce drastically different accuracy (±10) on the LVU dataset (relationship task)? I have been grappling with this issue and would really appreciate your input. Thanks.

boheumd commented 4 months ago

Hi. The relationship task has only 136 training videos, which may explain the significant variation between runs. With so little training data, both the model's accuracy and its ability to generalize can shift noticeably from one run to the next.
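One way to quantify this kind of run-to-run variation is to repeat training with several seeds and report the mean and standard deviation instead of a single number. The snippet below is only a minimal sketch of that idea: `train_and_eval` is a hypothetical callable standing in for whatever fine-tunes and evaluates the model, and `dummy_train_and_eval` returns made-up values purely so the example runs. Neither is part of the MA-LMM codebase.

```python
import random
import statistics
from typing import Callable, Dict, Sequence


def run_with_seeds(train_and_eval: Callable[[int], float],
                   seeds: Sequence[int]) -> Dict[str, float]:
    """Call train_and_eval once per seed and summarize the accuracies."""
    accuracies = []
    for seed in seeds:
        # Seeds Python's RNG only; a real run would also need to seed
        # NumPy, PyTorch, and the data loader.
        random.seed(seed)
        accuracies.append(train_and_eval(seed))
    return {
        "mean": statistics.mean(accuracies),
        # Sample standard deviation needs at least two runs.
        "std": statistics.stdev(accuracies) if len(accuracies) > 1 else 0.0,
        "min": min(accuracies),
        "max": max(accuracies),
    }


def dummy_train_and_eval(seed: int) -> float:
    """Hypothetical stand-in: a made-up accuracy, NOT a real MA-LMM result."""
    rng = random.Random(seed)
    return 50.0 + rng.uniform(-10.0, 10.0)


if __name__ == "__main__":
    summary = run_with_seeds(dummy_train_and_eval, seeds=[0, 1, 2, 3, 4])
    print(f"accuracy: {summary['mean']:.1f} +/- {summary['std']:.1f} "
          f"(min {summary['min']:.1f}, max {summary['max']:.1f})")
```

Reporting the spread alongside the mean makes it easier to tell whether a difference between two configurations is larger than the noise expected from a small training set.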

hulianyuyy commented 2 months ago

May I ask how to get the LVU dataset? The official procedure seems to download the videos from YouTube through youtube-dl. However, some videos are no longer available. Is there any way to get the full set of videos?