DAMO-NLP-SG / VideoLLaMA2

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
Apache License 2.0
918 stars, 60 forks

WebVid-10M (40% sampling) #56

Open: VJatla opened this issue 4 months ago

VJatla commented 4 months ago

Hello,

After going through the paper, I understand that 40% of the video-text pairs from the WebVid-10M dataset are used. Could you please explain the rationale, or point me to something that explains how this 40% of the videos was selected?

LiangMeng89 commented 2 weeks ago

Hello, I'm a PhD student from ZJU and I also use VideoLLaMA2 in my own research. We have created a WeChat group to discuss VideoLLaMA2 issues and help each other. Would you like to join us? Please contact me: WeChat: LiangMeng19357260600, phone: +86 19357260600, e-mail: liangmeng89@zju.edu.cn.

VJatla commented 2 weeks ago

Hello,

Thanks for inviting me. We gave up on this and have moved away from using LLMs for now.

Regards,
Venkatesh Jatla

(Sent by e-mail in reply to Liang Meng's comment of Wed, Nov 13, 2024, 11:53 AM.)