DAMO-NLP-SG / VideoLLaMA2

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
Apache License 2.0
907 stars, 60 forks

⭐ [Feat] Supporting audio and audio-visual stages. #98

Closed by xinyifei99 1 month ago

xinyifei99 commented 2 months ago

🚀[Add] 1. Add two-stage audio-only training and evaluation code
🚀[Add] 2. Add audio-visual joint training and evaluation code
🚀[Add] 3. Add audio and video processing related installation packages

Perevalov commented 1 month ago

Hi @xinyifei99, could you please share the following binaries: DAMO-NLP-SG/VideoLLaMA2-7B-Base-audio/audio_tower.bin and DAMO-NLP-SG/VideoLLaMA2-7B-Base-audio/mm_projector_a.bin?