DAMO-NLP-SG / VideoLLaMA2

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
Apache License 2.0

Release about Audio branch #63

Status: Closed (XuecWu closed this issue 1 month ago)

XuecWu commented 4 months ago

Thank you for your great contributions!

Could you tell me the release schedule of the audio branch?

Looking forward to your reply!

lixin4ever commented 3 months ago

Thanks for your kind words.

We will definitely open-source the audio-language branch, but we cannot commit to a concrete release date: we are constantly updating the video-language branch, and the audio part is built on top of a well-trained video-language branch.

XuecWu commented 3 months ago

Thank you for your timely reply! Looking forward to the release.

xinyifei99 commented 1 month ago

Thanks for your attention! The audio-visual code is now available on the audio_visual branch (https://github.com/DAMO-NLP-SG/VideoLLaMA2/tree/audio_visual). Clone the repository and switch to that branch to train the audio-visual model and run inference with it.
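
For anyone scripting the setup, the clone step might look roughly like the sketch below. It only uses standard `git clone` options (`--branch`, `--single-branch`) and the repository URL and branch name given in this thread. The helper names `build_clone_command` and `clone_audio_visual` and the default destination directory `VideoLLaMA2` are assumptions made for illustration and are not part of the project.

```python
import subprocess

# Repository URL and branch name as given in this thread.
REPO_URL = "https://github.com/DAMO-NLP-SG/VideoLLaMA2"
BRANCH = "audio_visual"


def build_clone_command(dest="VideoLLaMA2"):
    """Return the git command that clones only the audio_visual branch.

    `dest` is a hypothetical local directory name chosen for this sketch.
    """
    return [
        "git", "clone",
        "--branch", BRANCH,   # check out audio_visual instead of the default branch
        "--single-branch",    # skip fetching the history of other branches
        REPO_URL,
        dest,
    ]


def clone_audio_visual(dest="VideoLLaMA2"):
    """Run the clone; raises CalledProcessError if git exits with an error."""
    subprocess.run(build_clone_command(dest), check=True)
```

Calling `clone_audio_visual()` requires `git` on the PATH and network access. Training and inference instructions are not covered here; those should come from the README on the audio_visual branch itself.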