DAMO-NLP-SG / VideoLLaMA2

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
Apache License 2.0
826 stars, 56 forks

Train and fine-tune for audio-video #79

Closed: trahman8 closed this issue 3 days ago

trahman8 commented 2 months ago

It would be great if you could upload details on how to train and fine-tune the audio-video model.

lixin4ever commented 2 months ago

We are working on this. Please stay tuned.

xinyifei99 commented 5 days ago

Thanks for your attention! You can clone the repository and switch to the audio_visual branch (https://github.com/DAMO-NLP-SG/VideoLLaMA2/tree/audio_visual) to train the audio-visual model and run inference with it.
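For anyone following along, here is a minimal sketch of fetching that branch with standard git commands. The repository URL and branch name come from the reply above. The local directory name and the helper function `clone_audio_visual` are assumptions made for illustration and are not part of the project. The training and inference commands themselves are not covered here; check the README on the branch for those.

```shell
# Repository and branch as named in the reply above.
REPO_URL="https://github.com/DAMO-NLP-SG/VideoLLaMA2"
BRANCH="audio_visual"

# Assumption: local checkout directory; pick any name you like.
TARGET_DIR="VideoLLaMA2"

# Hypothetical helper: clone only the audio_visual branch into TARGET_DIR.
clone_audio_visual() {
    git clone --branch "$BRANCH" --single-branch "$REPO_URL" "$TARGET_DIR"
}
```

Run `clone_audio_visual && cd "$TARGET_DIR"` to get the code. If you already have a clone of the main branch, `git fetch origin audio_visual` followed by `git checkout audio_visual` inside it should work as well.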