trahman8 closed this issue 3 days ago
We are working on this. Please stay tuned.
Thanks for your attention! You can switch to the audio_visual branch (https://github.com/DAMO-NLP-SG/VideoLLaMA2/tree/audio_visual) and clone the repository to train and run inference with the audio-visual model.
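In case it helps, here is a minimal sketch of fetching just that branch from Python. The repository URL is inferred from the branch link above, and the destination folder name and the helper names `build_clone_command` and `clone` are made up for illustration. It only covers the checkout, not training or inference.

```python
import subprocess

# Assumption: repository URL inferred from the branch link in this thread.
REPO_URL = "https://github.com/DAMO-NLP-SG/VideoLLaMA2.git"
BRANCH = "audio_visual"


def build_clone_command(dest="VideoLLaMA2", branch=BRANCH, repo_url=REPO_URL):
    """Return the git command that clones only `branch` into `dest`."""
    # --branch checks out the named branch after cloning;
    # --single-branch avoids fetching the other branches.
    return ["git", "clone", "--branch", branch, "--single-branch", repo_url, dest]


def clone(dest="VideoLLaMA2"):
    """Run the clone; raises CalledProcessError if git exits with an error."""
    subprocess.run(build_clone_command(dest), check=True)
```

Calling `clone()` needs git installed and network access. If you already have the repository cloned, `git checkout audio_visual` inside it should get you to the same branch.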
It would be great if you could upload details on how to train and fine-tune the audio-video model.