Closed qixueweigitbub closed 1 week ago
Sorry for the confusion. The currently available models do not include the audio branch, so they do not accept audio as input.
Thanks for your attention! You can clone the repository and switch to the audio_visual branch (https://github.com/DAMO-NLP-SG/VideoLLaMA2/tree/audio_visual) to run inference on audio-related tasks.
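For anyone following along, fetching that branch might look roughly like the sketch below. It only builds and prints the `git clone` command. The repository URL is taken from the link above with `.git` appended, while the helper name `build_clone_command` and the destination folder `VideoLLaMA2` are assumptions made for illustration, not something the repo prescribes.

```python
import shlex

# Repository URL derived from the branch link above (".git" suffix is an assumption).
REPO_URL = "https://github.com/DAMO-NLP-SG/VideoLLaMA2.git"
BRANCH = "audio_visual"


def build_clone_command(repo_url=REPO_URL, branch=BRANCH, dest="VideoLLaMA2"):
    """Return the argument list for cloning a single branch of a repository.

    Hypothetical helper for illustration; `dest` is an arbitrary local folder name.
    """
    return ["git", "clone", "--branch", branch, "--single-branch", repo_url, dest]


# Print the command as a shell-safe string so it can be inspected or copy-pasted.
print(shlex.join(build_clone_command()))
```

To actually run the clone from Python, the list can be passed to `subprocess.run(build_clone_command(), check=True)`. The setup and inference steps for the branch itself are whatever its README describes.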
Thanks for the great work!
I tested different videos with sound (speech, music, noise, etc.) using the online demo, but the model keeps ignoring the audio, no matter whether I ask about it explicitly or implicitly in the prompt. Is it because the video's audio track is not loaded properly? Can you provide any hints? Does anyone else have the same issue?