DAMO-NLP-SG / VideoLLaMA2

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
Apache License 2.0
918 stars, 60 forks

Can I use a WAV file as input for inference? #73

Closed. FanBu02 closed this issue 1 month ago.

FanBu02 commented 3 months ago

Can I use a WAV file as input for inference? Could you tell me roughly how to modify the code?

xinyifei99 commented 1 month ago

Thanks for your interest! Audio input is supported on the audio_visual branch (https://github.com/DAMO-NLP-SG/VideoLLaMA2/tree/audio_visual). Clone the repository, switch to that branch, and run inference from there for audio-related tasks.
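
Before handing a WAV file to that branch's inference code, it can help to confirm that the file opens as PCM WAV and to note its sample rate, channel count, and duration. The following is a minimal sketch using only Python's standard library. The helper name `describe_wav` is hypothetical and is not part of VideoLLaMA2, and it makes no assumption about which sample rate or channel layout the repository's audio loader expects.

```python
import wave


def describe_wav(path):
    """Return basic properties of a PCM WAV file.

    Hypothetical helper for a pre-inference sanity check;
    it is not part of the VideoLLaMA2 code base.
    """
    # wave.open raises wave.Error if the file is not a readable PCM WAV.
    with wave.open(path, "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_width_bytes": wf.getsampwidth(),
            "sample_rate_hz": rate,
            # Duration is frame count divided by frames per second.
            "duration_s": frames / rate if rate else 0.0,
        }
```

Calling `describe_wav("my_audio.wav")` (the path is a placeholder) returns a small dictionary that can be compared with whatever input format the audio_visual branch documents.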

LiangMeng89 commented 2 weeks ago

Hello, I'm a PhD student at ZJU, and I also use VideoLLaMA2 in my own research. We have created a WeChat group to discuss VideoLLaMA2 issues and help each other. Would you like to join us? Please contact me: WeChat: LiangMeng19357260600, phone: +86 19357260600, e-mail: liangmeng89@zju.edu.cn.