DAMO-NLP-SG / Video-LLaMA

[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
BSD 3-Clause "New" or "Revised" License

Fix error when loading the audio of the input video, as described in issue #163. #164

Open · xjr01 opened this pull request 5 months ago

xjr01 commented 5 months ago

This is fixed by extracting the audio track from the input video with the ffmpeg-python package and saving it to a WAV file.

Fixes #163
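A minimal sketch of this approach follows. It is not the code from this PR, which is not shown here. It assumes the ffmpeg-python package (`import ffmpeg`) and an ffmpeg binary on PATH. The helper names `wav_path_for` and `extract_audio_to_wav`, the temp-directory output location, and the mono 16-bit PCM target at 16 kHz are all hypothetical choices made for illustration.

```python
import os
import tempfile
from typing import Optional


def wav_path_for(video_path: str, out_dir: Optional[str] = None) -> str:
    """Hypothetical helper: build '<out_dir>/<video stem>.wav'.

    Defaults to the system temp directory when out_dir is not given.
    """
    stem = os.path.splitext(os.path.basename(video_path))[0]
    directory = out_dir if out_dir is not None else tempfile.gettempdir()
    return os.path.join(directory, stem + ".wav")


def extract_audio_to_wav(video_path: str,
                         wav_path: Optional[str] = None,
                         sample_rate: int = 16000) -> str:
    """Hypothetical helper: decode the audio track of a video into a WAV file.

    Assumes ffmpeg-python is installed and an ffmpeg binary is on PATH.
    Mono, 16-bit PCM at `sample_rate` Hz is an assumed target format.
    Returns the path of the written WAV file.
    """
    import ffmpeg  # imported lazily so wav_path_for works without the package

    if wav_path is None:
        wav_path = wav_path_for(video_path)
    try:
        (
            ffmpeg
            .input(video_path)
            # vn=None emits a bare "-vn" flag, which drops the video stream
            .output(wav_path, vn=None, acodec="pcm_s16le", ac=1, ar=sample_rate)
            .overwrite_output()  # replace any stale WAV from an earlier run
            .run(capture_stdout=True, capture_stderr=True)
        )
    except ffmpeg.Error as err:
        # err.stderr holds ffmpeg's log because capture_stderr=True
        msg = err.stderr.decode(errors="replace") if err.stderr else str(err)
        raise RuntimeError(f"ffmpeg failed on {video_path}: {msg}") from err
    return wav_path
```

Usage might look like `wav = extract_audio_to_wav("demo.mp4")`, after which the WAV path, rather than the video file, is handed to the audio loader. Writing an intermediate WAV file means the downstream loader only has to read plain PCM audio instead of demuxing the video container itself.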