DAMO-NLP-SG / Video-LLaMA

[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
BSD 3-Clause "New" or "Revised" License

Audio input #168

Open CHEN-H01 opened 2 months ago

CHEN-H01 commented 2 months ago

Hi, I have a question about audio input.

In Video-LLaMA/video_llama/conversation/conversation_video.py, line 255, the function load_and_transform_audio_data is called with a video path. I would expect its input to be an audio file (.wav). Why is a video file passed here instead?

audio = load_and_transform_audio_data([video_path], "cpu", clips_per_video=8)
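
If the loader does turn out to need a .wav file, one possible workaround is to extract the audio track from the video first and pass that path instead. Below is a minimal sketch, assuming the ffmpeg command-line tool is installed. The helper names build_extract_cmd and extract_audio are hypothetical and not part of Video-LLaMA, and the 16 kHz mono output is an assumption about what the loader expects. Whether the loader can already decode audio directly from a video container is not confirmed here.

```python
import subprocess
from pathlib import Path


def build_extract_cmd(video_path, wav_path, sample_rate=16000):
    """Build an ffmpeg command that writes the video's audio track to a mono WAV file."""
    return [
        "ffmpeg",
        "-y",                      # overwrite the output file if it already exists
        "-i", str(video_path),     # input video file
        "-vn",                     # drop the video stream, keep audio only
        "-ac", "1",                # downmix to a single (mono) channel
        "-ar", str(sample_rate),   # resample; 16 kHz is an assumption, not confirmed
        str(wav_path),
    ]


def extract_audio(video_path, sample_rate=16000):
    """Hypothetical helper: write <video>.wav next to the video and return its path."""
    wav_path = Path(video_path).with_suffix(".wav")
    # check=True raises CalledProcessError if ffmpeg exits with a non-zero status
    subprocess.run(build_extract_cmd(video_path, wav_path, sample_rate), check=True)
    return str(wav_path)
```

With this in place, the call above could become load_and_transform_audio_data([extract_audio(video_path)], "cpu", clips_per_video=8), so the function receives a .wav path.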