DAMO-NLP-SG / VideoLLaMA2

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
Apache License 2.0
871 stars, 60 forks

Transcription as Input #42

Open lucasxu777 opened 4 months ago

lucasxu777 commented 4 months ago

Hi, is it possible to supply the transcription of the input video as an additional input when using the inference code to generate responses? Thanks.

LiangMeng89 commented 3 days ago

Hello, I'm a PhD student at ZJU, and I also use VideoLLaMA2 for my own research. We have created a WeChat group to discuss VideoLLaMA2 issues and help each other. Would you like to join us? Please contact me: WeChat: LiangMeng19357260600, phone: +86 19357260600, e-mail: liangmeng89@zju.edu.cn.