Closed: emcf closed this 2 months ago
Looking to support extraction of mp4, mov, webm, and avi files, as well as YouTube videos, for a vision-language model (not a video model).
Video and audio inputs are not standard in commercial multimodal models today. Because of this, I am looking to transcribe the audio to text.
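A rough sketch of how the input handling could start, not an actual implementation. It only decides whether a source is one of the listed file types or a YouTube link, and picks timestamps at which still frames might be sampled for the vision-language model. The names `classify_source` and `frame_timestamps`, the list of YouTube hosts, and the idea of sampling frames at a fixed 5-second interval are all assumptions for illustration. Decoding frames and transcribing audio would need external tools and are not shown.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# File types named in this issue.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".webm", ".avi"}
# Assumed set of YouTube hostnames (hypothetical, may be incomplete).
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def classify_source(source: str) -> str:
    """Return 'youtube', 'video_file', or 'unsupported' for a path or URL."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        host = (parsed.hostname or "").lower()
        if host in YOUTUBE_HOSTS:
            return "youtube"
        path = parsed.path  # direct link to a file, e.g. .../clip.webm
    else:
        path = source  # treat anything else as a local path
    # Normalise Windows separators so the suffix is found either way.
    suffix = PurePosixPath(path.replace("\\", "/")).suffix.lower()
    return "video_file" if suffix in VIDEO_EXTENSIONS else "unsupported"


def frame_timestamps(duration_s: float, interval_s: float = 5.0) -> list[float]:
    """Evenly spaced timestamps (seconds) at which to grab still frames.

    The fixed interval is an assumption; the issue does not specify
    how frames should be chosen.
    """
    if duration_s <= 0 or interval_s <= 0:
        return []
    count = int(duration_s // interval_s) + 1
    return [
        round(i * interval_s, 3)
        for i in range(count)
        if i * interval_s < duration_s
    ]


if __name__ == "__main__":
    for src in ["talk.mp4", "https://youtu.be/abc", "notes.txt"]:
        print(src, "->", classify_source(src))
    print(frame_timestamps(12))
```

The sampled frames would go to the model as ordinary images, and the transcript as text, which is why no video-native model is needed.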