Video frame + transcript extraction

emcf / thepipe

Extract markdown and images from URLs, PDFs, docs, slides, and more, ready for multimodal LLMs. ⚡

https://thepi.pe

MIT License

Closed emcf closed 2 months ago

emcf commented 2 months ago

Looking to support extraction from mp4, mov, webm, and avi files, as well as YouTube videos, for a vision-language model (not a video model).

Video and audio inputs are not standard in commercial multimodal models today. Because of this, I am looking to transcribe the audio and pair the transcript with extracted video frames.
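
As a rough illustration of the idea (not thepipe's actual implementation), the sketch below pairs frame timestamps sampled at a fixed interval with the transcript text spoken during each interval. It assumes a speech-to-text step has already produced timed segments and that a separate tool grabs the frame at each timestamp. The names `Segment`, `Chunk`, `sample_times`, and `pair_frames_with_transcript`, and the 10-second interval, are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One timed transcript segment (assumed output of a speech-to-text step)."""
    start: float  # seconds from the start of the video
    end: float
    text: str


@dataclass
class Chunk:
    """A frame timestamp plus the transcript text that starts in its window."""
    frame_time: float
    text: str


def sample_times(duration: float, interval: float) -> List[float]:
    """Return timestamps 0, interval, 2*interval, ... strictly below duration."""
    if duration <= 0 or interval <= 0:
        raise ValueError("duration and interval must be positive")
    times = []
    i = 0
    # Multiply instead of accumulating to avoid floating-point drift.
    while i * interval < duration:
        times.append(i * interval)
        i += 1
    return times


def pair_frames_with_transcript(
    duration: float, interval: float, segments: List[Segment]
) -> List[Chunk]:
    """Assign each segment to the window [t, t + interval) containing its start."""
    chunks = []
    for t in sample_times(duration, interval):
        window_end = min(t + interval, duration)
        texts = [s.text for s in segments if t <= s.start < window_end]
        chunks.append(Chunk(frame_time=t, text=" ".join(texts)))
    return chunks


# Hypothetical 25-second video sampled every 10 seconds.
example_segments = [
    Segment(0.5, 3.0, "hello"),
    Segment(9.0, 12.0, "world"),
    Segment(12.0, 15.0, "foo"),
    Segment(21.0, 24.0, "bar"),
]
example_chunks = pair_frames_with_transcript(25.0, 10.0, example_segments)
```

Each resulting chunk could then be sent to a vision-language model as one image (the frame at `frame_time`) followed by its text. Assigning a segment by its start time keeps every segment in exactly one chunk, at the cost of occasionally attaching speech that runs past the window boundary.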