emcf / thepipe

Extract markdown and images from URLs, PDFs, docs, slides, and more, ready for multimodal LLMs. ⚡
https://thepi.pe
MIT License
814 stars 61 forks

Swap Whisper Version #13

Open skyler14 opened 2 months ago

skyler14 commented 2 months ago

I was looking at your pipeline and thought you might be better served by using https://github.com/Vaibhavs10/insanely-fast-whisper, or by leaving a bit of wiggle room in your framework in the form of an optional parameter for feeding in a separate processor for video transcription. insanely-fast-whisper is over an order of magnitude improvement on vanilla Whisper and has CPU/GPU modes. You may want to just allow a whole pipeline to be passed in, to future-proof this particular endpoint against new tooling.
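
To make the suggestion concrete, here is a rough sketch of what an optional transcriber parameter could look like. It is not thepipe's actual API: `extract_video_text`, `default_whisper_transcriber`, `make_hf_transcriber`, and the `transcriber` parameter are all hypothetical names used for illustration. The idea is that any callable taking a file path and returning text can be swapped in, with vanilla Whisper as the fallback. The Hugging Face `transformers` ASR pipeline is used as the example alternative backend on the assumption that it is the closest library-level equivalent to what insanely-fast-whisper wraps.

```python
from typing import Callable, Optional

# A transcriber is any callable that maps a media file path to its transcript.
Transcriber = Callable[[str], str]


def default_whisper_transcriber(path: str) -> str:
    """Fallback: vanilla openai-whisper, imported lazily so it is optional."""
    import whisper  # pip install openai-whisper

    model = whisper.load_model("base")
    return model.transcribe(path)["text"]


def make_hf_transcriber(
    model_name: str = "openai/whisper-large-v3", device: str = "cpu"
) -> Transcriber:
    """Hypothetical helper: build a transcriber on the transformers ASR pipeline."""
    from transformers import pipeline  # pip install transformers

    asr = pipeline("automatic-speech-recognition", model=model_name, device=device)

    def transcribe(path: str) -> str:
        # chunk_length_s lets the pipeline handle audio longer than 30 seconds.
        return asr(path, chunk_length_s=30)["text"]

    return transcribe


def extract_video_text(path: str, transcriber: Optional[Transcriber] = None) -> str:
    """Hypothetical entry point: use the caller's transcriber if one is given."""
    transcribe = transcriber or default_whisper_transcriber
    return transcribe(path).strip()
```

A caller could then opt in with something like `extract_video_text("talk.mp4", transcriber=make_hf_transcriber(device="cuda:0"))`, and any future tool would only need to be wrapped in a function with the same path-in, text-out shape.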