danielmiessler / fabric

fabric is an open-source framework for augmenting humans using AI. It provides a modular framework for solving specific problems using a crowdsourced set of AI prompts that can be used anywhere.
https://danielmiessler.com/p/fabric-origin-story
MIT License

[Feature request]: Whisper locally for video transcription #441

Open RodriMora opened 4 months ago

RodriMora commented 4 months ago

What do you need?

Hi!

I've been using fabric to save time in watching youtube videos and it's been great!

I've looked at the code, and it seems to pull the auto-generated YouTube transcript using the API. The "ts" function also uses the Whisper API to transcribe audio files.

As a mainly local user, I use textgen webui or TabbyAPI or llama.cpp for inference. As all of them provide an OpenAI compatible API, I just change the OPENAI_BASE_URL variable to the IP of my llm server.
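For anyone following along, repointing at a local OpenAI-compatible server really is just a matter of changing where the requests go. A minimal stdlib-only sketch of what that looks like on the wire (the host, port, and model name below are made up for illustration, not anything fabric ships):

```python
import json
import urllib.request

def build_chat_request(base_url: str, model: str, prompt: str):
    """Build an OpenAI-compatible chat completion request (pure, no network)."""
    url = base_url.rstrip("/") + "/chat/completions"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Point at a hypothetical llama.cpp / TabbyAPI server instead of api.openai.com:
req = build_chat_request("http://192.168.0.98:8080/v1", "local-model",
                         "Summarize this transcript.")
# urllib.request.urlopen(req) would actually send it; omitted since it needs a live server.
```

This is exactly why swapping `OPENAI_BASE_URL` works for the LLM side: the request shape stays the same, only the base URL changes.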

But I believe it's not possible to specify a local server for the Whisper transcription; it only uses the Whisper API. There are some great tools like Faster Whisper or WhisperX that are much faster than the original Whisper and also provide better quality (some benchmarks here)

It would be really cool to be able to use Whisper locally and not depend on the Youtube transcript or Whisper's API.

Something like the final-boss version would be a parameter to skip YT's transcript entirely: download the video, convert it to .wav, run Whisper locally (or against a remote-but-local Whisper server), and then transcribe it. That also has the benefit of translating it if needed.
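The pipeline described above boils down to three subprocess steps. A rough sketch, assuming `yt-dlp`, `ffmpeg`, and the openai-whisper CLI are installed locally (these are the common flags for those tools, not fabric's actual implementation):

```python
import subprocess

def ytdlp_cmd(url: str, out: str = "video.mp4"):
    """Download the video with yt-dlp (commands are built here, run below)."""
    return ["yt-dlp", "-o", out, url]

def ffmpeg_wav_cmd(src: str, dst: str = "audio.wav"):
    """Convert to 16 kHz mono WAV, the format Whisper works with."""
    return ["ffmpeg", "-y", "-i", src, "-ar", "16000", "-ac", "1", dst]

def whisper_cmd(wav: str, task: str = "transcribe"):
    """Run a local Whisper CLI; task='translate' covers the translation case too."""
    return ["whisper", wav, "--model", "base", "--task", task]

def transcribe_youtube(url: str):
    """Full pipeline: download -> convert -> transcribe."""
    for cmd in (ytdlp_cmd(url), ffmpeg_wav_cmd("video.mp4"), whisper_cmd("audio.wav")):
        subprocess.run(cmd, check=True)
```

Dropping the first step gives the same pipeline for local .mp4 files, and `task="translate"` gets the translation behaviour mentioned above.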

And the same pipeline would work for local .mp4 videos for example.

Thanks a lot for your work!

rlwilson17 commented 4 months ago

Wanted to add my support, as I also came looking for this feature! Implementing local Whisper transcription in fabric directly seems like it would lower the technical barrier to entry for the more privacy-minded among us.

Joshfindit commented 3 months ago

I'd even suggest going further:

Have some type of clear separation between local (or first-party) and remote (or 3rd party) calls.

Whether that's:

- an environment flag and some prompts ("You asked us to check with you when calling out to a 3rd party. Are you sure you want to run this?"),
- local versions of each command (`echo "An idea that coding is like speaking with rules." | write_essay_private` or `| write_essay --private_only`), or
- maybe stretching the `--model` functionality so that we could configure our own local models and then use them (`-m whisper_98` for a whisper model running at 192.168.0.98, for example).
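The "check with you before calling out" idea could key off whether the configured endpoint resolves to a private address. A purely illustrative stdlib sketch (none of these names are fabric's):

```python
import ipaddress
from urllib.parse import urlparse

def is_local_endpoint(url: str) -> bool:
    """True if the endpoint's host is a private or loopback IP literal."""
    host = urlparse(url).hostname or ""
    try:
        return ipaddress.ip_address(host).is_private
    except ValueError:
        # Not an IP literal; only trust "localhost", treat other hostnames as remote.
        return host == "localhost"

def confirm_if_remote(url: str) -> bool:
    """Gate 3rd-party calls behind an interactive confirmation."""
    if is_local_endpoint(url):
        return True
    answer = input(f"{url} looks like a 3rd-party endpoint. Call it anyway? [y/N] ")
    return answer.strip().lower() == "y"
```

Hostnames that aren't IP literals are deliberately treated as remote here, which errs on the side of asking rather than silently calling out.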

It's probably important to make any solution able to handle multiple local servers, even multiple local servers of the same type, as this removes a future barrier (it lets users experiment with different configs and compare results, for example).
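Stretching `--model` into named endpoints could be as simple as a name-to-URL registry, which also handles several servers of the same type for free. Hypothetical aliases like `whisper_98` just follow the naming suggested above:

```python
# Hypothetical registry mapping -m aliases to endpoints; not fabric's real config.
MODEL_REGISTRY = {
    "whisper_98": "http://192.168.0.98:9000/v1",
    "whisper_99": "http://192.168.0.99:9000/v1",  # second server of the same type
    "gpt-4o": "https://api.openai.com/v1",
}

def resolve_model(name: str) -> str:
    """Map a -m alias to its endpoint, falling back to the OpenAI API."""
    return MODEL_REGISTRY.get(name, "https://api.openai.com/v1")
```

Comparing results across configs then becomes a matter of running the same input through two aliases.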

Also relates to #276

cleverestx commented 2 months ago

Yes, I'd also like it to support Whisper against local files for TRANSLATION. As you said, "It also has the benefit of translating it if needed."... That would be SWEET.

barshy commented 1 month ago

Being able to transcribe videos locally would be great

thanks