balewgize / skimmit

Article and YouTube video summary from URL
https://balewgize.app/
MIT License
11 stars 5 forks

Use Whisper or Gemini to transcribe YouTube video #5

Open AndreaOrru opened 7 months ago

AndreaOrru commented 7 months ago

Cool project!

In my experiments, I found that Whisper (OpenAI's speech-to-text model) produces vastly better results when transcribing videos than the default YouTube subtitles. Gemini is supposed to be even better, although I haven't tried it.

Would it be possible to support transcribing using a model, in addition to the default YouTube subtitles?
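To make the request concrete, here is a rough sketch of what "model first, subtitles as fallback" could look like. It is only an illustration, not skimmit's actual code: `pick_transcript` and `whisper_transcribe` are hypothetical names, and the Whisper part assumes the open-source `openai-whisper` package (plus ffmpeg) is installed and that the video's audio has already been saved to a local file.

```python
from typing import Callable, Optional

# Hypothetical type: a zero-argument callable that returns transcript text,
# or None / "" when no transcript is available.
TranscriptSource = Callable[[], Optional[str]]


def whisper_transcribe(audio_path: str, model_name: str = "base") -> str:
    """Transcribe a local audio file with the openai-whisper package.

    Assumption: `pip install openai-whisper` and ffmpeg are available.
    The import is deferred so the rest of this module works without it.
    """
    import whisper  # only needed when this backend is actually used

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)
    return result["text"].strip()


def pick_transcript(
    model_source: TranscriptSource,
    subtitle_source: TranscriptSource,
    prefer_model: bool = True,
) -> str:
    """Return the first non-empty transcript, trying the preferred source first."""
    ordered = [model_source, subtitle_source]
    if not prefer_model:
        ordered.reverse()

    for source in ordered:
        try:
            text = source()
        except Exception:
            # A failing backend (missing model, no subtitles, network error)
            # should not block the other one, so move on to the next source.
            continue
        if text:
            return text

    raise LookupError("no transcript available from any source")
```

Usage would be something like `pick_transcript(lambda: whisper_transcribe("audio.mp3"), lambda: fetch_subtitles(url))`, where `fetch_subtitles` stands in for whatever the project already uses to get YouTube subtitles. A Gemini-based transcriber could be passed in the same way.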

balewgize commented 7 months ago

Hi @AndreaOrru, thanks for your feedback!

You're right: adding support for transcribing using a model, like Whisper, would produce better results. It's something I've been considering, and your feedback reinforces its importance.

FYI, Gemini support is already added. You can log in with guest credentials and select the model used for the summary (GPT-3.5 or Gemini Pro).

I'll prioritize working on this feature soon and welcome any further thoughts you might have.

Thanks again for the awesome suggestion!

balewgize commented 7 months ago

And of course, feel free to open a pull request if you'd like to be more involved.

AndreaOrru commented 7 months ago

Oh, I meant using Gemini's multimodal capabilities to do the transcription. :)

balewgize commented 7 months ago

Ah, got it :) Added to my TODO.