collabora / WhisperSpeech

An Open Source text-to-speech system built by inverting Whisper.
https://collabora.github.io/WhisperSpeech/
MIT License
4.01k stars, 218 forks

Docs: missing information about hardware requirements #158

Open thiswillbeyourgithub opened 1 month ago

thiswillbeyourgithub commented 1 month ago

Hi,

I've been quite interested in WhisperSpeech for a while (by the way, I hope this project is still alive!), but I think the documentation is a bit lacking, scattered, and outdated, so I decided to ask you directly:

  1. What are the VRAM requirements to run it? What performance should we expect on consumer hardware?
  2. Which quantizations are available? What trade-off should we expect performance-wise?

After reading the site and the README, I'm still not sure whether it is slow or not :)

Thanks a lot!

freedomtowin commented 1 week ago

The zero-shot voice cloning is very interesting to me as well. However, since "speechbrain" is used in the backend, I believe "finetuning" may not be something that can be done easily.