Hi,

I've been quite interested in whisperspeech for a while (by the way, I hope this project is still alive!), but I find the documentation a bit lacking, scattered around, and outdated, so I decided to ask my questions here instead:
What are the VRAM requirements to run it? What performance should we expect on consumer hardware?
Which quantizations are available? What performance trade-off should we expect from each?
After reading the site and the README, I'm still not sure whether it's slow or not :)
The zero-shot voice cloning is very interesting to me as well. However, since "speechbrain" is used in the backend, I suspect that "fine-tuning" isn't something that can be done easily.

Thanks a lot!