collabora / WhisperSpeech

An Open Source text-to-speech system built by inverting Whisper.
https://collabora.github.io/WhisperSpeech/
MIT License
3.8k stars, 207 forks

possible alternative vocoder #101

Closed BBC-Esq closed 7 months ago

BBC-Esq commented 7 months ago

Has anyone considered using WaveRNN via PyTorch instead of Vocos? I'm trying to think of ways around the issue that Vocos can't run on MPS on macOS. WaveRNN might still require operations that aren't supported on MPS, which would make this a moot point, but I thought I'd ask.

Here is a link to an alternative to Vocos that ships with PyTorch's torchaudio library:

https://pytorch.org/audio/main/generated/torchaudio.models.WaveRNN.html#torchaudio.models.WaveRNN

lexkoro commented 7 months ago

Why would anyone want to use WaveRNN? It takes forever to train, has slow inference, and gives worse quality than Vocos or other current GAN-based vocoders.

BBC-Esq commented 7 months ago

> Why would anyone want to use WaveRNN? It takes forever to train, has slow inference, and gives worse quality than Vocos or other current GAN-based vocoders.

Didn't know that. I was just researching alternatives that might let us use MPS fully as the compute device. If that's the case, good to know, and we can close out the issue!

jpc commented 7 months ago

I tested this today and I can barely notice the Vocos processing, even though it's running on the CPU. I think we can keep it as is.
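
For readers who want to try the same arrangement, here is a minimal sketch of a device-selection policy that keeps the vocoder on the CPU whenever the main model runs on MPS. It is not WhisperSpeech's actual code: `pick_devices` and `detect_devices` are hypothetical helper names, and the policy itself is an assumption based on the discussion above. The only PyTorch calls used are `torch.backends.mps.is_available()` and `torch.cuda.is_available()`.

```python
try:
    import torch  # optional in this sketch; only used for device detection
except ImportError:
    torch = None


def pick_devices(mps_available, cuda_available):
    """Return (model_device, vocoder_device) as device-name strings.

    Assumed policy (hypothetical, not taken from WhisperSpeech): when the
    main model runs on MPS, the vocoder stays on the CPU, because some of
    its operations may not be supported on MPS.
    """
    if cuda_available:
        return "cuda", "cuda"
    if mps_available:
        return "mps", "cpu"
    return "cpu", "cpu"


def detect_devices():
    """Query PyTorch for available backends, falling back to CPU only."""
    if torch is None:
        return pick_devices(False, False)
    # torch.backends.mps is missing on older PyTorch builds, so guard it.
    mps = getattr(torch.backends, "mps", None)
    mps_ok = mps is not None and mps.is_available()
    return pick_devices(mps_ok, torch.cuda.is_available())


if __name__ == "__main__":
    model_device, vocoder_device = detect_devices()
    print(f"model on {model_device}, vocoder on {vocoder_device}")
```

The returned names can be passed to `.to(...)` on the respective modules. PyTorch also honours the `PYTORCH_ENABLE_MPS_FALLBACK=1` environment variable, which sends unsupported MPS operations to the CPU; whether that is enough for Vocos was not tested in this thread.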