I made quite a bit of progress on this. It's not committed yet because the code is still a bit hacky, but with `torch.compile` I can resynthesize the voice 17x faster than real time for the tiny S2A model on a 4090 (it will be a bit slower for end-to-end generation). The HQ S2A (`small`) model is 3.4x faster than real time.
Nice, that's amazing! Thanks so much!
Ok, all the improvements were merged and the end-to-end generation is more than 12x faster than real time on a consumer 4090. `torch.compile` needs some warmup, so the first inference is a lot slower (around 30s) but it should be fast afterwards.
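For anyone wondering what the warmup behavior looks like, here is a minimal, self-contained sketch with plain PyTorch. The model below is just a placeholder, not the actual WhisperSpeech S2A module, but it shows why the first compiled call is much slower than the rest:

```python
# Minimal sketch of the torch.compile warmup effect described above.
# The model here is a stand-in, NOT the actual WhisperSpeech S2A module.
import time
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 2048), nn.GELU(), nn.Linear(2048, 512)).cuda()
compiled = torch.compile(model)  # compilation is lazy; nothing happens yet

x = torch.randn(16, 512, device="cuda")

# The first call triggers graph capture and kernel codegen, so it is much
# slower (the ~30s mentioned above for the full pipeline); later calls reuse
# the compiled kernels and run at full speed.
for i in range(3):
    torch.cuda.synchronize()
    t0 = time.time()
    with torch.no_grad():
        compiled(x)
    torch.cuda.synchronize()
    print(f"call {i}: {time.time() - t0:.3f}s")
```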
Thanks!
Hi, I'm trying to run WhisperSpeech on a Colab instance - I've used the example Colab provided. However, it unfortunately still seems to take longer than real time to generate. Is there anything I need to do to enable these improvements? Thanks!
Hey, there are a few ways to speed up inference that we are looking into. Right now I have some smaller PyTorch optimizations that I will commit, and we can also take advantage of the tricks from this article: https://pytorch.org/blog/accelerating-generative-ai-2/
There is also the possibility of using the whisper/llama.cpp or CTranslate2 (FasterWhisper) implementations to get better performance on both CPUs and GPUs.
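To give an idea of what the CTranslate2 route looks like, here is a short sketch using the FasterWhisper API as it is documented upstream. It only covers Whisper transcription in isolation; wiring it into WhisperSpeech's token extraction is the part that would still need to be written, and the file name is just an example:

```python
# Rough illustration of the CTranslate2-backed FasterWhisper API mentioned above.
# This only shows standalone Whisper transcription, not WhisperSpeech integration.
from faster_whisper import WhisperModel

# compute_type="int8_float16" trades a little accuracy for speed on GPU;
# use "int8" on CPU-only machines.
model = WhisperModel("small", device="cuda", compute_type="int8_float16")

segments, info = model.transcribe("speech.wav", beam_size=5)
print(f"Detected language: {info.language} (p={info.language_probability:.2f})")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```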
That said, right now we are focused on improving quality and adding support for more languages, so if someone wants to look into this, we'd love a pull request. :)
We can also offer commercial support contracts to help people optimize and integrate the model into their products.
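For the Colab question above, a rough sketch of how usage might look from the caller's side, based on the Pipeline API shown in the WhisperSpeech README. Note the `torch_compile` argument and the exact `s2a_ref` model string are assumptions here and may differ in the version you have installed; please check the current README if they don't match:

```python
# Hedged sketch, assuming the Pipeline API from the WhisperSpeech README.
# The torch_compile flag and the model reference below are assumptions --
# verify against the installed version.
from whisperspeech.pipeline import Pipeline

pipe = Pipeline(
    s2a_ref='collabora/whisperspeech:s2a-q4-tiny-en+pl.model',  # tiny S2A checkpoint (assumed name)
    torch_compile=True,  # assumed switch that wraps the models in torch.compile
)

# The first generation includes the torch.compile warmup, so expect it to be
# slow; subsequent calls should be much faster on a recent GPU.
pipe.generate_to_file("output.wav", "This is a test of the optimized pipeline.")
```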