The holy grail is real-time voice synthesis. That's what you need for a real TTS API.
Personally, I don't even care about cloning. Because of how it works, Tortoise on its own already creates the best, most human-like synthetic voices out there.
Will this project at some point be 50x faster, or whatever we would need for real time (10x at the very least)?
Or is this approach fundamentally a little too slow, so that someone will just pick up the learnings from this and build a much faster approach?
The only other player with results this good is ElevenLabs, so I assume they use this codebase, sped it the heck up, and have a lot of hardware power behind it.
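
To put a number on "real time": the usual way to express it is a real-time factor, meaning seconds of compute per second of audio produced, where anything below 1.0 keeps up with playback. Here is a minimal sketch of that arithmetic. The function names and the timings are made up for illustration; they are not measurements of Tortoise or any other system.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Seconds of compute spent per second of audio produced (below 1.0 is real time)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def speedup_needed(synthesis_seconds: float, audio_seconds: float,
                   target_rtf: float = 1.0) -> float:
    """How many times faster synthesis must become to reach target_rtf."""
    if target_rtf <= 0:
        raise ValueError("target_rtf must be positive")
    return real_time_factor(synthesis_seconds, audio_seconds) / target_rtf


# Hypothetical timing, not a benchmark: 100 s of compute for a 10 s clip.
rtf = real_time_factor(100.0, 10.0)
print(f"real-time factor: {rtf:.1f}")
print(f"speedup needed for real time: {speedup_needed(100.0, 10.0):.1f}x")
```

With those made-up numbers, a 10x speedup would only just reach real time. A streaming API would probably want some headroom on top of that, which is what passing a `target_rtf` below 1.0 models.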