Open danielw97 opened 7 months ago
Ah interesting, I had been watching StyleTTS2 progress a while back but I haven't looked at it in the last month or so. I'll check it out and try to play with it some, that would be neat if it's even better than XTTSv2!
Great, I'd be interested in how you get on with that. What stands out to me is not only the naturalness but also the speed: it was quite fast on the CPU that I tested it with.
I'm keeping a close eye on the StyleTTS2 project, and just wanted to pass along that a pip package has been released strictly for inference. Since the project is new, things may change quickly, but I wanted to let you know. https://github.com/sidharthrajaram/StyleTTS2
Thanks! I played with it yesterday and it’s impressive. Still a little noisy with some stuff. Looking forward to when fine tuning is easy too. Definitely one to watch closely though.
I just finished examining StyleTTS2. I think if we accumulate a bit more we might be able to solve the issue with GPL phonemizer dependency, or at least feel bad together. https://github.com/rsxdalv/tts-generation-webui/discussions/212 https://github.com/yl4579/StyleTTS2/pull/91
I get it, I had not noticed the phonemizer/GPL issue when I poked around. Now though the GPL fork makes a whole lot more sense to me!
Honestly, I think if/when StyleTTS2 is sounding good and worth using, we can just use the NeuralVox fork and re-license this to GPL, assuming the few contributors we've had agree to that. If they won't, I can pull out any directly contributed code, write it fresh, and make a GPL fork of epub2tts.
You may well be aware of this already, but there is a fairly recent project called StyleTTS2 which raises the bar even further for open-source, local TTS generation. No pressure of course, but it would be great to have this integrated at some point in the future. I've tested the demo on a CPU and it runs fairly quickly. As of now there's an HTTP API and also Python integration at this repo. https://github.com/NeuralVox/StyleTTS2