aedocw / epub2tts

Turn an epub or text file into an audiobook
Apache License 2.0

enhancement: add StyleTTS2 support #105

Open danielw97 opened 7 months ago

danielw97 commented 7 months ago

You may well be aware of this already, but there is a fairly recent project called StyleTTS2 that raises the bar even further for open-source, local TTS generation. No pressure of course, but it would be great to have this integrated at some point in the future. I've tested the demo on a CPU and it runs fairly quickly. As of now there's an HTTP API and Python integration at this repo: https://github.com/NeuralVox/StyleTTS2

aedocw commented 7 months ago

Ah, interesting. I had been watching StyleTTS2's progress a while back, but I haven't looked at it in the last month or so. I'll check it out and try to play with it some. It would be neat if it's even better than XTTSv2!

danielw97 commented 7 months ago

Great, I'd be interested in how you get on with that. What stands out to me is not only the naturalness but also the speed: it was quite fast on the CPU I tested it with.

danielw97 commented 7 months ago

I'm keeping a close eye on the StyleTTS2 project, and just wanted to pass along that a pip package has been released strictly for inference. Of course, as this is new, things change quickly, but I wanted to let you know. https://github.com/sidharthrajaram/StyleTTS2

aedocw commented 7 months ago

Thanks! I played with it yesterday and it’s impressive. Still a little noisy with some stuff. Looking forward to when fine tuning is easy too. Definitely one to watch closely though.

rsxdalv commented 6 months ago

I just finished examining StyleTTS2. I think if we accumulate a bit more we might be able to solve the issue with the GPL phonemizer dependency, or at least feel bad together. https://github.com/rsxdalv/tts-generation-webui/discussions/212 https://github.com/yl4579/StyleTTS2/pull/91

aedocw commented 6 months ago

I get it. I had not noticed the phonemizer/GPL issue when I poked around. Now, though, the GPL fork makes a whole lot more sense to me!

Honestly, I think if/when StyleTTS2 is sounding good and worth using, we can just use the NeuralVox fork and re-license this to GPL, assuming the few contributors we've had will agree to that. If they won't, I can pull out any directly contributed code, write it fresh, and make a GPL fork of epub2tts.

aedocw commented 6 months ago

Discussion for license change