erew123 / alltalk_tts

AllTalk is based on the Coqui TTS engine, similar to the Coqui_tts extension for Text generation webUI, but it supports a variety of advanced features, such as a settings page, low VRAM support, DeepSpeed, narrator, model finetuning, custom models, and wav file maintenance. It can also be used with 3rd party software via JSON calls.
GNU Affero General Public License v3.0

Espeak NG problems and question #381

Closed by ShaunCassidyPoster 2 weeks ago

ShaunCassidyPoster commented 2 weeks ago

Describe the bug: AllTalk beta standalone: unable to install Espeak NG on Windows 10.

To Reproduce: Configuration difficulties described in the narrative below.

Desktop:
AllTalk was updated: 10/15/2024
Custom Python environment: no
Text-generation-webUI was updated: 10/15/2024

Additional context: Hi! Text-generation-webui update v1.15 broke my deepspeed/alltalk_tts extension, and I was unable to fix or revert it. The cause was either a mismatched pydantic, the lack of an appropriate deepspeed wheel, or something else I'm too dull to discern.

So I decided to reinstall my webui environment and try the alltalkbeta standalone instead, intending to connect with either webui or SillyTavern--anything that would work.

After installing standalone, I see that "Espeak NG" is a requirement. I tried to install that, but the binary installers for both x64 and x32 consistently hung. As I futzed with these I noticed their download directory became undeletable until I took ownership. Trying to install it in Sandboxie yielded an lsass.exe exception (originating from the local system) in which Windows alerted that it would shut down and restart in one minute. On downloading the source archive, Windows Defender reported "Backdoor:PHP/Dirtelti.HA; Alert Level: Severe...".

This detection is discussed in an Espeak NG issue from February, but perhaps the Windows binaries and/or source have yet to be changed.

Conversely, if it is a false positive, then perhaps the binaries will not install on older hardware without AVX/2 instructions (i7 930 CPU). Is there a manual way to install Espeak NG on Windows? I didn't find any so far.

Yet I am able to generate directly using my custom voice checkpoint in the standalone AllTalk. Is it possible to connect to it without using Espeak NG? Is Espeak NG something I can forgo if I just want to connect via API to a single voice on a single XTTSv2 checkpoint (trained previously with the extension)? I'm not exactly sure what it does or if I can skip it.

Thanks!

Edit: Hmm, Espeak seems already to be installed as part of Calibre2. In Calibre2, I activated "read aloud" and it downloaded the "en_US-libritts-high" voice, onnx and json. It read aloud and seems to be functional. Would this installation suffice for AllTalk standalone?

It has *_dict data files in an "espeak-ng-data" directory, as well as espeak-ng.dll, libtashkeel_model.ort, onnxruntime.dll, onnxruntime_providers_shared.dll, piper.exe, and piper_phonemize.dll.

Edit 2: I plucked the log file of the failed install from the temp files, and it shows MSI installation error 1603... One cause of that might be "Windows Installer is attempting to install an app that is already installed on your PC." That seems likely.
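For anyone checking a similar log, a small Python sketch along these lines can pull out the lines that usually mark the failure. It assumes a verbose Windows Installer log, which is typically UTF-16 encoded and tends to mark a failed action with "Return value 3". The function name and the marker list are illustrative only and are not part of AllTalk or Espeak NG.

```python
from pathlib import Path
from typing import List

# Assumed markers: "Return value 3" commonly flags a failed action in a
# verbose MSI log, and 1603 is the error code mentioned above.
FAILURE_MARKERS = ("Return value 3", "1603")


def find_failure_lines(log_path: str, encoding: str = "utf-16") -> List[str]:
    """Return the stripped log lines that contain any failure marker.

    The default encoding is an assumption; pass another one if the log
    was saved differently.
    """
    text = Path(log_path).read_text(encoding=encoding, errors="replace")
    return [
        line.strip()
        for line in text.splitlines()
        if any(marker in line for marker in FAILURE_MARKERS)
    ]
```

A plain substring match on "1603" can also hit unrelated numbers, so the results are only a starting point for reading the surrounding lines in the log.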

I may be able to reinstall Calibre2 without Espeak/Piper, so I will try that and close this issue.

erew123 commented 2 weeks ago

@ShaunCassidyPoster Thanks for the heads up. I had not come across the false positive virus alert at all. Not a single installation I've performed on Win 11 has ever flagged it (even brand new, factory fresh installs of Windows).

I can say for certain that the Coqui TTS engines use espeak-ng as their phonemizer, as I helped update the detection code in the Coqui TTS engine a couple of months back (https://github.com/idiap/coqui-ai-TTS/pull/32). I am also pretty sure Piper uses/requires it on Windows, along with other TTS engines I intend to add.

The short answer to your "would Calibre2 work" question is that I don't know at the moment, as the scripts doing the checks come from the actual TTS engine developers and are not my code. Maybe at some point I will have to try to catch the espeak-ng dev and ask if they will provide a new MSI installer, or maybe look at building it myself: https://github.com/espeak-ng/espeak-ng/blob/master/docs/building.md

Thanks

ShaunCassidyPoster commented 2 weeks ago

Well, I solved the problem by not doing anything further (left Calibre alone) except installing the AllTalk 2 SillyTavern extension. My fine-tuned XTTSv2 model works great with the standalone API. I think it takes less VRAM, too! The only weird thing is AllTalk 2 still doesn't know that Espeak-NG may be installed and shows the red error message, or maybe it's a softer requirement or it works as long as certain files are present. Thanks a lot!
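One way to see whether an Espeak executable is visible at all is a simple PATH lookup. The following is a minimal Python sketch, assuming the check only needs to find an executable named `espeak-ng` or `espeak` on PATH; the function name `find_espeak` is hypothetical, and this is not necessarily how AllTalk or the TTS engines perform their own detection.

```python
import shutil
from typing import Optional

# Assumed executable names; the actual detection scripts may look for others.
CANDIDATE_NAMES = ("espeak-ng", "espeak")


def find_espeak(path: Optional[str] = None) -> Optional[str]:
    """Return the full path of the first Espeak executable found, else None.

    `path` overrides the search path; by default the PATH environment
    variable is used.
    """
    for name in CANDIDATE_NAMES:
        found = shutil.which(name, path=path)
        if found:
            return found
    return None


if __name__ == "__main__":
    location = find_espeak()
    if location:
        print(f"Espeak executable found at: {location}")
    else:
        print("No Espeak executable found on PATH")
```

A copy bundled inside another application, such as an `espeak-ng.dll` in that application's own folder, would not be found by this kind of lookup unless its directory is on PATH, which may be one reason a warning persists.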

eginhard commented 1 week ago

> I can say for certain that the Coqui TTS engines use espeak-ng as their phonemizer

Some Coqui models use Espeak, but not all. In particular, XTTS does not use phonemes at all and it should be possible to run it without Espeak.

erew123 commented 1 week ago

@eginhard Thanks, that's very handy to know! :)