erew123 / alltalk_tts

AllTalk is based on the Coqui TTS engine, similar to the Coqui_tts extension for Text generation webUI, but supports a variety of advanced features, such as a settings page, low VRAM support, DeepSpeed, narrator, model finetuning, custom models, and wav file maintenance. It can also be used with 3rd-party software via JSON calls.
GNU Affero General Public License v3.0
864 stars 98 forks

Having narrator enabled in dockerconfig.json results in a silent failure in a clean container #309

Closed. C0rn3j closed this issue 2 weeks ago.

C0rn3j commented 1 month ago

It seems to only download the necessary dependencies when triggered from the Web UI, but if one provides a config where narrator is already enabled (in my case I am updating the deps at the moment and constantly rebuilding), it will silently fail to launch.

There should be some detection in the UI launch to either download the dep outright if narrator is enabled, or not to load it if the dep is missing and it is enabled.

Another option would be to add the en-core-web-md==3.7.1 dependency to requirements directly instead of only downloading it on demand.
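The launch-time check proposed above could be sketched roughly as follows. This is a minimal illustration, not AllTalk code: `startup_action` and `model_installed` are hypothetical names, and it assumes the reporter's hypothesis that a config with `narrator_enabled` set needs the `en_core_web_md` package to be importable.

```python
import importlib.util
from typing import Callable

# Import name of the en-core-web-md pip package (underscores, not hyphens).
MODEL_PACKAGE = "en_core_web_md"


def model_installed(package: str = MODEL_PACKAGE) -> bool:
    """Return True if the package can be found in the current environment."""
    return importlib.util.find_spec(package) is not None


def startup_action(
    config: dict,
    installed: Callable[[], bool] = model_installed,
) -> str:
    """Decide what to do at launch, before anything tries to load the model.

    Hypothetical helper: returns "download" when the narrator is enabled
    but the model package is missing, otherwise "ok".
    """
    if not config.get("narrator_enabled", False):
        return "ok"
    return "ok" if installed() else "download"
```

A launcher could call `startup_action(config)` and, on `"download"`, either run `python -m spacy download en_core_web_md` or print a clear message and continue with the narrator disabled, rather than stopping without output.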

erew123 commented 1 month ago

Hi @C0rn3j

You mean download when in the TTS generator? As that is the only bit that uses the en-core-web-md. Though I cannot see any way that setting the narrator enabled would have any impact on this. The narrator_enabled is nothing more than a system variable that applies specifically to the text-generation-webui TG-webui extension/interface https://github.com/oobabooga/text-generation-webui

It's not used for start-up and it's not used by the TTS generator.

Beyond that, the only narrator-enabled code is from the API requests on this line of code https://github.com/erew123/alltalk_tts/blob/main/tts_server.py#L1059 (I'm assuming you are talking about AllTalk v1), and it has no other impact throughout AllTalk and no interaction with the TTS generator.

The en-core-web-md (spaCy) model is only used by the TTS generator for analysis of the generated text vs what Whisper can read back. It's imported in tts_diff https://github.com/erew123/alltalk_tts/blob/main/system/tts_diff/tts_diff.py#L89 when analysis is called.

So I can see no way narrator_enabled would impact start-up bar a damaged JSON file.
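Since a damaged JSON file is the one start-up cause mentioned here, it may help to rule it out explicitly. The following is a minimal sketch, not AllTalk's actual loader: `load_config` is a hypothetical helper that reports unreadable or malformed config files with a descriptive error instead of failing quietly.

```python
import json
from pathlib import Path


def load_config(path: str) -> dict:
    """Load a JSON config file, raising a descriptive error on any problem."""
    config_path = Path(path)
    try:
        text = config_path.read_text(encoding="utf-8")
    except OSError as exc:
        raise RuntimeError(f"Cannot read config file {config_path}: {exc}") from exc
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        # Report where parsing failed so a hand-edited file is easy to fix.
        raise RuntimeError(
            f"Config file {config_path} is not valid JSON "
            f"(line {exc.lineno}, column {exc.colno}): {exc.msg}"
        ) from exc
    if not isinstance(config, dict):
        raise RuntimeError(f"Config file {config_path} must contain a JSON object")
    return config
```

Python's `json` module rejects a trailing comma before a closing brace, which is an easy mistake to make when toggling a single key by hand.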

As for the en-core-web-md, are you saying this is an issue downloading when using the TTS Generator?

Thanks

C0rn3j commented 1 month ago

I am using v1 (with the changes in the PR I sent). Setting "narrator_enabled": true, on a clean Docker container results in a failure to start up properly, silently and with no errors.

Setting "narrator_enabled": false, as is default, starting, trying to run Analyze TTS (which will download the models/deps) and then switching it to true and restarting is fine.

> You mean download when in the TTS generator? As that is the only bit that uses the en-core-web-md.

Yep, it downloads there, just fine when the UI is already running.

erew123 commented 1 month ago

Unless you are running the environment with Text-generation-webui, there is no benefit/need to have narrator_enabled: true; it's literally the flag/setting for this checkbox https://raw.githubusercontent.com/erew123/screenshots/main/textgensettings.jpg "Narrator Enabled" in the TGWUI extension.

Are you running TGWUI in the same docker environment?

The only code that would touch that variable would be TGWUI pulling in the def ui during its start-up as an extension:

https://github.com/erew123/alltalk_tts/blob/main/script.py#L855

and then if that gradio code is running in TGWUI, it would update the radio button accordingly:

https://github.com/erew123/alltalk_tts/blob/main/script.py#L956

So I am absolutely baffled why this would have any effect on start-up in any way, as that portion of code wouldn't even be touched if you aren't using AllTalk as a TGWUI extension, but even then it shouldn't matter.

Is there literally nothing shown on screen when that setting is flagged true that shows any start-up progress? Do you want to try uninstalling Gradio on the docker build and see if that has any impact?

C0rn3j commented 4 weeks ago

> Unless you are running the environment with Text-generation-webui, there is no benefit/need to have narrator_enabled: true; it's literally the flag/setting for this checkbox https://raw.githubusercontent.com/erew123/screenshots/main/textgensettings.jpg "Narrator Enabled" in the TGWUI extension.

Aha, I had no idea and thought I needed it for the API.

> Are you running TGWUI in the same docker environment?

Not running it at all.

> Is there literally nothing shown on screen when that setting is flagged true that shows any start-up progress?

When I tried docker exec into the container and started up uvicorn (or whatever it's named) and then script.py, script.py simply stopped after printing a line that said "Detected docker env, waiting."

I have peeked at the script.py but did not quickly see how it should have continued.

erew123 commented 2 weeks ago

Hi @C0rn3j, I didn't build the V1 docker version. In V1, launch.sh starts the script https://github.com/erew123/alltalk_tts/blob/main/launch.sh, which is where the pause occurs. Why it was set up this way by the person who did it, I don't know.
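A wait like the one described in this thread is hard to diagnose when nothing is printed after the "waiting" message. The sketch below shows one generic way to bound such a wait and report the outcome. It is not the code in launch.sh or script.py: `wait_until_ready` and its parameters are hypothetical, and the readiness check is passed in as a callable because the thread does not say what the script is waiting for.

```python
import time
from typing import Callable


def wait_until_ready(
    is_ready: Callable[[], bool],
    timeout_s: float = 120.0,
    interval_s: float = 1.0,
    log: Callable[[str], None] = print,
) -> bool:
    """Poll is_ready() until it returns True or timeout_s elapses.

    Returns True on success and False on timeout, logging either outcome
    so that a stalled start-up is never silent.
    """
    deadline = time.monotonic() + timeout_s
    attempts = 0
    while True:
        attempts += 1
        if is_ready():
            log(f"Server ready after {attempts} check(s).")
            return True
        if time.monotonic() >= deadline:
            log(f"Gave up after {timeout_s:.0f}s and {attempts} check(s); server never became ready.")
            return False
        time.sleep(interval_s)
```

With a bounded wait, a container that never becomes ready ends with an explicit message and a `False` return value that the caller can turn into a non-zero exit code.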