erew123 / alltalk_tts

AllTalk is based on the Coqui TTS engine, similar to the Coqui_tts extension for Text generation webUI, but it supports a variety of advanced features, such as a settings page, low VRAM support, DeepSpeed, narrator, model finetuning, custom models, and wav file maintenance. It can also be used with third-party software via JSON calls.
GNU Affero General Public License v3.0

Issue with Czech sound generation #274

Closed Kuba-Trutnov closed 1 month ago

Kuba-Trutnov commented 1 month ago

Hello everyone,

Today I encountered a problem while creating an audio file in the Czech language.

When I tested on the main page using Demo/Test TTS, everything worked perfectly, but the problem occurred when I went to http://127.0.0.1:7851/static/tts_generator/tts_generator.html

Even though I had set the language on the main page to Czech, the text echoed in the console was missing the Czech characters with a háček (hook) above the letter, so the generated audio was wrong as well. (č ď ě ň ř š ť ž Č Ď Ě Ň Ř Š Ť Ž)

Interestingly, the Czech characters with an acute accent (čárka) or a ring above the letter did work. (á é í ó ú ů ý Á É Í Ó Ú Ů Ý)
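A likely explanation for the split between the two groups is the `À-ÿ` range in the filter, which spans only U+00C0 to U+00FF. The sketch below checks each Czech letter against that range; the helper names `inLatin1LetterRange` and `describe` are made up for illustration and are not part of AllTalk. By this check, ů and Ů (U+016F, U+016E) also sit outside the range, together with the háček letters.

```javascript
// Czech letters with diacritics, lowercase then uppercase (precomposed form).
const CZECH_DIACRITICS = "áčďéěíňóřšťúůýžÁČĎÉĚÍŇÓŘŠŤÚŮÝŽ".normalize("NFC");

// True when the character lies in the À-ÿ range (U+00C0 to U+00FF).
function inLatin1LetterRange(ch) {
  const cp = ch.codePointAt(0);
  return cp >= 0x00c0 && cp <= 0x00ff;
}

// Format a character together with its code point, e.g. "č (U+010D)".
function describe(ch) {
  const hex = ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
  return `${ch} (U+${hex})`;
}

const kept = [...CZECH_DIACRITICS].filter(inLatin1LetterRange);
const stripped = [...CZECH_DIACRITICS].filter((ch) => !inLatin1LetterRange(ch));

console.log("Inside À-ÿ: ", kept.map(describe).join(" "));
console.log("Outside À-ÿ:", stripped.map(describe).join(" "));
```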

So I looked into the HTML source at "\alltalk_tts\system\tts_generator\tts_generator.html" and found the line responsible for this problem:

cleanedText = cleanedText.replace(/[^a-zA-Z0-9\s.,;:!?\-'"$À-ÿ\u0400-\u04FF\u0150\u0151\u0170\u0171\u0900-\u097F\u2018\u2019\u201C\u201D\u2026\u3001\u3002\u3040-\u309F\u30A0-\u30FF\u4E00-\u9FFF\u3400-\u4DBF\uF900-\uFAFF\u0600-\u06FF\u0750-\u077F\uFB50-\uFDFF\uFE70-\uFEFF\uAC00-\uD7A3\u1100-\u11FF\u3130-\u318F\uFF01\uFF0c\uFF1A\uFF1B\uFF1F]/g, '');

And I replaced the original part with this part:

cleanedText = cleanedText.replace(/[^a-zA-Z0-9\s.,;:!?\-'"$À-ÿáčďéěíňóřšťúůýžÁČĎÉĚÍŇÓŘŠŤÚŮÝŽ\u0400-\u04FF\u0150\u0151\u0170\u0171\u0900-\u097F\u2018\u2019\u201C\u201D\u2026\u3001\u3002\u3040-\u309F\u30A0-\u30FF\u4E00-\u9FFF\u3400-\u4DBF\uF900-\uFAFF\u0600-\u06FF\u0750-\u077F\uFB50-\uFDFF\uFE70-\uFEFF\uAC00-\uD7A3\u1100-\u11FF\u3130-\u318F\uFF01\uFF0c\uFF1A\uFF1B\uFF1F]/g, '');

This fixed the problem: the text the script sends to the console no longer has missing characters.
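The effect of the change can be reproduced with a reduced version of the two filters. This is only a sketch: `ORIGINAL_FILTER`, `PATCHED_FILTER` and `clean` are illustrative names, and the character classes keep just the Latin part of the expression, whereas the real one in tts_generator.html allows many more ranges.

```javascript
// Simplified stand-ins for the filter in tts_generator.html (assumption:
// only the Latin part of the character class is reproduced here).
const ORIGINAL_FILTER = /[^a-zA-Z0-9\s.,;:!?\-'"$À-ÿ]/g;
const PATCHED_FILTER =
  /[^a-zA-Z0-9\s.,;:!?\-'"$À-ÿáčďéěíňóřšťúůýžÁČĎÉĚÍŇÓŘŠŤÚŮÝŽ]/g;

// Remove every character that the given filter does not allow.
function clean(text, filter) {
  return text.replace(filter, "");
}

// A common Czech pangram that exercises most of the diacritics.
const sample = "Příliš žluťoučký kůň úpěl ďábelské ódy.";

console.log("original filter:", clean(sample, ORIGINAL_FILTER));
console.log("patched filter: ", clean(sample, PATCHED_FILTER));
```

With the reduced original filter the háček letters and ů disappear from the sample, while the patched filter returns the sentence unchanged.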

To be on the safe side, I also modified the start_alltalk.bat file by adding the line "chcp 65001":

@echo off
chcp 65001
cd /D "C:\alltalk_tts\"
set CONDA_ROOT_PREFIX=C:\alltalk_tts\alltalk_environment\conda
set INSTALL_ENV_DIR=C:\alltalk_tts\alltalk_environment\env
call "C:\alltalk_tts\alltalk_environment\conda\condabin\conda.bat" activate "C:\alltalk_tts\alltalk_environment\env"
call python script.py

Yours sincerely, Kuba

Edited HTML: tts_generator.zip

erew123 commented 1 month ago

Hi @Kuba-Trutnov

Thanks for that. I've extended the character set with \u00C0-\u017F, which should cover all the above additions for you.

I've made sure that's applied throughout the other areas of AllTalk as necessary.

Thanks
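For anyone checking the range approach locally, here is a minimal sketch. It assumes a reduced filter in which only the Latin range is widened to U+00C0-U+017F (the Latin-1 Supplement letters plus Latin Extended-A); `EXTENDED_FILTER` is an illustrative name, not the exact expression shipped in AllTalk.

```javascript
// Assumption: a reduced version of the filter with the Latin range widened
// to U+00C0-U+017F instead of listing the Czech letters one by one.
const EXTENDED_FILTER = /[^a-zA-Z0-9\s.,;:!?\-'"$\u00C0-\u017F]/g;

// Czech letters with diacritics, lowercase then uppercase (precomposed form).
const CZECH_DIACRITICS = "áčďéěíňóřšťúůýžÁČĎÉĚÍŇÓŘŠŤÚŮÝŽ".normalize("NFC");

// Collect any Czech letter that the extended filter would still remove.
const removed = [...CZECH_DIACRITICS].filter(
  (ch) => ch.replace(EXTENDED_FILTER, "") === ""
);

console.log(
  removed.length === 0
    ? "all Czech letters are kept"
    : `still removed: ${removed.join("")}`
);
```

A range has the side benefit of covering other Latin Extended-A letters (for example Polish or Slovak ones) without further edits to the character class.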