erew123 / alltalk_tts

AllTalk is based on the Coqui TTS engine and is similar to the Coqui_tts extension for Text-generation-webui. However, it supports a variety of advanced features, such as a settings page, low VRAM support, DeepSpeed, a narrator, model finetuning, custom models, and WAV file maintenance. It can also be used with third-party software via JSON calls.
GNU Affero General Public License v3.0
1.06k stars, 113 forks

Not able to generate Hindi audio using xtts - xttsv2_2.0.3 #348

Closed: nitinmukesh closed this issue 1 month ago

nitinmukesh commented 1 month ago

Describe the bug: I am trying to generate Hindi audio, but it fails with an error. English audio is generated successfully.

To Reproduce:
1. Launch the UI and enter Hindi text in Text Input.
2. Select xtts - xttsv2_2.0.3.
3. Under Advanced settings, set Language to Hi.
4. Click Generate TTS.
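To take the Gradio UI out of the equation, the same request could be built in a script and sent to the API address AllTalk reports at start-up (127.0.0.1:7851). This is a minimal sketch that only builds the request. The endpoint path `/api/tts-generate` and the form field names (`text_input`, `language`, `character_voice_gen`, `output_file_name`) are assumptions based on AllTalk's API documentation and should be checked against the docs for your installed version; the voice and output file names are hypothetical.

```python
import urllib.parse
import urllib.request

# Assumed AllTalk endpoint; host and port match the "API Address" start-up line.
API_URL = "http://127.0.0.1:7851/api/tts-generate"

def build_tts_request(text: str, language: str, voice: str) -> urllib.request.Request:
    """Build (but do not send) a form-encoded TTS request.

    The field names are assumptions; verify them against your AllTalk version.
    """
    form = {
        "text_input": text,
        "language": language,              # "hi" selects Hindi, as in the UI
        "character_voice_gen": voice,      # hypothetical voice file name
        "output_file_name": "hindi_test",  # hypothetical output name
    }
    data = urllib.parse.urlencode(form).encode("utf-8")
    return urllib.request.Request(API_URL, data=data, method="POST")

req = build_tts_request("नमस्ते! आज आप कैसे हैं?", "hi", "female_01.wav")

# Sending it requires a running AllTalk server:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```

If the scripted request fails the same way as the UI, the problem lies in the engine rather than in the Gradio front end.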

Screenshots: N/A

Text/logs:

(C:\tut\alltalk_tts\alltalk_environment\env) C:\tut\alltalk_tts>start_alltalk.bat
[AllTalk TTS]     _    _ _ _____     _ _       _____ _____ ____
[AllTalk TTS]    / \  | | |_   _|_ _| | | __  |_   _|_   _/ ___|
[AllTalk TTS]   / _ \ | | | | |/ _` | | |/ /    | |   | | \___ \
[AllTalk TTS]  / ___ \| | | | | (_| | |   <     | |   | |  ___) |
[AllTalk TTS] /_/   \_\_|_| |_|\__,_|_|_|\_\    |_|   |_| |____/
[AllTalk TTS]
[AllTalk TTS] Config file update: No Updates required
[AllTalk TTS] Start-up Mode     : Standalone mode
[AllTalk TTS] WAV file deletion : Disabled
[AllTalk TTS] Github updated    : 15th August 2024 at 08:27
[AllTalk ENG] Transcoding       : ffmpeg found
[AllTalk ENG] DeepSpeed version : 0.14.0+ce78a63
[AllTalk ENG] Python Version    : 3.11.0
[AllTalk ENG] PyTorch Version   : 2.2.1+cu121
[AllTalk ENG] CUDA Version      : 12.1
[AllTalk ENG]
[AllTalk ENG] Model/Engine : xttsv2_2.0.3 loading into cuda
[AllTalk ENG] Model License: https://coqui.ai/cpml.txt
[AllTalk ENG] Load time : 32.82 seconds.
[AllTalk TTS]
[AllTalk TTS] API Address : 127.0.0.1:7851
[AllTalk TTS] Gradio Light: http://127.0.0.1:7852
[AllTalk TTS] Gradio Dark : http://127.0.0.1:7852?__theme=dark
[AllTalk TTS]
[AllTalk TTS] Please use Ctrl+C when exiting AllTalk otherwise a
[AllTalk TTS] subprocess may continue running in the background.
[AllTalk TTS]
[AllTalk TTS] AllTalk Server Ready
[AllTalk GEN] नमस्ते! आज आप कैसे हैं?
[GEN] Error during audio generation: 'hi'
[AllTalk GEN] नमस्ते! आज आप कैसे हैं?
[GEN] Error during audio generation: 'hi'
[AllTalk GEN] नमस्ते! आज आप कैसे हैं?
[GEN] Error during audio generation: 'hi'
[AllTalk GEN] नमस्ते! आज आप कैसे हैं?
[GEN] Error during audio generation: 'hi'
[AllTalk GEN] नमस्ते! आज आप कैसे हैं?
[GEN] Error during audio generation: 'hi'
[AllTalk GEN] नमस्ते! आज आप कैसे हैं?
[GEN] Error during audio generation: 'hi'
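The bare 'hi' in these error lines matches what Python prints when a KeyError is formatted with str(): the message is just the repr of the missing key. That suggests, but does not prove, that a lookup keyed by the language code failed somewhere in the generation path. A minimal illustration, using a hypothetical language table (`SUPPORTED`) and a hypothetical `generate` function that deliberately lack Hindi:

```python
# Hypothetical table of language codes a model path knows about (Hindi missing on purpose).
SUPPORTED = {"en": "English", "fr": "French"}

def generate(text: str, language: str) -> str:
    """Hypothetical stand-in for a generation call that looks up the language code."""
    try:
        name = SUPPORTED[language]  # raises KeyError('hi') for Hindi
        return f"generated {name} audio"
    except KeyError as e:
        # Formatting the exception with str() yields just "'hi'", as in the log above
        return f"[GEN] Error during audio generation: {e}"

print(generate("Bonjour! comment vas-tu aujourd'hui?", "fr"))  # generated French audio
print(generate("नमस्ते! आज आप कैसे हैं?", "hi"))  # [GEN] Error during audio generation: 'hi'
```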

Desktop:
- AllTalk was updated: 15th August 2024 at 08:27
- Custom Python environment: No
- Text-generation-webui was updated: It's the beta version and up to date

Additional context: two screenshots (not reproduced here).

nitinmukesh commented 1 month ago

I tried French and English, and both work. Hindi does not. Please help.

(C:\tut\alltalk_tts\alltalk_environment\env) C:\tut\alltalk_tts>start_alltalk.bat
[AllTalk TTS]     _    _ _ _____     _ _       _____ _____ ____
[AllTalk TTS]    / \  | | |_   _|_ _| | | __  |_   _|_   _/ ___|
[AllTalk TTS]   / _ \ | | | | |/ _` | | |/ /    | |   | | \___ \
[AllTalk TTS]  / ___ \| | | | | (_| | |   <     | |   | |  ___) |
[AllTalk TTS] /_/   \_\_|_| |_|\__,_|_|_|\_\    |_|   |_| |____/
[AllTalk TTS]
[AllTalk TTS] Config file update: No Updates required
[AllTalk TTS] Start-up Mode     : Standalone mode
[AllTalk TTS] WAV file deletion : Disabled
[AllTalk TTS] Github updated    : 15th August 2024 at 08:27
[AllTalk ENG] Transcoding       : ffmpeg found
[AllTalk ENG] DeepSpeed version : 0.14.0+ce78a63
[AllTalk ENG] Python Version    : 3.11.0
[AllTalk ENG] PyTorch Version   : 2.2.1+cu121
[AllTalk ENG] CUDA Version      : 12.1
[AllTalk ENG]
[AllTalk ENG] Model/Engine : xttsv2_2.0.3 loading into cuda
[AllTalk ENG] Model License: https://coqui.ai/cpml.txt
[AllTalk ENG] Load time : 39.11 seconds.
[AllTalk TTS]
[AllTalk TTS] API Address : 127.0.0.1:7851
[AllTalk TTS] Gradio Light: http://127.0.0.1:7852
[AllTalk TTS] Gradio Dark : http://127.0.0.1:7852?__theme=dark
[AllTalk TTS]
[AllTalk TTS] Please use Ctrl+C when exiting AllTalk otherwise a
[AllTalk TTS] subprocess may continue running in the background.
[AllTalk TTS]
[AllTalk TTS] AllTalk Server Ready
[AllTalk GEN] Bonjour! comment vas-tu aujourd'hui?
C:\tut\alltalk_tts\alltalk_environment\env\Lib\site-packages\transformers\models\gpt2\modeling_gpt2.py:544: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(
[AllTalk GEN] TTS Generate: 5.50 seconds. LowVRAM: False DeepSpeed: False

nitinmukesh commented 1 month ago

I tried updating tokenizer.py (alltalk_tts\system\ft_tokenizer). I applied the changes manually instead of replacing the file (attached: tokenizer.txt).

I referred to the following for the above change: https://github.com/coqui-ai/TTS/issues/3655

Still the same issue.

erew123 commented 1 month ago

Hi @nitinmukesh

Are you specifically attempting to use "Streaming"? I cannot say whether the Coqui engine ever supported streaming with Hindi. I see no reason it wouldn't, but I don't know if it does.

Nonetheless, I tried some Devanagari script and it passed through fine on my PC. I also tried your text, "नमस्ते! आज आप कैसे हैं?", and that passed through fine too.

(Two screenshots, not reproduced here.)

The system I tested on is a fresh install (shown below) with the following package versions. You can run start_diagnostics to create a diagnostics.log file and compare the versions on your system.

(Screenshot, not reproduced here.)

diagnostics.log contents as of 16th Sept 2024

PACKAGE VERSIONS vs REQUIREMENTS FILE:
 coqui-tts           Required: >= 0.24.1        Installed: 0.24.1
 faster-whisper      Required: >= 1.0.3         Installed: 1.0.3
 fuzzywuzzy          Required: >= 0.18.0        Installed: 0.18.0
 gradio              Required: >= 4.26.0        Installed: 4.32.2
 importlib_metadata  Required: >= 7.2.1         Installed: 8.5.0
 inputimeout         Required: >= 1.0.4         Installed: 1.0.4
 Jinja2              Required: >= 3.1.4         Installed: 3.1.4
 librosa             Required: >= 0.10.2.post1  Installed: 0.10.2.post1
 nvidia-cublas-cu11  Required: >= 11.11.3.6     Installed: 11.11.3.6
 nvidia-cudnn-cu11   Required: >= 9.1.1.17      Installed: 9.4.0.58
 onnxruntime-gpu     Required: >= 1.18.1        Installed: 1.19.2
 pydantic            Required: >= 2.8.2         Installed: 2.9.1
 python-ffmpeg       Required: >= 2.0.12        Installed: 2.0.12
 python-Levenshtein  Required: >= 0.25.1        Installed: 0.25.1
 praat-parselmouth   Required: >= 0.4.4         Installed: 0.4.4
 pyworld             Required: >= 0.3.4         Installed: 0.3.4
 sounddevice         Required: >= 0.4.7         Installed: 0.5.0
 soundfile           Required: >= 0.12.1        Installed: 0.12.1
 spacy               Required: >= 3.7.1         Installed: 3.7.6
 torchcrepe          Required: >= 0.0.2         Installed: 0.0.23
 tqdm                Required: >= 4.66.5        Installed: 4.66.5
 unidic-lite         Required: >= 1.0.8         Installed: 1.0.8
 uvicorn             Required: >= 0.29.0        Installed: 0.30.6
 pillow              Required: == 10.3.0        Installed: 10.3.0
 pypinyin            Required: >= 0.52.0        Installed: 0.53.0
 word2number         Required: >= 1.1           Installed: 1.1
 cutlet              Required: == 0.4.0         Installed: 0.4.0
 fugashi             Required: == 1.3.1         Installed: 1.3.1
 fastapi             Required: == 0.112.2       Installed: 0.112.2

PYTHON PACKAGES:
 absl-py>= 2.1.0
 aiofiles>= 23.2.1
 aiohappyeyeballs>= 2.4.0
 aiohttp>= 3.10.5
 aiosignal>= 1.3.1
 altair>= 5.4.1
 annotated-types>= 0.7.0
 antlr4-python3-runtime>= 4.9.3
 anyascii>= 0.3.2
 anyio>= 4.4.0
 argbind>= 0.3.9
 asttokens>= 2.4.1
 attrs>= 24.2.0
 audioread>= 3.0.1
 av>= 12.3.0
 babel>= 2.16.0
 bitarray>= 2.9.2
 blis>= 0.7.11
 Brotli>= 1.0.9
 catalogue>= 2.0.10
 certifi>= 2024.8.30
 cffi>= 1.17.1
 charset-normalizer>= 3.3.2
 click>= 8.1.7
 cloudpathlib>= 0.19.0
 colorama>= 0.4.6
 coloredlogs>= 15.0.1
 confection>= 0.1.5
 contourpy>= 1.3.0
 coqpit>= 0.0.17
 coqui-tts>= 0.24.1
 coqui-tts-trainer>= 0.1.5
 ctranslate2>= 4.4.0
 cutlet>= 0.4.0
 cycler>= 0.12.1
 cymem>= 2.0.8
 Cython>= 3.0.11
 dateparser>= 1.1.8
 decorator>= 5.1.1
 deepspeed>= 0.14.0+ce78a63
 descript-audiotools>= 0.7.2
 descript-audio-codec>= 1.0.0
 docopt>= 0.6.2
 docstring_parser>= 0.16
 einops>= 0.8.0
 encodec>= 0.1.1
 executing>= 2.1.0
 fairseq>= 0.12.4
 faiss>= 1.8.0
 fastapi>= 0.112.2
 faster-whisper>= 1.0.3
 ffmpy>= 0.4.0
 filelock>= 3.13.1
 fire>= 0.6.0
 flatbuffers>= 24.3.25
 flatten-dict>= 0.4.2
 fonttools>= 4.53.1
 frozenlist>= 1.4.1
 fsspec>= 2024.9.0
 fugashi>= 1.3.1
 future>= 1.0.0
 fuzzywuzzy>= 0.18.0
 gmpy2>= 2.1.2
 gradio>= 4.32.2
 gradio_client>= 0.17.0
 grpcio>= 1.66.1
 gruut>= 2.2.3
 gruut-ipa>= 0.13.0
 gruut_lang_de>= 2.0.1
 gruut_lang_en>= 2.0.1
 gruut_lang_es>= 2.0.1
 gruut_lang_fr>= 2.0.2
 h11>= 0.14.0
 hangul-romanize>= 0.1.0
 hjson>= 3.1.0
 httpcore>= 1.0.5
 httpx>= 0.27.2
 huggingface-hub>= 0.24.7
 humanfriendly>= 10.0
 hydra-core>= 1.3.2
 idna>= 3.7
 importlib_metadata>= 8.5.0
 importlib_resources>= 6.4.5
 inflect>= 7.4.0
 inputimeout>= 1.0.4
 ipython>= 8.27.0
 jaconv>= 0.4.0
 jedi>= 0.19.1
 Jinja2>= 3.1.4
 joblib>= 1.4.2
 jsonlines>= 1.2.0
 jsonschema>= 4.23.0
 jsonschema-specifications>= 2023.12.1
 julius>= 0.2.7
 kiwisolver>= 1.4.7
 langcodes>= 3.4.0
 language_data>= 1.2.0
 lazy_loader>= 0.4
 Levenshtein>= 0.25.1
 librosa>= 0.10.2.post1
 llvmlite>= 0.43.0
 local-attention>= 1.9.15
 lxml>= 5.3.0
 marisa-trie>= 1.2.0
 Markdown>= 3.7
 markdown2>= 2.5.0
 markdown-it-py>= 3.0.0
 MarkupSafe>= 2.1.3
 matplotlib>= 3.9.2
 matplotlib-inline>= 0.1.7
 mdurl>= 0.1.2
 mkl_fft>= 1.3.10
 mkl_random>= 1.2.7
 mkl-service>= 2.4.0
 mojimoji>= 0.0.13
 more-itertools>= 10.5.0
 mpmath>= 1.3.0
 msgpack>= 1.1.0
 multidict>= 6.1.0
 murmurhash>= 1.0.10
 narwhals>= 1.8.0
 networkx>= 2.8.8
 ninja>= 1.11.1.1
 num2words>= 0.5.13
 numba>= 0.60.0
 numpy>= 1.26.4
 nvidia-cublas-cu11>= 11.11.3.6
 nvidia-cuda-nvrtc-cu11>= 11.8.89
 nvidia-cudnn-cu11>= 9.4.0.58
 omegaconf>= 2.3.0
 onnxruntime>= 1.19.2
 onnxruntime-gpu>= 1.19.2
 orjson>= 3.10.7
 packaging>= 24.1
 pandas>= 2.2.2
 parler_tts>= 0.2
 parso>= 0.8.4
 pillow>= 10.3.0
 pip>= 24.2
 platformdirs>= 4.3.3
 pooch>= 1.8.2
 portalocker>= 2.10.1
 praat-parselmouth>= 0.4.4
 preshed>= 3.0.9
 prompt_toolkit>= 3.0.47
 protobuf>= 3.19.6
 psutil>= 6.0.0
 pure_eval>= 0.2.3
 pycparser>= 2.22
 pydantic>= 2.9.1
 pydantic_core>= 2.23.3
 pydub>= 0.25.1
 pyee>= 12.0.0
 Pygments>= 2.18.0
 pyloudnorm>= 0.1.1
 pynndescent>= 0.5.13
 pynvml>= 11.5.3
 pyparsing>= 3.1.4
 pypinyin>= 0.53.0
 pyreadline3>= 3.5.2
 pysbd>= 0.3.4
 PySocks>= 1.7.1
 pystoi>= 0.4.1
 python-crfsuite>= 0.9.10
 python-dateutil>= 2.9.0.post0
 python-ffmpeg>= 2.0.12
 python-Levenshtein>= 0.25.1
 python-multipart>= 0.0.9
 pytz>= 2024.2
 pywin32>= 306
 pyworld>= 0.3.4
 PyYAML>= 6.0.1
 py-cpuinfo>= 9.0.0
 randomname>= 0.2.1
 rapidfuzz>= 3.9.7
 referencing>= 0.35.1
 regex>= 2024.9.11
 requests>= 2.32.3
 resampy>= 0.4.3
 rich>= 13.8.1
 rotary-embedding-torch>= 0.8.3
 rpds-py>= 0.20.0
 ruff>= 0.6.5
 sacrebleu>= 2.4.3
 safetensors>= 0.4.5
 scikit-learn>= 1.5.2
 scipy>= 1.14.1
 semantic-version>= 2.10.0
 sentencepiece>= 0.2.0
 setuptools>= 72.1.0
 shellingham>= 1.5.4
 six>= 1.16.0
 smart-open>= 7.0.4
 sniffio>= 1.3.1
 sounddevice>= 0.5.0
 soundfile>= 0.12.1
 soxr>= 0.5.0.post1
 spacy>= 3.7.6
 spacy-legacy>= 3.0.12
 spacy-loggers>= 1.0.5
 srsly>= 2.4.8
 stack-data>= 0.6.3
 starlette>= 0.38.5
 SudachiDict-core>= 20240716
 SudachiPy>= 0.6.8
 sympy>= 1.13.2
 tabulate>= 0.9.0
 tensorboard>= 2.17.1
 tensorboard-data-server>= 0.7.2
 termcolor>= 2.4.0
 thinc>= 8.2.5
 threadpoolctl>= 3.5.0
 tokenizers>= 0.19.1
 tomlkit>= 0.12.0
 torch>= 2.2.1
 torchaudio>= 2.2.1
 torchcrepe>= 0.0.23
 torchvision>= 0.17.1
 torch-stoi>= 0.2.1
 tqdm>= 4.66.5
 traitlets>= 5.14.3
 transformers>= 4.40.2
 typeguard>= 4.3.0
 typer>= 0.12.5
 typing_extensions>= 4.11.0
 tzdata>= 2024.1
 tzlocal>= 5.2
 umap-learn>= 0.5.6
 unidic-lite>= 1.0.8
 urllib3>= 2.2.2
 uvicorn>= 0.30.6
 wasabi>= 1.1.3
 wcwidth>= 0.2.13
 weasel>= 0.4.1
 websockets>= 11.0.3
 Werkzeug>= 3.0.4
 wheel>= 0.44.0
 win-inet-pton>= 1.1.0
 word2number>= 1.1
 wrapt>= 1.16.0
 yarl>= 1.11.1
 zipp>= 3.20.2
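To compare a system against this baseline without checking every line by eye, the "PACKAGE VERSIONS vs REQUIREMENTS FILE" section of your own diagnostics.log could be checked with a short script. This is a minimal sketch, assuming the line layout shown above. It compares only the numeric release part of each version (suffixes such as .post1 or +ce78a63 are ignored), so it is a rough check rather than a full PEP 440 comparison. The pillow 10.4.0 line in the sample is made up to show a failure.

```python
import re

# Matches lines such as:
#  " coqui-tts           Required: >= 0.24.1        Installed: 0.24.1"
LINE_RE = re.compile(r"^\s*(\S+)\s+Required:\s*(>=|==)\s*(\S+)\s+Installed:\s*(\S+)")

def release_tuple(version: str) -> tuple:
    """Numeric release part only, e.g. "0.10.2.post1" -> (0, 10, 2).

    Simplification: pre/post/local suffixes are ignored.
    """
    m = re.match(r"\d+(?:\.\d+)*", version)
    parts = [int(p) for p in m.group(0).split(".")] if m else []
    while parts and parts[-1] == 0:  # treat 1.1 and 1.1.0 as equal
        parts.pop()
    return tuple(parts)

def failing_packages(report: str) -> list:
    """Return names of packages whose installed version violates the requirement."""
    failures = []
    for line in report.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # skip headers and any other lines
        name, op, required, installed = m.groups()
        req, inst = release_tuple(required), release_tuple(installed)
        ok = inst >= req if op == ">=" else inst == req
        if not ok:
            failures.append(name)
    return failures

SAMPLE = """\
 coqui-tts           Required: >= 0.24.1        Installed: 0.24.1
 gradio              Required: >= 4.26.0        Installed: 4.32.2
 pillow              Required: == 10.3.0        Installed: 10.4.0
"""

print(failing_packages(SAMPLE))  # ['pillow']
```

In practice you would pass the contents of your own diagnostics.log instead of SAMPLE.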

Other than something not installing correctly on your system, I am not sure what would cause this issue. Perhaps there is a system locale issue, as the locale can affect how characters are interpreted: https://learn.microsoft.com/en-us/windows-hardware/customize/desktop/unattend/microsoft-windows-international-core-winpe-systemlocale

I suppose it's possible that your system locale is causing the issue, but I would be unable to diagnose that for you.
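One quick way to see whether the locale is even a plausible factor is to print the encodings Python inherits from Windows and check whether the Hindi sample can be represented in them. For example, a system whose ANSI code page is cp1252 cannot encode Devanagari at all. This is only a diagnostic sketch (the helper `encodes_cleanly` is hypothetical); it does not reproduce what AllTalk or the Coqui engine does internally, and it should be run inside the AllTalk Python environment.

```python
import locale
import sys

TEXT = "नमस्ते! आज आप कैसे हैं?"

def encodes_cleanly(text: str, encoding: str) -> bool:
    """True if `text` can be represented in `encoding` without loss."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

if __name__ == "__main__":
    preferred = locale.getpreferredencoding(False)
    # Encodings Python picked up from the OS locale and the console
    print("preferred encoding:", preferred)
    print("stdout encoding:   ", sys.stdout.encoding)
    for enc in ("utf-8", preferred):
        status = "ok" if encodes_cleanly(TEXT, enc) else "cannot encode Devanagari"
        print(f"{enc}: {status}")
```

If the preferred encoding is a legacy code page, any step that writes or reads the text without an explicit UTF-8 encoding could mangle it.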

I assume you have done a full, fresh installation of AllTalk V2 and have not just copied over a V1 installation? You may wish to retry setting up the Python environment.

Finally, I am not the maintainer of the Coqui TTS engine; that is idiap. I can see they are working on additional Hindi support (https://github.com/idiap/coqui-ai-TTS/commit/192032882272945bdeb1253b140074ec2bce7737), though that update is not yet available.

Thanks

erew123 commented 1 month ago

@nitinmukesh Further to this, while writing documentation for V2 of AllTalk, I was looking over the V1 help and found this section: https://github.com/erew123/alltalk_tts?tab=readme-ov-file#startup-performance-and-compatibility-issues

(Screenshot of that README section, not reproduced here.)

As such, you need to load the 2.0.3 model in API mode.

(Screenshot, not reproduced here.)

nitinmukesh commented 4 weeks ago

Thank you @erew123. I hope support for Hindi is added soon. I tried everything (reinstalling, etc.), but it didn't work.

I'm currently using Google TTS for Hindi.

I will try the api one as suggested by you. Appreciate your guidance

erew123 commented 4 weeks ago

@nitinmukesh As mentioned above, you CAN use Hindi if you load the XTTS 2.0.3 model as apitts (API mode).

(Screenshot, not reproduced here.)

nitinmukesh commented 4 weeks ago

@erew123

I did understand it and mentioned the same in my earlier response:

I will try the api one as suggested by you. Appreciate your guidance

I should have mentioned apitts. I appreciate your guidance in making this work. Thank you.