nitinmukesh closed this issue 1 month ago.
I tried French and English, and both work. Hindi does not work. Please help.
(C:\tut\alltalk_tts\alltalk_environment\env) C:\tut\alltalk_tts>start_alltalk.bat
[AllTalk TTS] _ _ _ _____ _ _ _____ _____ ____
[AllTalk TTS] / \ | | |_ _|_ _| | | __ |_ _|_ _/ ___|
[AllTalk TTS] / _ \ | | | | |/ _` | | |/ / | | | | \___ \
[AllTalk TTS] / ___ \| | | | | (_| | | < | | | | ___) |
[AllTalk TTS] /_/ \_\_|_| |_|\__,_|_|_|\_\ |_| |_| |____/
[AllTalk TTS]
[AllTalk TTS] Config file update: No Updates required
[AllTalk TTS] Start-up Mode : Standalone mode
[AllTalk TTS] WAV file deletion : Disabled
[AllTalk TTS] Github updated : 15th August 2024 at 08:27
[AllTalk ENG] Transcoding : ffmpeg found
[AllTalk ENG] DeepSpeed version : 0.14.0+ce78a63
[AllTalk ENG] Python Version : 3.11.0
[AllTalk ENG] PyTorch Version : 2.2.1+cu121
[AllTalk ENG] CUDA Version : 12.1
[AllTalk ENG]
[AllTalk ENG] Model/Engine : xttsv2_2.0.3 loading into cuda
[AllTalk ENG] Model License: https://coqui.ai/cpml.txt
[AllTalk ENG] Load time : 39.11 seconds.
[AllTalk TTS]
[AllTalk TTS] API Address : 127.0.0.1:7851
[AllTalk TTS] Gradio Light: http://127.0.0.1:7852
[AllTalk TTS] Gradio Dark : http://127.0.0.1:7852?__theme=dark
[AllTalk TTS]
[AllTalk TTS] Please use Ctrl+C when exiting AllTalk otherwise a
[AllTalk TTS] subprocess may continue running in the background.
[AllTalk TTS]
[AllTalk TTS] AllTalk Server Ready
[AllTalk GEN] Bonjour! comment vas-tu aujourd'hui?
C:\tut\alltalk_tts\alltalk_environment\env\Lib\site-packages\transformers\models\gpt2\modeling_gpt2.py:544: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
[AllTalk GEN] TTS Generate: 5.50 seconds. LowVRAM: False DeepSpeed: False
I tried updating tokenizer.py (alltalk_tts\system\ft_tokenizer). I applied the changes manually instead of replacing the file (attached as tokenizer.txt).
I referred to the following for the above change: https://github.com/coqui-ai/TTS/issues/3655
I still have the same issue.
Hi @nitinmukesh
Are you specifically attempting to use "Streaming"? I cannot say whether the Coqui engine ever supported streaming with Hindi. I see no reason it wouldn't, but I don't know whether it does.
Nonetheless, I tried some Devanagari text and it passed through fine on my PC. I also tried your example, "नमस्ते! आज आप कैसे हैं?", and that also passed through fine.
The system I tested on is a fresh install and has the package versions listed below (you can run start_diagnostics to create a diagnostics.log file and compare the versions on your system).
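If you would rather not compare diagnostics.log against the list below by eye, a small script can check installed versions directly. This is a minimal sketch, not an AllTalk tool: the REQUIRED dictionary is a hypothetical subset copied from the table, and the comparison only looks at the leading numeric part of each version (so suffixes such as .post1 or +ce78a63 are ignored). Run it with the Python from the AllTalk environment (alltalk_environment\env).

```python
import re
from importlib import metadata

# Hypothetical subset of minimum versions, copied from the list in this thread.
REQUIRED = {
    "coqui-tts": "0.24.1",
    "transformers": "4.40.2",
    "torch": "2.2.1",
    "spacy": "3.7.1",
}


def numeric_version(text: str) -> tuple[int, ...]:
    """Return the leading numeric components, e.g. '0.10.2.post1' -> (0, 10, 2).

    Simplification: pre/post/dev/local suffixes are ignored entirely.
    """
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        return ()
    return tuple(int(part) for part in match.group(0).split("."))


def check_versions(required: dict[str, str]) -> list[str]:
    """Return one readable line per package that is missing or older than the minimum."""
    problems = []
    for name, minimum in required.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (need >= {minimum})")
            continue
        if numeric_version(installed) < numeric_version(minimum):
            problems.append(f"{name}: {installed} installed, need >= {minimum}")
    return problems


if __name__ == "__main__":
    issues = check_versions(REQUIRED)
    print("\n".join(issues) if issues else "All listed packages meet the minimum versions.")
```

An empty result only means the listed packages are new enough; it does not rule out a broken install of one of them.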
PACKAGE VERSIONS vs REQUIREMENTS FILE:
coqui-tts Required: >= 0.24.1 Installed: 0.24.1
faster-whisper Required: >= 1.0.3 Installed: 1.0.3
fuzzywuzzy Required: >= 0.18.0 Installed: 0.18.0
gradio Required: >= 4.26.0 Installed: 4.32.2
importlib_metadata Required: >= 7.2.1 Installed: 8.5.0
inputimeout Required: >= 1.0.4 Installed: 1.0.4
Jinja2 Required: >= 3.1.4 Installed: 3.1.4
librosa Required: >= 0.10.2.post1 Installed: 0.10.2.post1
nvidia-cublas-cu11 Required: >= 11.11.3.6 Installed: 11.11.3.6
nvidia-cudnn-cu11 Required: >= 9.1.1.17 Installed: 9.4.0.58
onnxruntime-gpu Required: >= 1.18.1 Installed: 1.19.2
pydantic Required: >= 2.8.2 Installed: 2.9.1
python-ffmpeg Required: >= 2.0.12 Installed: 2.0.12
python-Levenshtein Required: >= 0.25.1 Installed: 0.25.1
praat-parselmouth Required: >= 0.4.4 Installed: 0.4.4
pyworld Required: >= 0.3.4 Installed: 0.3.4
sounddevice Required: >= 0.4.7 Installed: 0.5.0
soundfile Required: >= 0.12.1 Installed: 0.12.1
spacy Required: >= 3.7.1 Installed: 3.7.6
torchcrepe Required: >= 0.0.2 Installed: 0.0.23
tqdm Required: >= 4.66.5 Installed: 4.66.5
unidic-lite Required: >= 1.0.8 Installed: 1.0.8
uvicorn Required: >= 0.29.0 Installed: 0.30.6
pillow Required: == 10.3.0 Installed: 10.3.0
pypinyin Required: >= 0.52.0 Installed: 0.53.0
word2number Required: >= 1.1 Installed: 1.1
cutlet Required: == 0.4.0 Installed: 0.4.0
fugashi Required: == 1.3.1 Installed: 1.3.1
fastapi Required: == 0.112.2 Installed: 0.112.2
PYTHON PACKAGES:
absl-py>= 2.1.0
aiofiles>= 23.2.1
aiohappyeyeballs>= 2.4.0
aiohttp>= 3.10.5
aiosignal>= 1.3.1
altair>= 5.4.1
annotated-types>= 0.7.0
antlr4-python3-runtime>= 4.9.3
anyascii>= 0.3.2
anyio>= 4.4.0
argbind>= 0.3.9
asttokens>= 2.4.1
attrs>= 24.2.0
audioread>= 3.0.1
av>= 12.3.0
babel>= 2.16.0
bitarray>= 2.9.2
blis>= 0.7.11
Brotli>= 1.0.9
catalogue>= 2.0.10
certifi>= 2024.8.30
cffi>= 1.17.1
charset-normalizer>= 3.3.2
click>= 8.1.7
cloudpathlib>= 0.19.0
colorama>= 0.4.6
coloredlogs>= 15.0.1
confection>= 0.1.5
contourpy>= 1.3.0
coqpit>= 0.0.17
coqui-tts>= 0.24.1
coqui-tts-trainer>= 0.1.5
ctranslate2>= 4.4.0
cutlet>= 0.4.0
cycler>= 0.12.1
cymem>= 2.0.8
Cython>= 3.0.11
dateparser>= 1.1.8
decorator>= 5.1.1
deepspeed>= 0.14.0+ce78a63
descript-audiotools>= 0.7.2
descript-audio-codec>= 1.0.0
docopt>= 0.6.2
docstring_parser>= 0.16
einops>= 0.8.0
encodec>= 0.1.1
executing>= 2.1.0
fairseq>= 0.12.4
faiss>= 1.8.0
fastapi>= 0.112.2
faster-whisper>= 1.0.3
ffmpy>= 0.4.0
filelock>= 3.13.1
fire>= 0.6.0
flatbuffers>= 24.3.25
flatten-dict>= 0.4.2
fonttools>= 4.53.1
frozenlist>= 1.4.1
fsspec>= 2024.9.0
fugashi>= 1.3.1
future>= 1.0.0
fuzzywuzzy>= 0.18.0
gmpy2>= 2.1.2
gradio>= 4.32.2
gradio_client>= 0.17.0
grpcio>= 1.66.1
gruut>= 2.2.3
gruut-ipa>= 0.13.0
gruut_lang_de>= 2.0.1
gruut_lang_en>= 2.0.1
gruut_lang_es>= 2.0.1
gruut_lang_fr>= 2.0.2
h11>= 0.14.0
hangul-romanize>= 0.1.0
hjson>= 3.1.0
httpcore>= 1.0.5
httpx>= 0.27.2
huggingface-hub>= 0.24.7
humanfriendly>= 10.0
hydra-core>= 1.3.2
idna>= 3.7
importlib_metadata>= 8.5.0
importlib_resources>= 6.4.5
inflect>= 7.4.0
inputimeout>= 1.0.4
ipython>= 8.27.0
jaconv>= 0.4.0
jedi>= 0.19.1
Jinja2>= 3.1.4
joblib>= 1.4.2
jsonlines>= 1.2.0
jsonschema>= 4.23.0
jsonschema-specifications>= 2023.12.1
julius>= 0.2.7
kiwisolver>= 1.4.7
langcodes>= 3.4.0
language_data>= 1.2.0
lazy_loader>= 0.4
Levenshtein>= 0.25.1
librosa>= 0.10.2.post1
llvmlite>= 0.43.0
local-attention>= 1.9.15
lxml>= 5.3.0
marisa-trie>= 1.2.0
Markdown>= 3.7
markdown2>= 2.5.0
markdown-it-py>= 3.0.0
MarkupSafe>= 2.1.3
matplotlib>= 3.9.2
matplotlib-inline>= 0.1.7
mdurl>= 0.1.2
mkl_fft>= 1.3.10
mkl_random>= 1.2.7
mkl-service>= 2.4.0
mojimoji>= 0.0.13
more-itertools>= 10.5.0
mpmath>= 1.3.0
msgpack>= 1.1.0
multidict>= 6.1.0
murmurhash>= 1.0.10
narwhals>= 1.8.0
networkx>= 2.8.8
ninja>= 1.11.1.1
num2words>= 0.5.13
numba>= 0.60.0
numpy>= 1.26.4
nvidia-cublas-cu11>= 11.11.3.6
nvidia-cuda-nvrtc-cu11>= 11.8.89
nvidia-cudnn-cu11>= 9.4.0.58
omegaconf>= 2.3.0
onnxruntime>= 1.19.2
onnxruntime-gpu>= 1.19.2
orjson>= 3.10.7
packaging>= 24.1
pandas>= 2.2.2
parler_tts>= 0.2
parso>= 0.8.4
pillow>= 10.3.0
pip>= 24.2
platformdirs>= 4.3.3
pooch>= 1.8.2
portalocker>= 2.10.1
praat-parselmouth>= 0.4.4
preshed>= 3.0.9
prompt_toolkit>= 3.0.47
protobuf>= 3.19.6
psutil>= 6.0.0
pure_eval>= 0.2.3
pycparser>= 2.22
pydantic>= 2.9.1
pydantic_core>= 2.23.3
pydub>= 0.25.1
pyee>= 12.0.0
Pygments>= 2.18.0
pyloudnorm>= 0.1.1
pynndescent>= 0.5.13
pynvml>= 11.5.3
pyparsing>= 3.1.4
pypinyin>= 0.53.0
pyreadline3>= 3.5.2
pysbd>= 0.3.4
PySocks>= 1.7.1
pystoi>= 0.4.1
python-crfsuite>= 0.9.10
python-dateutil>= 2.9.0.post0
python-ffmpeg>= 2.0.12
python-Levenshtein>= 0.25.1
python-multipart>= 0.0.9
pytz>= 2024.2
pywin32>= 306
pyworld>= 0.3.4
PyYAML>= 6.0.1
py-cpuinfo>= 9.0.0
randomname>= 0.2.1
rapidfuzz>= 3.9.7
referencing>= 0.35.1
regex>= 2024.9.11
requests>= 2.32.3
resampy>= 0.4.3
rich>= 13.8.1
rotary-embedding-torch>= 0.8.3
rpds-py>= 0.20.0
ruff>= 0.6.5
sacrebleu>= 2.4.3
safetensors>= 0.4.5
scikit-learn>= 1.5.2
scipy>= 1.14.1
semantic-version>= 2.10.0
sentencepiece>= 0.2.0
setuptools>= 72.1.0
shellingham>= 1.5.4
six>= 1.16.0
smart-open>= 7.0.4
sniffio>= 1.3.1
sounddevice>= 0.5.0
soundfile>= 0.12.1
soxr>= 0.5.0.post1
spacy>= 3.7.6
spacy-legacy>= 3.0.12
spacy-loggers>= 1.0.5
srsly>= 2.4.8
stack-data>= 0.6.3
starlette>= 0.38.5
SudachiDict-core>= 20240716
SudachiPy>= 0.6.8
sympy>= 1.13.2
tabulate>= 0.9.0
tensorboard>= 2.17.1
tensorboard-data-server>= 0.7.2
termcolor>= 2.4.0
thinc>= 8.2.5
threadpoolctl>= 3.5.0
tokenizers>= 0.19.1
tomlkit>= 0.12.0
torch>= 2.2.1
torchaudio>= 2.2.1
torchcrepe>= 0.0.23
torchvision>= 0.17.1
torch-stoi>= 0.2.1
tqdm>= 4.66.5
traitlets>= 5.14.3
transformers>= 4.40.2
typeguard>= 4.3.0
typer>= 0.12.5
typing_extensions>= 4.11.0
tzdata>= 2024.1
tzlocal>= 5.2
umap-learn>= 0.5.6
unidic-lite>= 1.0.8
urllib3>= 2.2.2
uvicorn>= 0.30.6
wasabi>= 1.1.3
wcwidth>= 0.2.13
weasel>= 0.4.1
websockets>= 11.0.3
Werkzeug>= 3.0.4
wheel>= 0.44.0
win-inet-pton>= 1.1.0
word2number>= 1.1
wrapt>= 1.16.0
yarl>= 1.11.1
zipp>= 3.20.2
Barring something not installing correctly on your system, I am not sure what would cause this issue. That said, perhaps there is a system locale issue, as the locale can affect how characters are interpreted: https://learn.microsoft.com/en-us/windows-hardware/customize/desktop/unattend/microsoft-windows-international-core-winpe-systemlocale
I suppose it's possible that your system locale is causing the issue, but I would not be able to diagnose that for you.
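One way to check the locale/encoding theory on Windows is to see which encoding Python reports and whether the Hindi sample survives an encode/decode round-trip. This is a hedged sketch using only the standard library, not an AllTalk diagnostic; treating a non-UTF-8 preferred encoding (for example cp1252) as a possible culprit is an assumption, not a confirmed cause of this issue. The Devanagari check uses the Unicode Devanagari block, U+0900 to U+097F.

```python
import locale
import sys
import unicodedata

# Sample sentence from this thread.
SAMPLE = "नमस्ते! आज आप कैसे हैं?"


def is_devanagari(ch: str) -> bool:
    """True if the character lies in the Devanagari block (U+0900 to U+097F)."""
    return "\u0900" <= ch <= "\u097f"


def devanagari_ratio(text: str) -> float:
    """Fraction of letters and combining marks in `text` that are Devanagari.

    Spaces and punctuation are ignored, so "!" and "?" do not lower the ratio.
    """
    letters = [ch for ch in text if unicodedata.category(ch)[0] in ("L", "M")]
    if not letters:
        return 0.0
    return sum(is_devanagari(ch) for ch in letters) / len(letters)


def round_trips(text: str, encoding: str) -> bool:
    """True if `text` survives encoding and decoding with `encoding` unchanged."""
    try:
        return text.encode(encoding).decode(encoding) == text
    except (UnicodeError, LookupError):
        return False


if __name__ == "__main__":
    # The sample itself is not printed, so this runs even on a cp1252 console.
    print("Preferred encoding:", locale.getpreferredencoding(False))
    print("stdout encoding:   ", sys.stdout.encoding)
    print("UTF-8 mode:        ", sys.flags.utf8_mode)
    print("Devanagari ratio:  ", devanagari_ratio(SAMPLE))
    print("Survives cp1252:   ", round_trips(SAMPLE, "cp1252"))
    print("Survives utf-8:    ", round_trips(SAMPLE, "utf-8"))
```

If the preferred encoding is a legacy code page, the Hindi text cannot round-trip through it, which is consistent with (but does not prove) a locale-related problem.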
I assume you have done a full fresh installation of AllTalk V2 and not just copied over a V1 installation? You may wish to re-try setting up the Python environment.
Finally, I am not the maintainer of the Coqui TTS engine; that is done by idiap. I can see they are working on additional Hindi support (https://github.com/idiap/coqui-ai-TTS/commit/192032882272945bdeb1253b140074ec2bce7737), though that update is not yet available.
Thanks
@nitinmukesh Further to this, while writing more documentation for V2 of AllTalk, I was looking over the V1 help, and there is a relevant note here: https://github.com/erew123/alltalk_tts?tab=readme-ov-file#startup-performance-and-compatibility-issues
As such, you need to load the 2.0.3 model in API mode.
Thank you @erew123. I hope support for the Hindi language is added soon. I tried everything, including reinstalling, but it didn't work.
I am currently using Google TTS for Hindi.
I will try the API mode as you suggested. I appreciate your guidance.
@nitinmukesh As mentioned above, you CAN use Hindi if you load the XTTS 2.0.3 model as apitts (API mode).
@erew123
I did understand that, and mentioned it in my earlier response:
"I will try the api one as suggested by you. Appreciate your guidance"
I should have mentioned apitts. I appreciate your guidance in making this work. Thank you.
Describe the bug: Trying to generate Hindi audio produces an error. English audio is generated successfully.
To Reproduce:
1. Launch the UI and enter Hindi text in Text Input.
2. Select xtts - xttsv2_2.0.3.
3. Under Advanced settings, set Language to Hi.
4. Click Generate TTS.
Screenshots: N/A
Text/logs:
Desktop:
AllTalk was updated: 15th August 2024 at 08:27
Custom Python environment: No
Text-generation-webUI was updated: It's the beta version and up to date
Additional context: