DigitalPhonetics / IMS-Toucan

Multilingual and Controllable Text-to-Speech Toolkit of the Speech and Language Technologies Group at the University of Stuttgart.
Apache License 2.0

v2.4: RuntimeError: language "en-us" is not supported by the espeak backend #138

Closed. thoraxe closed this issue 11 months ago.

thoraxe commented 1 year ago

ToucanTTS v2.5 was creating very poor audio after fine-tuning on a new dataset I created. I had luck with v2.4, so I decided to try to revert. Now I am getting the following error:

 python run_training_pipeline.py --gpu_id 0 jim
torchvision is not available - cannot save figures
Preparing
Prepared a FastSpeech dataset with 2829 datapoints in Corpora/Jim.
Training model
/opt/app-root/lib64/python3.9/site-packages/torch/utils/data/dataloader.py:554: UserWarning: This DataLoader will create 12 worker processes in total. Our suggested max number of worker in current system is 4, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  warnings.warn(_create_warning_msg(
Traceback (most recent call last):
  File "/opt/app-root/src/IMS-Toucan/run_training_pipeline.py", line 73, in <module>
    pipeline_dict[args.pipeline](gpu_id=args.gpu_id,
  File "/opt/app-root/src/IMS-Toucan/TrainingInterfaces/TrainingPipelines/finetune_crewchief_porta.py", line 60, in run
    train_loop(net=model,
  File "/opt/app-root/src/IMS-Toucan/TrainingInterfaces/Text_to_Spectrogram/PortaSpeech/portaspeech_train_loop_arbiter.py", line 53, in train_loop
    mono_language_loop(net=net,
  File "/opt/app-root/src/IMS-Toucan/TrainingInterfaces/Text_to_Spectrogram/PortaSpeech/portaspeech_train_loop.py", line 233, in train_loop
    path_to_most_recent_plot_after = plot_progress_spec(net,
  File "/opt/app-root/lib64/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/opt/app-root/src/IMS-Toucan/Utility/utils.py", line 49, in plot_progress_spec
    tf = ArticulatoryCombinedTextFrontend(language=lang)
  File "/opt/app-root/src/IMS-Toucan/Preprocessing/TextFrontend.py", line 169, in __init__
    self.phonemizer_backend = EspeakBackend(language=self.g2p_lang,
  File "/opt/app-root/lib64/python3.9/site-packages/phonemizer/backend/espeak/espeak.py", line 45, in __init__
    super().__init__(
  File "/opt/app-root/lib64/python3.9/site-packages/phonemizer/backend/espeak/base.py", line 39, in __init__
    super().__init__(
  File "/opt/app-root/lib64/python3.9/site-packages/phonemizer/backend/base.py", line 86, in __init__
    self._language = self._init_language(language)
  File "/opt/app-root/lib64/python3.9/site-packages/phonemizer/backend/base.py", line 100, in _init_language
    raise RuntimeError(
RuntimeError: language "en-us" is not supported by the espeak backend
thoraxe commented 1 year ago

This may be related to https://github.com/DigitalPhonetics/IMS-Toucan/issues/85, except that in my case it happens when attempting to fine-tune.

thoraxe commented 1 year ago

In TextFrontend.py the offending code seems to be here:

        if language == "en":
            self.g2p_lang = "en"
            self.expand_abbreviations = english_text_expansion
            if not silent:
                print("Created an English Text-Frontend")

Note that the original code uses en-us but, even when changed to en, it still reports "not supported by backend" (and en itself actually isn't supported by the back-end anyway).
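
A quick way to see what the phonemizer/espeak combination in the training environment actually reports is to query the backend directly. This is just a diagnostic sketch (assuming phonemizer 3.x is importable from the same virtualenv that run_training_pipeline.py uses), not code from the repo:

from phonemizer.backend import EspeakBackend

# Does phonemizer find an espeak / espeak-ng binary at all?
print("espeak backend available:", EspeakBackend.is_available())

# Which language codes does the installed espeak expose to phonemizer?
langs = EspeakBackend.supported_languages()
print("en-us supported:", "en-us" in langs)
print("en supported:", "en" in langs)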

thoraxe commented 1 year ago

phonemize appears to be working on the system in question:

echo "hello world" | phonemize -l en-us -b espeak
həloʊ wɜːld 
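
The Python API gives the same result on this machine; a rough equivalent of the CLI call above (run from the same virtualenv) would be:

from phonemizer import phonemize

# should print something like: həloʊ wɜːld
print(phonemize("hello world", language="en-us", backend="espeak"))
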
thoraxe commented 1 year ago

FWIW I did not have this problem with v2.5, but the fine-tuning output from 2.5 was unusable with this dataset.

thoraxe commented 1 year ago

OK, I did some further digging. On my other system, which runs Ubuntu under WSL, I was able to run things just fine with commit e41e266ccacf282a9854d562f9e3d604f1cf245b, and I have been able to get training to start on two different datasets there.

I went to the new system, checked out that same commit, reverted the changes for PortaSpeech, and tried to run the fine-tuning example. At first it didn't work, but then I blew away both the corpora and models folders, re-downloaded the models, and tried again; it seems to be working at the moment with that commit.
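
Roughly, the cleanup amounted to something like the following sketch (folder names match my checkout; run_model_downloader.py is the downloader script in the repo at the time of writing and may differ between versions):

import shutil
import subprocess

# wipe the cached datasets and the downloaded / fine-tuned checkpoints
for folder in ("Corpora", "Models"):
    shutil.rmtree(folder, ignore_errors=True)

# re-download the pretrained models before starting fine-tuning again
subprocess.run(["python", "run_model_downloader.py"], check=True)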

I'm going to let things run for a while and I'll see how it goes and report back.

thoraxe commented 1 year ago

OK, I'm not really sure what was going on, but this problem seems to have resolved itself. However, I did upgrade to torch/torchaudio 2.x along the way.

I'm going to leave this open for now and try again with a completely fresh environment sometime soon.

thoraxe commented 1 year ago

I finally came back to this and was able to do some more testing. My hypothesis is that the torch 1.x line does not work in environments with a newer CUDA runtime. Installing the latest torch packages seemed to do the trick:

pip install torch torchvision torchaudio

This resulted in:

alias-free-torch                  0.0.6
torch                             2.0.1
torch-complex                     0.4.3
torchaudio                        2.0.2
torchvision                       0.15.2

Training then runs as expected.
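
For anyone hitting the same thing, a quick generic check (not Toucan-specific) that the installed torch build can actually talk to the CUDA runtime on the machine:

import torch

print("torch version:", torch.__version__)
print("built against CUDA:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))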

Flux9665 commented 11 months ago

Cool, thanks for your update!