DigitalPhonetics / IMS-Toucan

Controllable and fast Text-to-Speech for over 7000 languages!
Apache License 2.0

Weird conflict between EspeakBackend and parselmouth #85

Closed: Brugio96 closed this issue 4 months ago

Brugio96 commented 1 year ago

Hi, and first of all, thanks for your incredible work. I spent the whole afternoon trying to get run_utterance_cloner.py to run on Amazon Linux 2. When I execute it, I get this error: 'RuntimeError: language "en-us" is not supported by the espeak backend'. That was strange, since I know for a fact that en-us is supported, and it also worked when I ran run_interactive_demo.py.

So I evaluated EspeakBackend.supported_languages() at various checkpoints along the execution path and noticed that it returns an empty dictionary only after parselmouth is imported in the PitchCalculator script. If I remove that import, I of course get a different error later on, but the 'RuntimeError: language "en-us" is not supported by the espeak backend' no longer appears. I couldn't find anything useful online about known conflicts between espeak and parselmouth, but I read that a similar error came up in the closed issues of this repo.
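The checkpoint test described above can be reproduced in isolation. This is a minimal sketch, assuming phonemizer 3.x, where EspeakBackend.supported_languages() is a classmethod returning a dict keyed by language code. Each probe runs in a fresh interpreter, so the set of modules imported beforehand is the only thing that differs between runs. The helper name espeak_supports is hypothetical.

```python
import subprocess
import sys

# Runs in a fresh interpreter: argv[1] is the language code, argv[2:] are the
# modules to import *before* phonemizer is queried (e.g. "parselmouth").
PROBE = """
import sys
for module_name in sys.argv[2:]:
    __import__(module_name)
from phonemizer.backend import EspeakBackend
print("yes" if sys.argv[1] in EspeakBackend.supported_languages() else "no")
"""


def espeak_supports(lang, preimports=()):
    """Return True/False for whether espeak lists `lang` after importing
    `preimports`, or None if the probe itself failed (e.g. a missing module)."""
    result = subprocess.run(
        [sys.executable, "-c", PROBE, lang, *preimports],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip() == "yes"
```

If espeak_supports("en-us") returns True but espeak_supports("en-us", ["parselmouth"]) returns False on the affected machine, then import order alone is enough to trigger the problem there.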

Flux9665 commented 1 year ago

Very interesting, thank you for sharing that observation! I'll try to reproduce it and figure out why this happens. Just a few days ago I also heard a report that the error occurred when using a conda virtual environment, while the exact same code worked flawlessly in a pip virtual environment.

Neither the phonemizer library nor the parselmouth library is pure Python: both are Python interfaces to native programs (espeak and Praat, respectively) that run outside the Python interpreter. This shared trait probably has something to do with the conflict, but for me it has always just worked, so it's hard to debug and find a solution. It's probably very system-dependent.
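One hypothesis consistent with this, though not confirmed anywhere in this thread, is that parselmouth's native extension pulls in a shared library (for instance a different libstdc++) that interferes with espeak; the conda-versus-pip difference would fit that picture. Below is a Linux-only sketch for checking which native libraries are mapped into the process before and after importing parselmouth. The function names are hypothetical, and /proc/self/maps is the only data source.

```python
import os


def loaded_libraries(substrings):
    """Return sorted paths of files mapped into this process whose file name
    contains any of `substrings`. Linux-only: reads /proc/self/maps."""
    found = set()
    try:
        with open("/proc/self/maps") as maps:
            for line in maps:
                parts = line.split()
                if len(parts) < 6:  # anonymous mapping, no path column
                    continue
                path = " ".join(parts[5:])  # paths may contain spaces
                if any(s in os.path.basename(path) for s in substrings):
                    found.add(path)
    except FileNotFoundError:  # no procfs (e.g. macOS)
        pass
    return sorted(found)


def compare_around_parselmouth_import(watch=("espeak", "stdc++", "parselmouth")):
    """Print espeak's language count and newly mapped libraries around the
    parselmouth import. Requires phonemizer and parselmouth to be installed."""
    from phonemizer.backend import EspeakBackend

    print("languages before:", len(EspeakBackend.supported_languages()))
    before = set(loaded_libraries(watch))
    import parselmouth  # noqa: F401  (the import suspected in this issue)
    after = set(loaded_libraries(watch))
    print("newly mapped:", sorted(after - before))
    print("languages after:", len(EspeakBackend.supported_languages()))
```

Calling compare_around_parselmouth_import() once in a working pip environment and once in a failing one would show whether the two differ in which native libraries parselmouth brings in.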

Brugio96 commented 1 year ago

Yes, very strange indeed.

Anyway, I solved it by moving the import parselmouth statement into the _calculate_f0 method of the Parselmouth class [in TrainingInterfaces/Text_to_Spectrogram/FastSpeech2/PitchCalculator.py]. I also moved self.tts.set_language(lang) so that it comes before

duration, pitch, energy, silence_frames_start, silence_frames_end = self.extract_prosody(reference_transcription, path_to_reference_audio, lang=lang)

in the clone_utterance method of the UtteranceCloner class [in InferenceInterfaces/UtteranceCloner.py].

I don't know if that's the right way to go about it, but it works now.
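The fix above amounts to deferring the parselmouth import until pitch is first computed, and making sure the espeak backend is created (via set_language) before that happens. Below is a sketch of the deferred-import part. PitchExtractor is a hypothetical, simplified stand-in, not the repository's Parselmouth class; it uses only the public parselmouth calls Sound(...), Sound.to_pitch() and Pitch.selected_array.

```python
class PitchExtractor:
    """Hypothetical stand-in for the Parselmouth class in PitchCalculator.py;
    only the placement of the import mirrors the fix described above."""

    def __init__(self, time_step=0.01):
        self.time_step = time_step  # seconds between pitch frames

    def _calculate_f0(self, audio, sampling_rate):
        # Deferred import: parselmouth (and the native Praat code behind it) is
        # only loaded the first time pitch is computed, so an espeak backend
        # created earlier in the program is already initialised by then.
        import parselmouth

        sound = parselmouth.Sound(audio, sampling_frequency=sampling_rate)
        pitch = sound.to_pitch(time_step=self.time_step)
        # One F0 value per frame in Hz; unvoiced frames are reported as 0.0.
        return pitch.selected_array["frequency"]
```

Because the import only happens inside _calculate_f0, merely importing the module that defines the class no longer loads parselmouth. This changes when parselmouth is loaded; it does not explain the underlying conflict.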

Brugio96 commented 1 year ago

Actually, this only resolves the problem when running run_utterance_cloner.py. When I run run_training_pipeline.py, it raises the error every time it finishes computing the cache data for a dataset. I can still train by re-running the script until all the cache folders have been computed; after that, training starts and runs without problems.
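The re-run workaround relies on finished dataset caches being kept between runs, so each attempt gets a little further. A crude sketch that automates it; rerun_until_success is a hypothetical helper, and max_attempts bounds the loop in case the failure is caused by something else entirely.

```python
import subprocess
import sys


def rerun_until_success(cmd, max_attempts=5):
    """Run `cmd` repeatedly until it exits with status 0.

    Returns the number of attempts needed, or None if every attempt failed."""
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return attempt
        print(f"attempt {attempt} exited with code {result.returncode}, retrying",
              file=sys.stderr)
    return None
```

For example, rerun_until_success([sys.executable, "run_training_pipeline.py", "--gpu_id", "0", "crewchief_jim"]), using the pipeline name from the log below. Since training itself runs inside the same command, a successful return only happens once training has finished.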

thoraxe commented 1 year ago

I am also getting this error on CentOS Stream 9.

python run_training_pipeline.py --gpu_id 0 crewchief_jim
torchvision is not available - cannot save figures
Preparing
... building dataset cache ...
Process Process-2:
Traceback (most recent call last):
  File "/usr/lib64/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib64/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/app-root/src/IMS-Toucan/TrainingInterfaces/Text_to_Spectrogram/AutoAligner/AlignerDataset.py", line 132, in cache_builder_process
    tf = ArticulatoryCombinedTextFrontend(language=lang)
  File "/opt/app-root/src/IMS-Toucan/Preprocessing/TextFrontend.py", line 169, in __init__
    self.phonemizer_backend = EspeakBackend(language=self.g2p_lang,
  File "/opt/app-root/lib64/python3.9/site-packages/phonemizer/backend/espeak/espeak.py", line 45, in __init__
    super().__init__(
  File "/opt/app-root/lib64/python3.9/site-packages/phonemizer/backend/espeak/base.py", line 39, in __init__
    super().__init__(
  File "/opt/app-root/lib64/python3.9/site-packages/phonemizer/backend/base.py", line 86, in __init__
    self._language = self._init_language(language)
  File "/opt/app-root/lib64/python3.9/site-packages/phonemizer/backend/base.py", line 100, in _init_language
    raise RuntimeError(
RuntimeError: language "en-us" is not supported by the espeak backend
(The same traceback repeats verbatim for Process-3, Process-4 and Process-5, each ending in: RuntimeError: language "en-us" is not supported by the espeak backend)
Converting into convenient format...
0it [00:00, ?it/s]
0it [00:00, ?it/s]
Traceback (most recent call last):
  File "/opt/app-root/src/IMS-Toucan/run_training_pipeline.py", line 73, in <module>
    pipeline_dict[args.pipeline](gpu_id=args.gpu_id,
  File "/opt/app-root/src/IMS-Toucan/TrainingInterfaces/TrainingPipelines/finetune_crewchief.py", line 48, in run
    english_datasets.append(prepare_fastspeech_corpus(transcript_dict=build_path_to_transcript_dict_generic_ljspeech("../CrewChiefV4/CrewChiefV4/sounds/"),
  File "/opt/app-root/src/IMS-Toucan/Utility/corpus_preparation.py", line 40, in prepare_fastspeech_corpus
    aligner_datapoints = AlignerDataset(transcript_dict, cache_dir=corpus_dir, lang=lang, phone_input=phone_input, device=torch.device("cuda"))
  File "/opt/app-root/src/IMS-Toucan/TrainingInterfaces/Text_to_Spectrogram/AutoAligner/AlignerDataset.py", line 109, in __init__
    raise RuntimeError
RuntimeError

The above happens with v2.4.
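The message in these tracebacks is misleading: per the first report in this thread, the language is not actually unsupported; espeak reports no languages at all. A small sketch that distinguishes the two cases; explain_missing_language is a hypothetical helper, and its supported argument is assumed to be the dict returned by phonemizer's EspeakBackend.supported_languages().

```python
def explain_missing_language(lang, supported):
    """Return None if `lang` is usable, otherwise a human-readable diagnosis.

    `supported` is assumed to be the dict returned by
    phonemizer's EspeakBackend.supported_languages()."""
    if lang in supported:
        return None
    if not supported:
        # Empty dict: espeak itself is not working in this process, which is
        # what the parselmouth import appears to cause on some systems.
        return (f'espeak reports no languages at all, so "{lang}" cannot be '
                "checked; suspect the espeak library or import order, not the language")
    return f'"{lang}" is not among the {len(supported)} languages espeak reports'
```

Calling it with EspeakBackend.supported_languages() at the point where the backend is created could make the failure mode explicit instead of reporting a supported language as unsupported.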

thoraxe commented 1 year ago

Sorry, my errors might be spurious. I do see the espeak problem with v2.4, but I also had path errors, so some of the output above may be unrelated.