NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
Apache License 2.0

[TTS][German] hydra.errors.InstantiationException: Error locating target 'nemo.collections.tts.torch.tts_tokenizers.GermanPhonemesTokenizer', see chained exception above. #4828

Closed: eqikkwkp25-cyber closed this issue 2 years ago

eqikkwkp25-cyber commented 2 years ago

Describe the bug

The German FastPitch model refers to a non-existent tokenizer.

Steps/Code to reproduce bug

git clone https://github.com/NVIDIA/NeMo/
cd NeMo
python3.8 -m venv venv
source venv/bin/activate
pip3 install a-lot-of-missing-requirements

The error occurs when executing python3 tts_german.py, where the inference code is as follows:

tts_german.py

import soundfile as sf
from nemo.collections.tts.models.base import SpectrogramGenerator, Vocoder

# Download and load the pretrained fastpitch model
spec_generator = SpectrogramGenerator.from_pretrained(model_name="tts_de_fastpitch_multispeaker_5")#.cuda()
# Download and load the pretrained hifigan model
vocoder = Vocoder.from_pretrained(model_name="tts_de_hui_hifigan_ft_fastpitch_multispeaker_5")#.cuda()

# All spectrogram generators start by parsing raw strings to a tokenized version of the string
parsed = spec_generator.parse("Hallo, magst du warme Milch trinken.")
# They then take the tokenized string and produce a spectrogram
spectrogram = spec_generator.generate_spectrogram(tokens=parsed)
# Finally, a vocoder converts the spectrogram to audio
audio = vocoder.convert_spectrogram_to_audio(spec=spectrogram)

# Save the audio to disk in a file called speech.wav
# Note: the vocoder returns a batch of audio. In this example, we take the first and only sample.
sf.write("speech.wav", audio.to('cpu').detach().numpy()[0], 44100)

Expected behavior

Generation of speech.wav

Environment overview

See above

Environment details

See above

Additional context

Error message:

(venv) userxyz@linux-u01v:~/devel/tts/NeMo> python3 tts_german.py 
[NeMo W 2022-08-28 17:44:54 optimizers:55] Apex was not found. Using the lamb or fused_adam optimizer will error out.
[NeMo W 2022-08-28 17:44:54 experimental:27] Module <class 'nemo_text_processing.g2p.modules.IPAG2P'> is experimental, not ready for production and is not fully supported. Use at your own risk.
[NeMo W 2022-08-28 17:44:54 experimental:27] Module <class 'nemo.collections.common.tokenizers.text_to_speech.tts_tokenizers.IPATokenizer'> is experimental, not ready for production and is not fully supported. Use at your own risk.
[NeMo W 2022-08-28 17:44:54 nemo_logging:349] /home/userxyz/devel/tts/NeMo/venv/lib64/python3.8/site-packages/torch/amp/autocast_mode.py:198: UserWarning: User provided device_type of 'cuda', but CUDA is not available. Disabling
      warnings.warn('User provided device_type of \'cuda\', but CUDA is not available. Disabling')

[NeMo W 2022-08-28 17:44:54 experimental:27] Module <class 'nemo.collections.tts.models.radtts.RadTTSModel'> is experimental, not ready for production and is not fully supported. Use at your own risk.
[NeMo I 2022-08-28 17:44:54 cloud:56] Found existing object /home/userxyz/.cache/torch/NeMo/NeMo_1.12.0rc0/tts_de_fastpitch_multispeaker_5/cd9aa375555376d59f140d3fb4b23fb2/tts_de_fastpitch_multispeaker_5.nemo.
[NeMo I 2022-08-28 17:44:54 cloud:62] Re-using file from: /home/userxyz/.cache/torch/NeMo/NeMo_1.12.0rc0/tts_de_fastpitch_multispeaker_5/cd9aa375555376d59f140d3fb4b23fb2/tts_de_fastpitch_multispeaker_5.nemo
[NeMo I 2022-08-28 17:44:54 common:910] Instantiating model from pre-trained checkpoint
[NeMo I 2022-08-28 17:44:55 tokenize_and_classify:81] Creating ClassifyFst grammars. This might take some time...
[NeMo I 2022-08-28 17:45:14 tokenize_and_classify:81] Creating ClassifyFst grammars. This might take some time...
[NeMo E 2022-08-28 17:45:33 common:503] Model instantiation failed!
    Target class:       nemo.collections.tts.models.fastpitch.FastPitchModel
    Error(s):   Error locating target 'nemo.collections.tts.torch.tts_tokenizers.GermanPhonemesTokenizer', see chained exception above.
    full_key: text_tokenizer
    Traceback (most recent call last):
      File "/home/userxyz/devel/tts/NeMo/venv/lib64/python3.8/site-packages/hydra/_internal/utils.py", line 645, in _locate
        obj = import_module(mod)
      File "/usr/lib64/python3.8/importlib/__init__.py", line 127, in import_module
        return _bootstrap._gcd_import(name[level:], package, level)
      File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
      File "<frozen importlib._bootstrap>", line 991, in _find_and_load
      File "<frozen importlib._bootstrap>", line 970, in _find_and_load_unlocked
    ModuleNotFoundError: No module named 'nemo.collections.tts.torch.tts_tokenizers.GermanPhonemesTokenizer'; 'nemo.collections.tts.torch.tts_tokenizers' is not a package

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "/home/userxyz/devel/tts/NeMo/venv/lib64/python3.8/site-packages/hydra/_internal/instantiate/_instantiate2.py", line 134, in _resolve_target
        target = _locate(target)
      File "/home/userxyz/devel/tts/NeMo/venv/lib64/python3.8/site-packages/hydra/_internal/utils.py", line 648, in _locate
        raise ImportError(
    ImportError: Error loading 'nemo.collections.tts.torch.tts_tokenizers.GermanPhonemesTokenizer':
    ModuleNotFoundError("No module named 'nemo.collections.tts.torch.tts_tokenizers.GermanPhonemesTokenizer'; 'nemo.collections.tts.torch.tts_tokenizers' is not a package")
    Are you sure that 'GermanPhonemesTokenizer' is importable from module 'nemo.collections.tts.torch.tts_tokenizers'?

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "/home/userxyz/devel/tts/NeMo/nemo/core/classes/common.py", line 482, in from_config_dict
        instance = imported_cls(cfg=config, trainer=trainer)
      File "/home/userxyz/devel/tts/NeMo/nemo/collections/tts/models/fastpitch.py", line 95, in __init__
        self._setup_tokenizer(cfg)
      File "/home/userxyz/devel/tts/NeMo/nemo/collections/tts/models/fastpitch.py", line 199, in _setup_tokenizer
        self.vocab = instantiate(cfg.text_tokenizer, **text_tokenizer_kwargs)
      File "/home/userxyz/devel/tts/NeMo/venv/lib64/python3.8/site-packages/hydra/_internal/instantiate/_instantiate2.py", line 222, in instantiate
        return instantiate_node(
      File "/home/userxyz/devel/tts/NeMo/venv/lib64/python3.8/site-packages/hydra/_internal/instantiate/_instantiate2.py", line 325, in instantiate_node
        _target_ = _resolve_target(node.get(_Keys.TARGET), full_key)
      File "/home/userxyz/devel/tts/NeMo/venv/lib64/python3.8/site-packages/hydra/_internal/instantiate/_instantiate2.py", line 139, in _resolve_target
        raise InstantiationException(msg) from e
    hydra.errors.InstantiationException: Error locating target 'nemo.collections.tts.torch.tts_tokenizers.GermanPhonemesTokenizer', see chained exception above.
    full_key: text_tokenizer

Traceback (most recent call last):
  File "/home/userxyz/devel/tts/NeMo/venv/lib64/python3.8/site-packages/hydra/_internal/utils.py", line 645, in _locate
    obj = import_module(mod)
  File "/usr/lib64/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 970, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'nemo.collections.tts.torch.tts_tokenizers.GermanPhonemesTokenizer'; 'nemo.collections.tts.torch.tts_tokenizers' is not a package

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/userxyz/devel/tts/NeMo/venv/lib64/python3.8/site-packages/hydra/_internal/instantiate/_instantiate2.py", line 134, in _resolve_target
    target = _locate(target)
  File "/home/userxyz/devel/tts/NeMo/venv/lib64/python3.8/site-packages/hydra/_internal/utils.py", line 648, in _locate
    raise ImportError(
ImportError: Error loading 'nemo.collections.tts.torch.tts_tokenizers.GermanPhonemesTokenizer':
ModuleNotFoundError("No module named 'nemo.collections.tts.torch.tts_tokenizers.GermanPhonemesTokenizer'; 'nemo.collections.tts.torch.tts_tokenizers' is not a package")
Are you sure that 'GermanPhonemesTokenizer' is importable from module 'nemo.collections.tts.torch.tts_tokenizers'?

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "tts_german.py", line 5, in <module>
    spec_generator = SpectrogramGenerator.from_pretrained(model_name="tts_de_fastpitch_multispeaker_5")#.cuda()
  File "/home/userxyz/devel/tts/NeMo/nemo/core/classes/common.py", line 849, in from_pretrained
    instance = class_.restore_from(
  File "/home/userxyz/devel/tts/NeMo/nemo/core/classes/modelPT.py", line 311, in restore_from
    instance = cls._save_restore_connector.restore_from(
  File "/home/userxyz/devel/tts/NeMo/nemo/core/connectors/save_restore_connector.py", line 235, in restore_from
    loaded_params = self.load_config_and_state_dict(
  File "/home/userxyz/devel/tts/NeMo/nemo/core/connectors/save_restore_connector.py", line 158, in load_config_and_state_dict
    instance = calling_cls.from_config_dict(config=conf, trainer=trainer)
  File "/home/userxyz/devel/tts/NeMo/nemo/core/classes/common.py", line 504, in from_config_dict
    raise e
  File "/home/userxyz/devel/tts/NeMo/nemo/core/classes/common.py", line 496, in from_config_dict
    instance = cls(cfg=config, trainer=trainer)
  File "/home/userxyz/devel/tts/NeMo/nemo/collections/tts/models/fastpitch.py", line 95, in __init__
    self._setup_tokenizer(cfg)
  File "/home/userxyz/devel/tts/NeMo/nemo/collections/tts/models/fastpitch.py", line 199, in _setup_tokenizer
    self.vocab = instantiate(cfg.text_tokenizer, **text_tokenizer_kwargs)
  File "/home/userxyz/devel/tts/NeMo/venv/lib64/python3.8/site-packages/hydra/_internal/instantiate/_instantiate2.py", line 222, in instantiate
    return instantiate_node(
  File "/home/userxyz/devel/tts/NeMo/venv/lib64/python3.8/site-packages/hydra/_internal/instantiate/_instantiate2.py", line 325, in instantiate_node
    _target_ = _resolve_target(node.get(_Keys.TARGET), full_key)
  File "/home/userxyz/devel/tts/NeMo/venv/lib64/python3.8/site-packages/hydra/_internal/instantiate/_instantiate2.py", line 139, in _resolve_target
    raise InstantiationException(msg) from e
hydra.errors.InstantiationException: Error locating target 'nemo.collections.tts.torch.tts_tokenizers.GermanPhonemesTokenizer', see chained exception above.
full_key: text_tokenizer
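The chained exception boils down to a dotted target path from the checkpoint config that cannot be resolved in the installed NeMo version. The following is a minimal sketch of how one might check that by hand. It is not Hydra's actual lookup logic, and the helper name `target_is_importable` is hypothetical; it only splits the path into a module and an attribute and tries to import them.

```python
import importlib


def target_is_importable(target: str) -> bool:
    """Return True if a dotted path like "pkg.module.ClassName" resolves.

    Simplified assumption: the last component is an attribute of the
    module named by everything before it.
    """
    module_path, _, attr_name = target.rpartition(".")
    if not module_path:
        # A bare name without a module part cannot be checked this way.
        return False
    try:
        module = importlib.import_module(module_path)
    except ImportError:
        # Covers ModuleNotFoundError as well (it subclasses ImportError).
        return False
    return hasattr(module, attr_name)


# Standard-library examples, so the sketch runs without NeMo installed.
print(target_is_importable("collections.OrderedDict"))
print(target_is_importable("collections.NoSuchClass"))
```

Calling the helper with 'nemo.collections.tts.torch.tts_tokenizers.GermanPhonemesTokenizer' inside the affected virtual environment should tell whether the tokenizer class exists in that checkout.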

eqikkwkp25-cyber commented 2 years ago

Using the unreleased branch r1.11.0 solves the above issue. Inference results are not sufficient yet, i.e. the German single-speaker Thorsten model sounds like a psycho alien, and the inference RTF is far away from 1 on CPU. I will close this one.
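For reference, the real-time factor (RTF) mentioned above is commonly defined as synthesis time divided by the duration of the generated audio, so values above 1 mean slower than real time. The comment does not spell out its definition, so this is an assumption. A minimal sketch for measuring it, with hypothetical helper names and a stand-in `synthesize` callable instead of the actual NeMo calls:

```python
import time
from typing import Callable, Sequence


def real_time_factor(elapsed_seconds: float, num_samples: int, sample_rate: int) -> float:
    """RTF = processing time / duration of the produced audio."""
    if num_samples <= 0 or sample_rate <= 0:
        raise ValueError("num_samples and sample_rate must be positive")
    audio_seconds = num_samples / sample_rate
    return elapsed_seconds / audio_seconds


def measure_rtf(synthesize: Callable[[], Sequence[float]], sample_rate: int) -> float:
    """Time one call to `synthesize` and return its RTF.

    `synthesize` is a placeholder for whatever produces the waveform,
    e.g. a function wrapping the spectrogram generator and vocoder.
    """
    start = time.perf_counter()
    audio = synthesize()
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(audio), sample_rate)
```

In the script above, `synthesize` could wrap the parse, generate_spectrogram and convert_spectrogram_to_audio steps and return the first sample of the batch.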

XuesongYang commented 1 year ago

@eqikkwkp25-cyber please refer to https://github.com/NVIDIA/NeMo/issues/4868. It is all about the sampling rate.
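As a general illustration of why a sampling-rate mismatch matters (the rates below are illustrative and not taken from the linked issue): if audio generated at one rate is written to disk with a different rate in the file header, playback speed and pitch shift by the ratio of the two. A small sketch with hypothetical helper names:

```python
def audio_duration_seconds(num_samples: int, sample_rate: int) -> float:
    """Duration of a waveform when played back at `sample_rate`."""
    return num_samples / sample_rate


def playback_speed_factor(native_rate: int, written_rate: int) -> float:
    """How much faster (>1) or slower (<1) audio plays when samples
    generated at `native_rate` are saved with `written_rate`."""
    return written_rate / native_rate


# Example: one second of audio generated at 22050 Hz but saved as 44100 Hz
# plays in half the time, i.e. at double speed and higher pitch.
num_samples = 22050
print(audio_duration_seconds(num_samples, 22050))
print(audio_duration_seconds(num_samples, 44100))
print(playback_speed_factor(22050, 44100))
```

The practical consequence is that the rate passed to sf.write should match the rate the model and vocoder were trained with.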