coqui-ai / TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
http://coqui.ai
Mozilla Public License 2.0
35.85k stars, 4.38k forks

AttributeError: 'NoneType' object has no attribute 'name_to_id' #3557

Closed: byjlw closed this issue 4 months ago

byjlw commented 10 months ago

Describe the bug

I get an AttributeError when trying to use a vocoder. Everything works fine when using only a TTS model.


AttributeError                            Traceback (most recent call last)
Cell In[2], line 34
     32 # Convert text to speech and play the audio
     33 text = "Hello, this is a test. How do you think i did?"  # Replace with your desired text
---> 34 audio_file = text_to_speech_with_vocoder(text, tts_model_path, tts_config_path, vocoder_model_path, vocoder_config_path)
     35 Audio(audio_file)

Cell In[2], line 27
     25 def text_to_speech_with_vocoder(text, tts_model_path, tts_config_path, vocoder_model_path, vocoder_config_path, output_path='output.wav'):
     26     synthesizer = Synthesizer(tts_model_path, tts_config_path, vocoder_model_path, vocoder_config_path, None)
---> 27     wav = synthesizer.tts(text)
     28     synthesizer.save_wav(wav, output_path)
     29     print(f"Audio output saved to {output_path}")

File ~/Documents/source/consumeViaAudio/.venv/lib/python3.11/site-packages/TTS/utils/synthesizer.py:319, in Synthesizer.tts(self, text, speaker_name, language_name, speaker_wav, style_wav, style_text, reference_wav, reference_speaker_name, split_sentences, **kwargs)
    317     speaker_id = self.tts_model.speaker_manager.name_to_id[speaker_name]
    318 # handle Neon models with single speaker.
--> 319 elif len(self.tts_model.speaker_manager.name_to_id) == 1:
    320     speaker_id = list(self.tts_model.speaker_manager.name_to_id.values())[0]
    321 elif not speaker_name and not speaker_wav:

To Reproduce

Run this code

# Import necessary libraries
from TTS.utils.synthesizer import Synthesizer
from IPython.display import Audio

# Download and load the TTS and Vocoder models
from TTS.utils.manage import ModelManager

# TTS model
manager = ModelManager()
tts_model_name = "tts_models/en/ljspeech/fast_pitch"
tts_model_path, tts_config_path, tts_model_item = manager.download_model(tts_model_name)
print(f"model path{tts_model_path}")
print(f"Model SettingsPath {tts_config_path}")

# Vocoder model
vocoder_model_name = "vocoder_models/en/ljspeech/hifigan_v2"
vocoder_model_path, vocoder_config_path, v_model_item = manager.download_model(vocoder_model_name)
print(f"coder path{vocoder_model_path}")
print(f"coder SettingsPath {vocoder_config_path}")

# Define the text-to-speech function with Vocoder
def text_to_speech_with_vocoder(text, tts_model_path, tts_config_path, vocoder_model_path, vocoder_config_path, output_path='output.wav'):
    synthesizer = Synthesizer(tts_model_path, tts_config_path, vocoder_model_path, vocoder_config_path, None)
    wav = synthesizer.tts(text)
    synthesizer.save_wav(wav, output_path)
    print(f"Audio output saved to {output_path}")
    return output_path

# Convert text to speech and play the audio
text = "Hello, this is a test. How do you think i did?"  # Replace with your desired text
audio_file = text_to_speech_with_vocoder(text, tts_model_path, tts_config_path, vocoder_model_path, vocoder_config_path)
Audio(audio_file)

Expected behavior

I end up with a WAV file.

Logs

No response

Environment

{
    "CUDA": {
        "GPU": [],
        "available": false,
        "version": null
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "2.1.1",
        "TTS": "0.22.0",
        "numpy": "1.26.3"
    },
    "System": {
        "OS": "Darwin",
        "architecture": [
            "64bit",
            ""
        ],
        "processor": "i386",
        "python": "3.11.7",
        "version": "Darwin Kernel Version 23.1.0: Mon Oct  9 21:33:00 PDT 2023; root:xnu-10002.41.9~7/RELEASE_ARM64_T6031"
    }
}

Additional context

No response

nellorebhanuteja commented 10 months ago

I get a similar error when running inference with a VITS model:

AttributeError: 'TTS' object has no attribute 'is_multi_lingual'

stale[bot] commented 8 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discussion channels.

jlog3 commented 6 months ago

(User error, not a repo bug.) The error comes from passing positional arguments to Synthesizer(). The third and fourth parameters of the constructor are tts_speakers_file and tts_languages_file, not the vocoder paths, so the call in the report binds the vocoder checkpoint and vocoder config to those two parameters and passes None as vocoder_checkpoint. Here's the declaration from the source:

class Synthesizer(nn.Module):
    def __init__(
        self,
        tts_checkpoint: str = "",
        tts_config_path: str = "",
        tts_speakers_file: str = "",
        tts_languages_file: str = "",
        vocoder_checkpoint: str = "",
        vocoder_config: str = "",
        ...)
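
To see where each positional value from the report ends up, here is a minimal sketch using the standard-library inspect module. The function synthesizer_init_stub is a hypothetical stand-in that copies only the six parameters quoted above (the real constructor takes more), and the file names are placeholders.

```python
import inspect


def synthesizer_init_stub(
    tts_checkpoint: str = "",
    tts_config_path: str = "",
    tts_speakers_file: str = "",
    tts_languages_file: str = "",
    vocoder_checkpoint: str = "",
    vocoder_config: str = "",
):
    """Hypothetical stand-in: mirrors only the parameter order quoted above."""


# The five positional values from the report, with placeholder file names.
call_args = ("tts.pth", "tts_config.json", "vocoder.pth", "vocoder_config.json", None)

# Bind the values the same way Python would for a positional call.
bound = inspect.signature(synthesizer_init_stub).bind(*call_args)
bound.apply_defaults()

for name, value in bound.arguments.items():
    print(f"{name:20} <- {value!r}")
```

In this sketch the vocoder paths land on tts_speakers_file and tts_languages_file, vocoder_checkpoint receives None, and vocoder_config keeps its empty default.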

To make sure each path reaches the intended parameter, pass keyword arguments, especially for the vocoder settings. Here is how to initialize the Synthesizer with keyword arguments:

from TTS.utils.synthesizer import Synthesizer

# Correct way to initialize the Synthesizer with keyword arguments
synthesizer = Synthesizer(
    tts_checkpoint="path_to_your_tts_model.pth",
    tts_config_path="path_to_your_tts_config.json",
    vocoder_checkpoint="path_to_your_vocoder_model.pth",
    vocoder_config="path_to_your_vocoder_config.json"
)
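
Applied to the function from the original report, the same change might look like the sketch below. build_synthesizer_kwargs is a hypothetical helper added only for illustration. The keyword names come from the declaration quoted above, and tts() and save_wav() are used exactly as in the report. The sketch has not been run against TTS 0.22.0 here.

```python
def build_synthesizer_kwargs(tts_model_path, tts_config_path,
                             vocoder_model_path, vocoder_config_path):
    """Hypothetical helper: map the four paths onto the constructor's keyword names."""
    return {
        "tts_checkpoint": tts_model_path,
        "tts_config_path": tts_config_path,
        "vocoder_checkpoint": vocoder_model_path,
        "vocoder_config": vocoder_config_path,
    }


def text_to_speech_with_vocoder(text, tts_model_path, tts_config_path,
                                vocoder_model_path, vocoder_config_path,
                                output_path="output.wav"):
    # Imported lazily so this sketch can be loaded without TTS installed.
    from TTS.utils.synthesizer import Synthesizer

    # Keyword arguments keep the vocoder paths away from the speaker and language file slots.
    synthesizer = Synthesizer(**build_synthesizer_kwargs(
        tts_model_path, tts_config_path, vocoder_model_path, vocoder_config_path))
    wav = synthesizer.tts(text)
    synthesizer.save_wav(wav, output_path)
    print(f"Audio output saved to {output_path}")
    return output_path
```

The call site from the report can stay unchanged, since the function's own parameters are still passed in the same order.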

stale[bot] commented 4 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discussion channels.