Closed by jlw 4 months ago
Similar error for me when trying to infer using VITS model
AttributeError: 'TTS' object has no attribute 'is_multi_lingual'
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discussion channels.
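(User error, not a repo bug.) The error comes from passing positional arguments to Synthesizer(). The constructor's third and fourth parameters are the speakers and languages files, not the vocoder, so a call like Synthesizer(tts_model_path, tts_config_path, vocoder_model_path, vocoder_config_path, None) assigns the vocoder checkpoint to tts_speakers_file, the vocoder config to tts_languages_file, and None to vocoder_checkpoint. Here's the declaration from the source: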
class Synthesizer(nn.Module):
    def __init__(
        self,
        tts_checkpoint: str = "",
        tts_config_path: str = "",
        tts_speakers_file: str = "",
        tts_languages_file: str = "",
        vocoder_checkpoint: str = "",
        vocoder_config: str = "",
        ...)
To make sure each parameter is assigned correctly, use keyword arguments, especially for the vocoder settings. Here is how to initialize the Synthesizer with keyword arguments:
from TTS.utils.synthesizer import Synthesizer

# Correct way to initialize the Synthesizer with keyword arguments
synthesizer = Synthesizer(
    tts_checkpoint="path_to_your_tts_model.pth",
    tts_config_path="path_to_your_tts_config.json",
    vocoder_checkpoint="path_to_your_vocoder_model.pth",
    vocoder_config="path_to_your_vocoder_config.json",
)
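To see the misassignment without loading any models, here is a minimal sketch that binds the positional call from the report against a stand-in class. `SynthesizerStandIn` and `bind_positional` are hypothetical names used only for illustration; the stand-in copies just the six parameters quoted in the declaration above and is not the real TTS class.

```python
import inspect


class SynthesizerStandIn:
    """Hypothetical stand-in: mirrors only the parameter order quoted above."""

    def __init__(
        self,
        tts_checkpoint: str = "",
        tts_config_path: str = "",
        tts_speakers_file: str = "",
        tts_languages_file: str = "",
        vocoder_checkpoint: str = "",
        vocoder_config: str = "",
    ):
        pass


def bind_positional(*args):
    """Map positional values to parameter names without calling __init__."""
    signature = inspect.signature(SynthesizerStandIn)
    return dict(signature.bind(*args).arguments)


# Same argument order as the call in the bug report (placeholder file names).
bound = bind_positional("tts.pth", "tts.json", "vocoder.pth", "vocoder.json", None)
for name, value in bound.items():
    print(f"{name} = {value!r}")
```

The printed mapping shows the vocoder paths landing in `tts_speakers_file` and `tts_languages_file`, while `vocoder_checkpoint` receives `None` and `vocoder_config` is never set.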
Describe the bug
Getting a missing-attribute error when trying to use a vocoder. Things work fine when using only a TTS model.
AttributeError                            Traceback (most recent call last)
Cell In[2], line 34
     32 # Convert text to speech and play the audio
     33 text = "Hello, this is a test. How do you think i did?"  # Replace with your desired text
---> 34 audio_file = text_to_speech_with_vocoder(text, tts_model_path, tts_config_path, vocoder_model_path, vocoder_config_path)
     35 Audio(audio_file)

Cell In[2], line 27
     25 def text_to_speech_with_vocoder(text, tts_model_path, tts_config_path, vocoder_model_path, vocoder_config_path, output_path='output.wav'):
     26     synthesizer = Synthesizer(tts_model_path, tts_config_path, vocoder_model_path, vocoder_config_path, None)
---> 27     wav = synthesizer.tts(text)
     28     synthesizer.save_wav(wav, output_path)
     29     print(f"Audio output saved to {output_path}")

File ~/Documents/source/consumeViaAudio/.venv/lib/python3.11/site-packages/TTS/utils/synthesizer.py:319, in Synthesizer.tts(self, text, speaker_name, language_name, speaker_wav, style_wav, style_text, reference_wav, reference_speaker_name, split_sentences, **kwargs)
    317     speaker_id = self.tts_model.speaker_manager.name_to_id[speaker_name]
    318 # handle Neon models with single speaker.
--> 319 elif len(self.tts_model.speaker_manager.name_to_id) == 1:
    320     speaker_id = list(self.tts_model.speaker_manager.name_to_id.values())[0]
    321 elif not speaker_name and not speaker_wav:
To Reproduce
Run this code
# Import necessary libraries
from TTS.utils.synthesizer import Synthesizer
from IPython.display import Audio

# Download and load the TTS and Vocoder models
from TTS.utils.manage import ModelManager

# TTS model
manager = ModelManager()
tts_model_name = "tts_models/en/ljspeech/fast_pitch"
tts_model_path, tts_config_path, tts_model_item = manager.download_model(tts_model_name)
print(f"model path{tts_model_path}")
print(f"Model SettingsPath {tts_config_path}")

# Vocoder model
vocoder_model_name = "vocoder_models/en/ljspeech/hifigan_v2"
vocoder_model_path, vocoder_config_path, v_model_item = manager.download_model(vocoder_model_name)
print(f"coder path{vocoder_model_path}")
print(f"coder SettingsPath {vocoder_config_path}")

# Define the text-to-speech function with Vocoder
def text_to_speech_with_vocoder(text, tts_model_path, tts_config_path, vocoder_model_path, vocoder_config_path, output_path='output.wav'):
    synthesizer = Synthesizer(tts_model_path, tts_config_path, vocoder_model_path, vocoder_config_path, None)
    wav = synthesizer.tts(text)
    synthesizer.save_wav(wav, output_path)
    print(f"Audio output saved to {output_path}")
    return output_path

# Convert text to speech and play the audio
text = "Hello, this is a test. How do you think i did?"  # Replace with your desired text
audio_file = text_to_speech_with_vocoder(text, tts_model_path, tts_config_path, vocoder_model_path, vocoder_config_path)
Audio(audio_file)
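For comparison, here is a sketch of the same helper with the paths passed by keyword. It is an illustration rather than a verified fix: the keyword names come from the constructor declaration quoted in this thread, `synthesizer_kwargs` is a hypothetical helper, and the `synthesizer_cls` parameter is an assumption added so the function can run without the TTS package installed.

```python
def synthesizer_kwargs(tts_model_path, tts_config_path, vocoder_model_path, vocoder_config_path):
    """Build keyword arguments so no path depends on positional order."""
    return {
        "tts_checkpoint": tts_model_path,
        "tts_config_path": tts_config_path,
        "vocoder_checkpoint": vocoder_model_path,
        "vocoder_config": vocoder_config_path,
    }


def text_to_speech_with_vocoder(
    text,
    synthesizer_cls,  # assumption: pass TTS.utils.synthesizer.Synthesizer here
    tts_model_path,
    tts_config_path,
    vocoder_model_path,
    vocoder_config_path,
    output_path="output.wav",
):
    """Synthesize text to a wav file, constructing the synthesizer by keyword."""
    kwargs = synthesizer_kwargs(
        tts_model_path, tts_config_path, vocoder_model_path, vocoder_config_path
    )
    synthesizer = synthesizer_cls(**kwargs)
    wav = synthesizer.tts(text)
    synthesizer.save_wav(wav, output_path)
    print(f"Audio output saved to {output_path}")
    return output_path
```

With the imports and downloaded paths from the reproduction code, the call would be `text_to_speech_with_vocoder(text, Synthesizer, tts_model_path, tts_config_path, vocoder_model_path, vocoder_config_path)`.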
Expected behavior
I end up with a wav file.
Logs
No response
Environment
Additional context
No response