Closed · 78Alpha closed this issue 10 months ago
Use --language_idx, not --language. I opened a PR to return a more useful error message.
I changed it to --language_idx with the same result. In addition, I added a line in xtts.py to hard-code the language to "en" as a test (after the --language_idx attempt), and it had the same result.
(coqui) alpha78@----------:/mnt/q/Utilities/CUDA/TTS/TTS/server$ tts --text "Text for TTS" --model_path ./tts_models/en/ljspeech/ --config_path ./tts_models/en/ljspeech/config.json --out_path speech.wav --language_idx en
> Using model: xtts
> Text: Text for TTS
> Text splitted to sentences.
['Text for TTS']
Traceback (most recent call last):
File "/home/alpha78/anaconda3/envs/coqui/bin/tts", line 8, in <module>
sys.exit(main())
File "/mnt/q/Utilities/CUDA/TTS/TTS/bin/synthesize.py", line 515, in main
wav = synthesizer.tts(
File "/mnt/q/Utilities/CUDA/TTS/TTS/utils/synthesizer.py", line 374, in tts
outputs = self.tts_model.synthesize(
File "/mnt/q/Utilities/CUDA/TTS/TTS/tts/models/xtts.py", line 392, in synthesize
return self.inference_with_config(text, config, ref_audio_path=speaker_wav, language=language, **kwargs)
File "/mnt/q/Utilities/CUDA/TTS/TTS/tts/models/xtts.py", line 415, in inference_with_config
return self.full_inference(text, ref_audio_path, language, **settings)
File "/home/alpha78/anaconda3/envs/coqui/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/mnt/q/Utilities/CUDA/TTS/TTS/tts/models/xtts.py", line 476, in full_inference
(gpt_cond_latent, speaker_embedding) = self.get_conditioning_latents(
File "/home/alpha78/anaconda3/envs/coqui/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/mnt/q/Utilities/CUDA/TTS/TTS/tts/models/xtts.py", line 351, in get_conditioning_latents
audio = load_audio(file_path, load_sr)
File "/mnt/q/Utilities/CUDA/TTS/TTS/tts/models/xtts.py", line 72, in load_audio
audio, lsr = torchaudio.load(audiopath)
File "/home/alpha78/anaconda3/envs/coqui/lib/python3.10/site-packages/torchaudio/_backend/utils.py", line 204, in load
return backend.load(uri, frame_offset, num_frames, normalize, channels_first, format, buffer_size)
File "/home/alpha78/anaconda3/envs/coqui/lib/python3.10/site-packages/torchaudio/_backend/soundfile.py", line 27, in load
return soundfile_backend.load(uri, frame_offset, num_frames, normalize, channels_first, format)
File "/home/alpha78/anaconda3/envs/coqui/lib/python3.10/site-packages/torchaudio/_backend/soundfile_backend.py", line 221, in load
with soundfile.SoundFile(filepath, "r") as file_:
File "/home/alpha78/anaconda3/envs/coqui/lib/python3.10/site-packages/soundfile.py", line 658, in __init__
self._file = self._open(file, mode_int, closefd)
File "/home/alpha78/anaconda3/envs/coqui/lib/python3.10/site-packages/soundfile.py", line 1212, in _open
raise TypeError("Invalid file: {0!r}".format(self.name))
TypeError: Invalid file: None
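The final TypeError shows that load_audio() was handed None as the file path, i.e. no reference clip reached torchaudio.load(). In the spirit of the "more useful error message" PR mentioned above, a minimal sketch of an early guard could look like the following (the function name check_speaker_wav and the exact wording are hypothetical, not from the TTS codebase):

```python
import os

def check_speaker_wav(speaker_wav):
    # Hypothetical guard: fail early with a readable message instead of
    # letting torchaudio.load() raise "Invalid file: None".
    if speaker_wav is None:
        raise ValueError(
            "XTTS needs a reference clip: pass --speaker_wav /path/to/voice.wav"
        )
    if not os.path.isfile(speaker_wav):
        raise FileNotFoundError("speaker_wav not found: " + speaker_wav)
    return speaker_wav
```

Called before get_conditioning_latents(), this would turn the opaque soundfile TypeError into a message that points at the missing --speaker_wav argument.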
Adding a speaker_wav leads to further trouble:
(coqui) alpha78@----------:/mnt/q/Utilities/CUDA/TTS/TTS/server$ tts --text "Text for TTS" --model_path ./tts_models/en/ljspeech/ --config_path ./tts_models/en/ljspeech/config.json --out_path speech.wav --language_idx en --speaker_wav ./PYRAv2Dataset_00001.wav
> Using model: xtts
> Text: Text for TTS
> Text splitted to sentences.
['Text for TTS']
Traceback (most recent call last):
File "/home/alpha78/anaconda3/envs/coqui/bin/tts", line 8, in <module>
sys.exit(main())
File "/mnt/q/Utilities/CUDA/TTS/TTS/bin/synthesize.py", line 515, in main
wav = synthesizer.tts(
File "/mnt/q/Utilities/CUDA/TTS/TTS/utils/synthesizer.py", line 374, in tts
outputs = self.tts_model.synthesize(
File "/mnt/q/Utilities/CUDA/TTS/TTS/tts/models/xtts.py", line 392, in synthesize
return self.inference_with_config(text, config, ref_audio_path=speaker_wav, language=language, **kwargs)
File "/mnt/q/Utilities/CUDA/TTS/TTS/tts/models/xtts.py", line 415, in inference_with_config
return self.full_inference(text, ref_audio_path, language, **settings)
File "/home/alpha78/anaconda3/envs/coqui/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/mnt/q/Utilities/CUDA/TTS/TTS/tts/models/xtts.py", line 484, in full_inference
return self.inference(
File "/home/alpha78/anaconda3/envs/coqui/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/mnt/q/Utilities/CUDA/TTS/TTS/tts/models/xtts.py", line 528, in inference
text_tokens = torch.IntTensor(self.tokenizer.encode(sent, lang=language)).unsqueeze(0).to(self.device)
File "/mnt/q/Utilities/CUDA/TTS/TTS/tts/layers/xtts/tokenizer.py", line 650, in encode
return self.tokenizer.encode(txt).ids
AttributeError: 'NoneType' object has no attribute 'encode'
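This second traceback fails inside tokenizer.py because self.tokenizer is None, which suggests the tokenizer's vocabulary file (typically vocab.json for XTTS) was never found and loaded from the given --model_path. A hedged sketch of a wrapper that surfaces this condition clearly (the class name and message are illustrative, not the library's actual API):

```python
class SafeVoiceBpeTokenizer:
    # Hypothetical wrapper sketch: the traceback shows self.tokenizer is
    # None inside encode(), so wrap the call with an explicit check.
    def __init__(self, tokenizer=None):
        self.tokenizer = tokenizer  # underlying BPE tokenizer, or None

    def encode(self, txt, lang=None):
        if self.tokenizer is None:
            raise RuntimeError(
                "XTTS tokenizer was never loaded; check that vocab.json "
                "sits next to the checkpoint referenced by --model_path"
            )
        return self.tokenizer.encode(txt).ids
```

If this diagnosis is right, the practical fix is to make sure the model directory passed via --model_path contains the full set of XTTS files, not just the checkpoint.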
@78Alpha Same exact error here, after following the official Coqui fine-tuning video.
@eginhard I'm already on the latest version. Please advise.
Describe the bug
Inference with a custom model fails for various reasons (language handling, inability to synthesize audio, unexpected paths, JSON errors).
To Reproduce
1. Fine-tune/train a model on the LJSpeech dataset
2. Run: tts --text "Text for TTS" --model_path path/to/model --config_path path/to/config.json --out_path speech.wav --language en
3. Errors: [Language None is not supported. | raise TypeError("Invalid file: {0!r}".format(self.name))]
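The first error in step 3 ("Language None is not supported.") indicates the CLI never received a language value, because the flag is --language_idx rather than --language. A friendlier validation, in the spirit of the error-message PR mentioned earlier, might look like this (function name and wording are hypothetical):

```python
def check_language(language, supported_languages):
    # Hypothetical sketch: name the offending value and the valid choices
    # instead of the bare "Language None is not supported." message.
    if language not in supported_languages:
        raise ValueError(
            f"Language {language!r} is not supported; "
            f"valid choices are {sorted(supported_languages)}"
        )
    return language
```

Naming the received value (here None) would make it obvious that the flag never took effect.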
Expected behavior
Produces a voice file with which to evaluate the model.
Logs
Environment
Additional context
The documentation pages showed two different ways to run inference with the model; neither worked.