coqui-ai / TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
http://coqui.ai
Mozilla Public License 2.0

[Bug] Can't run any of the xtts models using the TTS Command Line Interface (CLI) #3270

Closed 240db closed 11 months ago

240db commented 11 months ago

Describe the bug

Hello, I just started playing with the TTS library and am running tests with the TTS Command Line Interface (CLI). I was able to try capacitron, vits (English and Portuguese), and tacotron2 successfully. But when I try any of the xtts models, I get the same error, which suggests that I have not set a language option.

To Reproduce

Running either of the following commands produces the error:

tts --text "Welcome. This is a TTS test." --model_name "tts_models/multilingual/multi-dataset/xtts_v2" --language en --out_path TTS_english_test_xtts_output2.wav

tts --text "Welcome. This is a TTS test." --model_name "tts_models/multilingual/multi-dataset/xtts_v1.1" --language en --out_path TTS_english_test_xtts_output2.wav

I tried these commands on multiple systems, yet I get the same error: AssertionError: ❗ Language None is not supported. Supported languages are ['en', 'es', 'fr', 'de', 'it', 'pt', 'pl', 'tr', 'ru', 'nl', 'cs', 'ar', 'zh-cn', 'hu', 'ko', 'ja']

Expected behavior

No response

Logs

No response

Environment

- TTS installed from pip install TTS
- Linux OS

Additional context

My guess is that --language en is ignored, and perhaps the xtts_v2 and xtts_v1.1 models must be run from Python? I wanted to try a multilingual model through the command line interface (CLI). Are there any steps I am missing here?

I was able to run bark using:

tts --text "Welcome. This is a TTS test." --model_name "tts_models/multilingual/multi-dataset/bark" --language en --out_path TTS_english_test_bark_output2.wav

WeberJulian commented 11 months ago
~$ tts --help
usage: tts [-h] [--list_models [LIST_MODELS]]
           [--model_info_by_idx MODEL_INFO_BY_IDX]
           [--model_info_by_name MODEL_INFO_BY_NAME] [--text TEXT]
           [--model_name MODEL_NAME] [--vocoder_name VOCODER_NAME]
           [--config_path CONFIG_PATH] [--model_path MODEL_PATH]
           [--out_path OUT_PATH] [--use_cuda USE_CUDA] [--device DEVICE]
           [--vocoder_path VOCODER_PATH]
           [--vocoder_config_path VOCODER_CONFIG_PATH]
           [--encoder_path ENCODER_PATH]
           [--encoder_config_path ENCODER_CONFIG_PATH] [--cs_model CS_MODEL]
           [--emotion EMOTION] [--language LANGUAGE] [--pipe_out [PIPE_OUT]]
           [--speed SPEED] [--speakers_file_path SPEAKERS_FILE_PATH]
           [--language_ids_file_path LANGUAGE_IDS_FILE_PATH]
           [--speaker_idx SPEAKER_IDX] [--language_idx LANGUAGE_IDX]
           [--speaker_wav SPEAKER_WAV [SPEAKER_WAV ...]]
           [--gst_style GST_STYLE]
           [--capacitron_style_wav CAPACITRON_STYLE_WAV]
           [--capacitron_style_text CAPACITRON_STYLE_TEXT]
           [--list_speaker_idxs [LIST_SPEAKER_IDXS]]
           [--list_language_idxs [LIST_LANGUAGE_IDXS]]
           [--save_spectogram SAVE_SPECTOGRAM] [--reference_wav REFERENCE_WAV]
           [--reference_speaker_idx REFERENCE_SPEAKER_IDX]
           [--progress_bar PROGRESS_BAR] [--source_wav SOURCE_WAV]
           [--target_wav TARGET_WAV] [--voice_dir VOICE_DIR]

As the help output above and the documentation show, the argument for specifying the language is --language_idx, not --language.
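
Why the original commands failed without a usage error can be illustrated with a small argparse sketch. The help output lists both --language and --language_idx, so --language en is accepted by the parser but leaves language_idx unset, which would explain the "Language None" in the assertion. This is a hypothetical reduction, not the TTS source: the parser name tts-sketch, the check_language helper, and the assumption that the model reads only language_idx are all illustrative.

```python
import argparse

# Hypothetical reduction of the CLI: only flags that appear in the help
# output are modelled. This is NOT the actual TTS source code.
SUPPORTED = ["en", "es", "fr", "de", "it", "pt", "pl", "tr", "ru", "nl",
             "cs", "ar", "zh-cn", "hu", "ko", "ja"]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="tts-sketch")
    parser.add_argument("--text", type=str, default=None)
    # Two distinct options: setting one does not set the other.
    parser.add_argument("--language", type=str, default=None)
    parser.add_argument("--language_idx", type=str, default=None)
    return parser


def check_language(args: argparse.Namespace) -> str:
    # Assumption: the model looks only at `language_idx`.
    if args.language_idx not in SUPPORTED:
        raise ValueError(f"Language {args.language_idx} is not supported.")
    return args.language_idx


parser = build_parser()
wrong = parser.parse_args(["--text", "hi", "--language", "en"])
right = parser.parse_args(["--text", "hi", "--language_idx", "en"])

print(wrong.language, wrong.language_idx)  # en None
print(right.language, right.language_idx)  # None en
```

Applied to the commands in the report, this suggests replacing --language en with --language_idx en. The xtts models may additionally expect a reference clip via --speaker_wav, which the help output lists, though the thread itself does not confirm that.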