coqui-ai / TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
http://coqui.ai
Mozilla Public License 2.0
34.69k stars 4.21k forks

[Bug]TypeError: [!] Unknown config file type #2087

Closed marcstein closed 1 year ago

marcstein commented 1 year ago

Describe the bug

When attempting to run compute_statistics.py using the following:

python3 ./compute_statistics.py config_path /home/marc/TTS/TTS/TTS/config/config.json out_path /home/marc/Documents/TTS/Models/LJSpeech/

I get the following error:

(tts2) marc@128:~/TTS/TTS/TTS/bin$ python3 ./compute_statistics.py config_path /home/marc/TTS/TTS/TTS/config/config.json out_path /home/marc/Documents/TTS/Models/LJSpeech/
Traceback (most recent call last):
  File "/home/marc/TTS/TTS/TTS/bin/./compute_statistics.py", line 96, in <module>
    main()
  File "/home/marc/TTS/TTS/TTS/bin/./compute_statistics.py", line 30, in main
    CONFIG = load_config(args.config_path)
  File "/usr/local/lib/python3.10/dist-packages/TTS-0.8.0-py3.10-linux-x86_64.egg/TTS/config/__init__.py", line 91, in load_config
    raise TypeError(f" [!] Unknown config file type {ext}")
TypeError: [!] Unknown config file type

I can confirm that the files referenced are accessible with no permission issues. The config that I'm using is attached (as config.json.txt). Is there something wrong with the config.json or its placement?

Many thanks!

Marc

To Reproduce

python3 ./compute_statistics.py config_path /home/marc/TTS/TTS/TTS/config/config.json out_path /home/marc/Documents/TTS/Models/LJSpeech/

Expected behavior

Traceback (most recent call last):
  File "/home/marc/TTS/TTS/TTS/bin/./compute_statistics.py", line 96, in <module>
    main()
  File "/home/marc/TTS/TTS/TTS/bin/./compute_statistics.py", line 30, in main
    CONFIG = load_config(args.config_path)
  File "/usr/local/lib/python3.10/dist-packages/TTS-0.8.0-py3.10-linux-x86_64.egg/TTS/config/__init__.py", line 91, in load_config
    raise TypeError(f" [!] Unknown config file type {ext}")
TypeError: [!] Unknown config file type

Logs

No response

Environment

python3 /home/marc/TTS/TTS/TTS/bin/collect_env_info.py
/home/marc/.local/lib/python3.10/site-packages/torch/cuda/__init__.py:146: UserWarning:
NVIDIA GeForce RTX 3090 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA GeForce RTX 3090 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

  warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
{
    "CUDA": {
        "GPU": [
            "NVIDIA GeForce RTX 3090"
        ],
        "available": true,
        "version": "10.2"
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "1.12.1+cu102",
        "TTS": "0.8.0",
        "numpy": "1.22.4"
    },
    "System": {
        "OS": "Linux",
        "architecture": [
            "64bit",
            "ELF"
        ],
        "processor": "x86_64",
        "python": "3.10.6",
        "version": "#56-Ubuntu SMP Tue Sep 20 13:23:26 UTC 2022"
    }
}

Additional context

No response

erogol commented 1 year ago

Your config file should set a valid model name. It looks like it is missing from your config.
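The traceback in the report shows the error being raised with an `{ext}` placeholder, and the printed message ends right after "type", which suggests the extension seen by the loader was empty. Below is a minimal pre-flight sketch for checking a config path before handing it to a script. It assumes, without confirmation from this thread, that the extension is derived from the path with `os.path.splitext` and that `.json`, `.yml` and `.yaml` are the accepted types. `check_config_path` and `KNOWN_EXTENSIONS` are hypothetical names, not part of TTS. The sketch also checks for a non-empty "model" entry, following the reply above.

```python
import json
import os

# Assumption: the extensions the loader accepts; adjust for your TTS version.
KNOWN_EXTENSIONS = (".json", ".yml", ".yaml")


def check_config_path(config_path):
    """Return a list of problems found with a config path (empty if none).

    Hypothetical helper for illustration only; not part of TTS.
    """
    problems = []
    # splitext returns ("name", ".ext"); a bare word yields an empty extension.
    ext = os.path.splitext(config_path)[1]
    if ext not in KNOWN_EXTENSIONS:
        problems.append(f"unrecognised extension {ext!r} in {config_path!r}")
        return problems
    if ext == ".json":
        with open(config_path, "r", encoding="utf-8") as f:
            data = json.load(f)
        # The reply above asks for a valid model name in the config.
        if not isinstance(data, dict) or not data.get("model"):
            problems.append('config has no "model" entry')
    return problems


if __name__ == "__main__":
    # A bare word such as "config_path" has no extension at all.
    print(check_config_path("config_path"))
```

If the function reports an unrecognised extension for a value you did not expect, the script most likely received a different string as the config path than the one intended, which is worth ruling out before editing the config itself.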

naveed81 commented 9 months ago

Hi @erogol

I am getting the same error when trying to initialize Synthesizer. Below is my code and config.json, please help.

Code:

from TTS.api import TTS
from TTS.tts.configs.vits_config import VitsConfig
from TTS.tts.models.vits import Vits
from TTS.config import load_config
from TTS.utils.synthesizer import Synthesizer
from TTS.utils.manage import ModelManager

config = VitsConfig('vits_telugu_phonemes-November-05-2023_05+17PM-99635193')

model = Vits(config)

config = load_config("vits_telugu_phonemes-November-05-2023_05+17PM-99635193/config.json")

model = Vits.init_from_config(config)

model.load_checkpoint(config, 'best_model_50232.pth', eval=True)

speakers_file_path = None
language_ids_file_path = None
vocoder_path = None
vocoder_config_path = None
encoder_path = None
encoder_config_path = None
cuda = True

synthesizer = Synthesizer("vits_telugu_phonemes-November-05-2023_05+17PM-99635193/best_model_50232.pth", "vits_telugu_phonemes-November-05-2023_05+17PM-99635193/config.json", speakers_file_path, language_ids_file_path, vocoder_path, vocoder_config_path, encoder_path, encoder_config_path, cuda)

speaker_idx = None
language_idx = None
speaker_wav = None
reference_wav = None
style_wav = None
style_text = None
reference_speaker_name = None

Error:

Traceback (most recent call last):
  File "/home/ubuntu/TTS/./infer.py", line 26, in <module>
    synthesizer = Synthesizer("vits_telugu_phonemes-November-05-2023_05+17PM-99635193/best_model_50232.pth", "vits_telugu_phonemes-November-05-2023_05+17PM-99635193/config.json", speakers_file_path, language_ids_file_path, vocoder_path, vocoder_config_path, encoder_path, encoder_config_path, cuda)
  File "/home/ubuntu/TTS/TTS/utils/synthesizer.py", line 101, in __init__
    self._load_vc(vc_checkpoint, vc_config, use_cuda)
  File "/home/ubuntu/TTS/TTS/utils/synthesizer.py", line 138, in _load_vc
    self.vc_config = load_config(vc_config_path)
  File "/home/ubuntu/TTS/TTS/config/__init__.py", line 97, in load_config
    raise TypeError(f" [!] Unknown config file type {ext}")
TypeError: [!] Unknown config file type

config.json: { "output_path": "/home/ubuntu/TTS", "logger_uri": null, "run_name": "vits_telugu_phonemes", "project_name": null, "run_description": "\ud83d\udc38Coqui trainer run.", "print_step": 25, "plot_step": 100, "model_param_stats": false, "wandb_entity": null, "dashboard_logger": "tensorboard", "save_on_interrupt": true, "log_model_step": null, "save_step": 10000, "save_n_checkpoints": 5, "save_checkpoints": true, "save_all_best": false, "save_best_after": 10000, "target_loss": null, "print_eval": true, "test_delay_epochs": -1, "run_eval": true, "run_eval_steps": null, "distributed_backend": "nccl", "distributed_url": "tcp://localhost:54321", "mixed_precision": true, "precision": "fp16", "epochs": 1000, "batch_size": 24, "eval_batch_size": 24, "grad_clip": [ 1000, 1000 ], "scheduler_after_epoch": true, "lr": 0.001, "optimizer": "AdamW", "optimizer_params": { "betas": [ 0.8, 0.99 ], "eps": 1e-09, "weight_decay": 0.01 }, "lr_scheduler": null, "lr_scheduler_params": {}, "use_grad_scaler": false, "allow_tf32": false, "cudnn_enable": true, "cudnn_deterministic": false, "cudnn_benchmark": false, "training_seed": 54321, "model": "vits", "num_loader_workers": 0, "num_eval_loader_workers": 4, "use_noise_augment": false, "audio": { "fft_size": 1024, "sample_rate": 22050, "win_length": 1024, "hop_length": 256, "num_mels": 80, "mel_fmin": 0, "mel_fmax": null }, "use_phonemes": true, "phonemizer": "espeak", "phoneme_language": "te", "compute_input_seq_cache": true, "text_cleaner": "phoneme_cleaners", "enable_eos_bos_chars": false, "test_sentences_file": "", "phoneme_cache_path": "/home/ubuntu/TTS/phoneme_cache/tel", "characters": { "characters_class": "TTS.tts.utils.text.characters.IPAPhonemes", "vocab_dict": null, "pad": "", "eos": "", "bos": "", "blank": "", "characters": 
"iy\u0268\u0289\u026fu\u026a\u028f\u028ae\u00f8\u0258\u0259\u0275\u0264o\u025b\u0153\u025c\u025e\u028c\u0254\u00e6\u0250a\u0276\u0251\u0252\u1d7b\u0298\u0253\u01c0\u0257\u01c3\u0284\u01c2\u0260\u01c1\u029b\u02b0pbtd\u0288\u0256c\u025fk\u0261q\u0262\u0294\u0274\u014b\u0272\u0273n\u0271m\u0299r\u0280\u2c71\u027e\u027d\u0278\u03b2fv\u03b8\u00f0sz\u0283\u0292\u0282\u0290\u00e7\u029dx\u0263\u03c7\u0281\u0127\u0295h\u0266\u026c\u026e\u028b\u0279\u027bj\u0270l\u026d\u028e\u029f\u02c8\u02cc\u02d0\u02d1\u028dw\u0265\u029c\u02a2\u02a1\u0255\u0291\u027a\u0267\u02b2\u025a\u02de\u026b", "punctuations": "!'(),-.:;? ", "phonemes": null, "is_unique": false, "is_sorted": true }, "add_blank": true, "batch_group_size": 5, "loss_masking": null, "min_audio_len": 1, "max_audio_len": Infinity, "min_text_len": 1, "max_text_len": Infinity, "compute_f0": false, "compute_energy": false, "compute_linear_spec": true, "precompute_num_workers": 0, "start_by_longest": false, "shuffle": false, "drop_last": false, "datasets": [ { "formatter": "ljspeech", "dataset_name": "", "path": "/home/ubuntu/TTS/teldata", "meta_file_train": "metadata.csv", "ignored_speakers": null, "language": "", "phonemizer": "", "meta_file_val": "", "meta_file_attn_mask": "" } ], "test_sentences": [ "\u0c28\u0c2e\u0c38\u0c4d\u0c24\u0c47 \u0c16\u0c3e\u0c32\u0c3f\u0c26\u0c4d \u0c38\u0c48\u0c2b\u0c41\u0c32\u0c4d\u0c32\u0c3e \u0c17\u0c3e\u0c30\u0c41, \u0c0e\u0c32\u0c3e \u0c09\u0c28\u0c4d\u0c28\u0c3e\u0c30\u0c41?", "\u0c28\u0c2e\u0c38\u0c4d\u0c15\u0c3e\u0c30\u0c2e\u0c41 \u0c38\u0c41\u0c2c\u0c4d\u0c30\u0c39\u0c4d\u0c2e\u0c23\u0c4d\u0c2f\u0c02 \u0c17\u0c3e\u0c30\u0c41, \u0c35\u0c46\u0c02\u0c15\u0c1f\u0c47\u0c36\u0c4d\u0c35\u0c30\u0c4d\u0c32\u0c41 \u0c17\u0c3e\u0c30\u0c41", "\u0c28\u0c35\u0c40\u0c26\u0c4d \u0c05\u0c39\u0c4d\u0c2e\u0c26\u0c4d \u0c17\u0c3e\u0c30\u0c41, \u0c2e\u0c41\u0c38\u0c4d\u0c24\u0c2b\u0c3e \u0c17\u0c3e\u0c30\u0c41, \u0c2c\u0c3e\u0c17\u0c41\u0c28\u0c4d\u0c28\u0c3e\u0c30\u0c3e?", 
"\u0c36\u0c41\u0c2d\u0c4b\u0c26\u0c2f\u0c02 \u0c24\u0c3e\u0c33\u0c4d\u0c32\u0c42\u0c30\u0c3f \u0c36\u0c4d\u0c30\u0c40\u0c28\u0c3f\u0c35\u0c3e\u0c38\u0c4d \u0c30\u0c46\u0c21\u0c4d\u0c21\u0c3f \u0c17\u0c3e\u0c30\u0c41", "\u0c05\u0c38\u0c4d\u0c38\u0c32\u0c3e\u0c02 \u0c05\u0c32\u0c48\u0c15\u0c41\u0c2e\u0c4d \u0c38\u0c2f\u0c4d\u0c2f\u0c26\u0c4d \u0c05\u0c2b\u0c4d\u0c1c\u0c32\u0c4d \u0c39\u0c41\u0c38\u0c4d\u0c38\u0c47\u0c28\u0c4d \u0c17\u0c3e\u0c30\u0c41", "\u0c15\u0c3e\u0c02\u0c17\u0c4d\u0c30\u0c46\u0c38\u0c4d \u0c15\u0c41 \u0c13\u0c1f\u0c41 \u0c35\u0c47\u0c38\u0c3f \u0c30\u0c3e\u0c39\u0c41\u0c32\u0c4d \u0c17\u0c3e\u0c02\u0c27\u0c40 \u0c28\u0c3f \u0c2a\u0c4d\u0c30\u0c27\u0c3e\u0c28 \u0c2e\u0c02\u0c24\u0c4d\u0c30\u0c3f \u0c1a\u0c46\u0c2f\u0c4d\u0c2f\u0c02\u0c21\u0c3f" ], "eval_split_max_size": null, "eval_split_size": 0.01, "use_speaker_weighted_sampler": false, "speaker_weighted_sampler_alpha": 1.0, "use_language_weighted_sampler": false, "language_weighted_sampler_alpha": 1.0, "use_length_weighted_sampler": false, "length_weighted_sampler_alpha": 1.0, "model_args": { "num_chars": 132, "out_channels": 513, "spec_segment_size": 32, "hidden_channels": 192, "hidden_channels_ffn_text_encoder": 768, "num_heads_text_encoder": 2, "num_layers_text_encoder": 6, "kernel_size_text_encoder": 3, "dropout_p_text_encoder": 0.1, "dropout_p_duration_predictor": 0.5, "kernel_size_posterior_encoder": 5, "dilation_rate_posterior_encoder": 1, "num_layers_posterior_encoder": 16, "kernel_size_flow": 5, "dilation_rate_flow": 1, "num_layers_flow": 4, "resblock_type_decoder": "1", "resblock_kernel_sizes_decoder": [ 3, 7, 11 ], "resblock_dilation_sizes_decoder": [ [ 1, 3, 5 ], [ 1, 3, 5 ], [ 1, 3, 5 ] ], "upsample_rates_decoder": [ 8, 8, 2, 2 ], "upsample_initial_channel_decoder": 512, "upsample_kernel_sizes_decoder": [ 16, 16, 4, 4 ], "periods_multi_period_discriminator": [ 2, 3, 5, 7, 11 ], "use_sdp": true, "noise_scale": 1.0, "inference_noise_scale": 0.667, "length_scale": 1, "noise_scale_dp": 
1.0, "inference_noise_scale_dp": 1.0, "max_inference_len": null, "init_discriminator": true, "use_spectral_norm_disriminator": false, "use_speaker_embedding": false, "num_speakers": 0, "speakers_file": null, "d_vector_file": null, "speaker_embedding_channels": 256, "use_d_vector_file": false, "d_vector_dim": 0, "detach_dp_input": true, "use_language_embedding": false, "embedded_language_dim": 4, "num_languages": 0, "language_ids_file": null, "use_speaker_encoder_as_loss": false, "speaker_encoder_config_path": "", "speaker_encoder_model_path": "", "condition_dp_on_speaker": true, "freeze_encoder": false, "freeze_DP": false, "freeze_PE": false, "freeze_flow_decoder": false, "freeze_waveform_decoder": false, "encoder_sample_rate": null, "interpolate_z": true, "reinit_DP": false, "reinit_text_encoder": false }, "lr_gen": 0.0002, "lr_disc": 0.0002, "lr_scheduler_gen": "ExponentialLR", "lr_scheduler_gen_params": { "gamma": 0.999875, "last_epoch": -1 }, "lr_scheduler_disc": "ExponentialLR", "lr_scheduler_disc_params": { "gamma": 0.999875, "last_epoch": -1 }, "kl_loss_alpha": 1.0, "disc_loss_alpha": 1.0, "gen_loss_alpha": 1.0, "feat_loss_alpha": 1.0, "mel_loss_alpha": 45.0, "dur_loss_alpha": 1.0, "speaker_encoder_loss_alpha": 1.0, "return_wav": true, "use_weighted_sampler": false, "weighted_sampler_attrs": {}, "weighted_sampler_multipliers": {}, "r": 1, "num_speakers": 0, "use_speaker_embedding": false, "speakers_file": null, "speaker_embedding_channels": 256, "language_ids_file": null, "use_language_embedding": false, "use_d_vector_file": false, "d_vector_file": null, "d_vector_dim": 0, "github_branch": "* dev" }

Where am I going wrong? Pls help me out.
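In the traceback above, the failure happens inside `_load_vc(vc_checkpoint, vc_config, use_cuda)`, even though the script never sets any voice-conversion arguments on purpose. One possible explanation, offered as a guess rather than a confirmed diagnosis, is that the ninth positional argument (`cuda = True`) does not line up with the CUDA flag in this version of `Synthesizer` and lands in a voice-conversion parameter instead. The sketch below uses a hypothetical stand-in, `fake_synthesizer`, whose parameter order is an assumption modelled on the names visible in the traceback, not the real class. The file names are placeholders. It shows how passing arguments by keyword avoids this kind of shift.

```python
def fake_synthesizer(
    tts_checkpoint="",
    tts_config_path="",
    tts_speakers_file="",
    tts_languages_file="",
    vocoder_checkpoint="",
    vocoder_config="",
    encoder_checkpoint="",
    encoder_config="",
    vc_checkpoint="",  # assumed position of the voice-conversion arguments
    vc_config="",
    use_cuda=False,
):
    """Hypothetical stand-in: report which components would be loaded."""
    loaded = []
    if tts_checkpoint:
        loaded.append("tts")
    if vocoder_checkpoint:
        loaded.append("vocoder")
    if vc_checkpoint:
        # vc_config is still empty here, so a loader that keys on the
        # file extension would have nothing to recognise.
        loaded.append("vc")
    return loaded, use_cuda


# Nine positional arguments: the trailing True fills vc_checkpoint, not use_cuda.
positional = fake_synthesizer(
    "best_model.pth", "config.json", None, None, None, None, None, None, True
)

# Keyword arguments bind by name, so the flag reaches use_cuda.
by_name = fake_synthesizer(
    tts_checkpoint="best_model.pth",
    tts_config_path="config.json",
    use_cuda=True,
)

print(positional)  # (['tts', 'vc'], False)
print(by_name)     # (['tts'], True)
```

Before relying on any of these parameter names, it is worth printing the actual signature of the installed class, for example with `inspect.signature(Synthesizer)`, and passing only the arguments you need by keyword.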