windowzzhhuu opened this issue 2 years ago
What's the full command you used?
I'm running into the same issue:
python inference.py -c custom_finetuned/config.json -r custom_finetuned/model_237500 -v pretrained_models/hifigan_libritts100360_generator0p5.pt -k pretrained_models/hifigan_22khz_config.json -s 0 -t sentences.txt -o results/
I encountered the same error when I tried to run inference with the model trained in step 1:
Train the decoder: python train.py -c config_ljs_decoder.json -p train_config.output_directory=outdir
After I also trained the model from step 2:
Train the attribute predictor: autoregressive flow (agap), bi-partite flow (bgap) or deterministic (dap): python train.py -c config_ljs_{agap,bgap,dap}.json -p train_config.output_directory=outdir_wattr train_config.warmstart_checkpoint_path=model_path.pt
inference ran without any problems.
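The commands above pass settings with -p as dotted section.key=value pairs layered on top of the JSON config; train.py's own log later in this thread ("overriding output_directory with ...") shows this happening. As a rough illustration of the idea only, here is a minimal sketch of applying such overrides to a nested config dict. apply_overrides is a hypothetical helper, not the repository's actual parser, and the real script may parse values differently.

```python
import ast


def apply_overrides(config, overrides):
    """Apply 'section.key=value' strings to a nested config dict in place."""
    for item in overrides:
        dotted_key, raw_value = item.split("=", 1)
        *parents, leaf = dotted_key.split(".")
        node = config
        for name in parents:
            node = node[name]  # raise KeyError on an unknown section
        try:
            # Numbers, booleans, lists etc. are parsed as Python literals.
            value = ast.literal_eval(raw_value)
        except (ValueError, SyntaxError):
            # Bare words such as outdir or decatndpm stay plain strings.
            value = raw_value
        node[leaf] = value
    return config
```

For example, apply_overrides(cfg, ["train_config.output_directory=outdir", "model_config.include_modules=decatndpm"]) sets both nested values, mirroring what the -p arguments in the step 1 and step 2 commands are meant to do.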
Step 1 only trains the decoder, after which you need to train the attribute predictors to perform inference. Step 2 only trains the attribute predictors.
If you're trying to fine-tune the pre-trained model on your data, you can warmstart from the pre-trained model and then either
- train only the decoder and then train only the attribute predictor (this is the default from-scratch recipe), or
- train the decoder and attribute predictors jointly, which requires setting unfreeze_modules to 'all', https://github.com/NVIDIA/radtts/blob/main/configs/config_ljs_decoder.json#L35
Make sure to use the correct configs during inference when using the model conditioned on f0 and energy: config_ljs_{agap,bgap,dap}.json.
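For the joint option, the change lives in the training config. A minimal sketch that derives a joint fine-tuning config from an existing one, assuming training options sit under train_config as in the config dump that train.py prints later in this thread (which includes 'unfreeze_modules'); enable_joint_finetuning and any file names you use with it are hypothetical.

```python
import copy


def enable_joint_finetuning(config, warmstart_path):
    """Return a copy of a RADTTS-style config set up for joint fine-tuning.

    Assumes training options are nested under "train_config".
    """
    cfg = copy.deepcopy(config)  # leave the caller's dict untouched
    train_cfg = cfg["train_config"]
    # 'all' unfreezes the decoder and the attribute predictors together.
    train_cfg["unfreeze_modules"] = "all"
    # Start from the pre-trained checkpoint instead of random weights.
    train_cfg["warmstart_checkpoint_path"] = warmstart_path
    return cfg
```

In practice you would json.load your base config, pass it through enable_joint_finetuning, and json.dump the result to a new file that you then hand to train.py with -c. The sketch only touches unfreeze_modules and the warmstart path; which base config to start from still follows the advice above.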
Hey Rafael, one quick question: for step #2 (the attribute prediction training), do I pass in radtts' pretrained model for the warmstart arg, or do I pass in the finetuned model I made in step #1 as the warmstart arg? Thanks man!
When I ran this command...
python3 train.py \
  -c ./config_ljs_dap.json \
  -p train_config.output_directory=training-output \
  train_config.warmstart_checkpoint_path=radtts_pretrained_dap_model.pt
...I got this error log about Unexpected key(s) in state_dict:
Unable to init server: Could not connect: Connection refused
Unable to init server: Could not connect: Connection refused
(train.py:286816): Gdk-CRITICAL **: 11:58:55.624: gdk_cursor_new_for_display: assertion 'GDK_IS_DISPLAY (display)' failed
train_config.output_directory=/home/ubuntu/1-radtts-repo/6-training-output
output_directory=/home/ubuntu/1-radtts-repo/6-training-output
overriding output_directory with /home/ubuntu/1-radtts-repo/6-training-output
train_config.warmstart_checkpoint_path=/home/ubuntu/1-radtts-repo/1-models/1-radtts-models/1-radtts_pretrained_dap_model.pt
warmstart_checkpoint_path=/home/ubuntu/1-radtts-repo/1-models/1-radtts-models/1-radtts_pretrained_dap_model.pt
overriding warmstart_checkpoint_path with /home/ubuntu/1-radtts-repo/1-models/1-radtts-models/1-radtts_pretrained_dap_model.pt
{'train_config': {'output_directory': '/home/ubuntu/1-radtts-repo/6-training-output', 'epochs': 1002, 'optim_algo': 'RAdam', 'learning_rate': 0.0001, 'weight_decay': 1e-06, 'sigma': 1.0, 'iters_per_checkpoint': 2500, 'batch_size': 16, 'seed': None, 'checkpoint_path': '', 'ignore_layers': [], 'ignore_layers_warmstart': [], 'finetune_layers': [], 'include_layers': [], 'vocoder_config_path': '/home/ubuntu/1-radtts-repo/2-configs/2-hifigan-configs/uberduck-vocoder-notebook-lupe-fiasco-150-2022-09-12-A.json', 'vocoder_checkpoint_path': '/home/ubuntu/1-radtts-repo/1-models/2-hifigan-models/uberduck-vocoder-notebook-lupe-fiasco-150-2022-09-12-A', 'log_attribute_samples': False, 'log_decoder_samples': True, 'warmstart_checkpoint_path': '/home/ubuntu/1-radtts-repo/1-models/1-radtts-models/1-radtts_pretrained_dap_model.pt', 'use_amp': False, 'grad_clip_val': 1.0, 'loss_weights': {'blank_logprob': -1, 'ctc_loss_weight': 0.1, 'binarization_loss_weight': 1.0, 'dur_loss_weight': 1.0, 'f0_loss_weight': 1.0, 'energy_loss_weight': 1.0, 'vpred_loss_weight': 1.0}, 'binarization_start_iter': 6000, 'kl_loss_start_iter': 18000, 'unfreeze_modules': 'all'}, 'data_config': {'training_files': {'LJS': {'basedir': '3-filelists-lupe/', 'audiodir': 'wavs', 'filelist': 'training.txt', 'lmdbpath': ''}}, 'validation_files': {'LJS': {'basedir': '3-filelists-lupe/', 'audiodir': 'wavs', 'filelist': 'validation.txt', 'lmdbpath': ''}}, 'dur_min': 0.1, 'dur_max': 10.2, 'sampling_rate': 22050, 'filter_length': 1024, 'hop_length': 256, 'win_length': 1024, 'n_mel_channels': 80, 'mel_fmin': 0.0, 'mel_fmax': 8000.0, 'f0_min': 80.0, 'f0_max': 640.0, 'max_wav_value': 32768.0, 'use_f0': True, 'use_log_f0': 0, 'use_energy_avg': True, 'use_scaled_energy': True, 'symbol_set': 'radtts', 'cleaner_names': ['radtts_cleaners'], 'heteronyms_path': 'tts_text_processing/heteronyms', 'phoneme_dict_path': 'tts_text_processing/cmudict-0.7b', 'p_phoneme': 1.0, 'handle_phoneme': 'word', 'handle_phoneme_ambiguous': 'ignore', 
'include_speakers': None, 'n_frames': -1, 'betabinom_cache_path': 'data_cache/', 'lmdb_cache_path': '', 'use_attn_prior_masking': True, 'prepend_space_to_text': True, 'append_space_to_text': True, 'add_bos_eos_to_text': False, 'betabinom_scaling_factor': 1.0, 'distance_tx_unvoiced': False, 'mel_noise_scale': 0.0}, 'dist_config': {'dist_backend': 'nccl', 'dist_url': 'tcp://localhost:54321'}, 'model_config': {'n_speakers': 1, 'n_speaker_dim': 16, 'n_text': 185, 'n_text_dim': 512, 'n_flows': 8, 'n_conv_layers_per_step': 4, 'n_mel_channels': 80, 'n_hidden': 1024, 'mel_encoder_n_hidden': 512, 'dummy_speaker_embedding': False, 'n_early_size': 2, 'n_early_every': 2, 'n_group_size': 2, 'affine_model': 'wavenet', 'include_modules': 'decatnvpred', 'scaling_fn': 'tanh', 'matrix_decomposition': 'LUS', 'learn_alignments': True, 'use_speaker_emb_for_alignment': False, 'attn_straight_through_estimator': True, 'use_context_lstm': True, 'context_lstm_norm': 'spectral', 'context_lstm_w_f0_and_energy': True, 'text_encoder_lstm_norm': 'spectral', 'n_f0_dims': 1, 'n_energy_avg_dims': 1, 'use_first_order_features': False, 'unvoiced_bias_activation': 'relu', 'decoder_use_partial_padding': True, 'decoder_use_unvoiced_bias': True, 'ap_pred_log_f0': True, 'ap_use_unvoiced_bias': True, 'ap_use_voiced_embeddings': True, 'dur_model_config': None, 'f0_model_config': None, 'energy_model_config': None, 'v_model_config': {'name': 'dap', 'hparams': {'n_speaker_dim': 16, 'take_log_of_input': False, 'bottleneck_hparams': {'in_dim': 512, 'reduction_factor': 16, 'norm': 'weightnorm', 'non_linearity': 'relu'}, 'arch_hparams': {'out_dim': 1, 'n_layers': 2, 'n_channels': 256, 'kernel_size': 3, 'p_dropout': 0.5, 'lstm_type': '', 'use_linear': 1}}}}}
> got rank 0 and world size 1 ...
/home/ubuntu/1-radtts-repo/6-training-output
Using seed 1113
Applying spectral norm to text encoder LSTM
Applying spectral norm to context encoder LSTM
/home/ubuntu/1-radtts-repo/common.py:391: UserWarning: torch.qr is deprecated in favor of torch.linalg.qr and will be removed in a future PyTorch release.
The boolean parameter 'some' has been replaced with a string parameter 'mode'.
Q, R = torch.qr(A, some)
should be replaced with
Q, R = torch.linalg.qr(A, 'reduced' if some else 'complete') (Triggered internally at ../aten/src/ATen/native/BatchLinearAlgebra.cpp:2497.)
W = torch.qr(torch.FloatTensor(c, c).normal_())[0]
Initializing RAdam optimizer
Traceback (most recent call last):
File "train.py", line 498, in <module>
train(n_gpus, rank, **train_config)
File "train.py", line 353, in train
model = warmstart(warmstart_checkpoint_path, model, include_layers,
File "train.py", line 174, in warmstart
model.load_state_dict(model_dict)
File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1604, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for RADTTS:
Unexpected key(s) in state_dict: "dur_pred_layer.bottleneck_layer.projection_fn.conv.bias", "dur_pred_layer.bottleneck_layer.projection_fn.conv.weight_g", "dur_pred_layer.bottleneck_layer.projection_fn.conv.weight_v", "dur_pred_layer.feat_pred_fn.convolutions.0.bias", "dur_pred_layer.feat_pred_fn.convolutions.0.weight_g", "dur_pred_layer.feat_pred_fn.convolutions.0.weight_v", "dur_pred_layer.feat_pred_fn.convolutions.1.bias", "dur_pred_layer.feat_pred_fn.convolutions.1.weight_g", "dur_pred_layer.feat_pred_fn.convolutions.1.weight_v", "dur_pred_layer.feat_pred_fn.bilstm.weight_ih_l0", "dur_pred_layer.feat_pred_fn.bilstm.bias_ih_l0", "dur_pred_layer.feat_pred_fn.bilstm.bias_hh_l0", "dur_pred_layer.feat_pred_fn.bilstm.weight_ih_l0_reverse", "dur_pred_layer.feat_pred_fn.bilstm.bias_ih_l0_reverse", "dur_pred_layer.feat_pred_fn.bilstm.bias_hh_l0_reverse", "dur_pred_layer.feat_pred_fn.bilstm.weight_hh_l0_orig", "dur_pred_layer.feat_pred_fn.bilstm.weight_hh_l0_reverse_orig", "dur_pred_layer.feat_pred_fn.bilstm.weight_hh_l0_u", "dur_pred_layer.feat_pred_fn.bilstm.weight_hh_l0_v", "dur_pred_layer.feat_pred_fn.bilstm.weight_hh_l0_reverse_u", "dur_pred_layer.feat_pred_fn.bilstm.weight_hh_l0_reverse_v", "dur_pred_layer.feat_pred_fn.dense.weight", "dur_pred_layer.feat_pred_fn.dense.bias", "f0_pred_module.bottleneck_layer.projection_fn.conv.bias", "f0_pred_module.bottleneck_layer.projection_fn.conv.weight_g", "f0_pred_module.bottleneck_layer.projection_fn.conv.weight_v", "f0_pred_module.feat_pred_fn.convolutions.0.bias", "f0_pred_module.feat_pred_fn.convolutions.0.weight_g", "f0_pred_module.feat_pred_fn.convolutions.0.weight_v", "f0_pred_module.feat_pred_fn.convolutions.1.bias", "f0_pred_module.feat_pred_fn.convolutions.1.weight_g", "f0_pred_module.feat_pred_fn.convolutions.1.weight_v", "f0_pred_module.feat_pred_fn.bilstm.weight_ih_l0", "f0_pred_module.feat_pred_fn.bilstm.bias_ih_l0", "f0_pred_module.feat_pred_fn.bilstm.bias_hh_l0", 
"f0_pred_module.feat_pred_fn.bilstm.weight_ih_l0_reverse", "f0_pred_module.feat_pred_fn.bilstm.bias_ih_l0_reverse", "f0_pred_module.feat_pred_fn.bilstm.bias_hh_l0_reverse", "f0_pred_module.feat_pred_fn.bilstm.weight_hh_l0_orig", "f0_pred_module.feat_pred_fn.bilstm.weight_hh_l0_reverse_orig", "f0_pred_module.feat_pred_fn.bilstm.weight_hh_l0_u", "f0_pred_module.feat_pred_fn.bilstm.weight_hh_l0_v", "f0_pred_module.feat_pred_fn.bilstm.weight_hh_l0_reverse_u", "f0_pred_module.feat_pred_fn.bilstm.weight_hh_l0_reverse_v", "f0_pred_module.feat_pred_fn.dense.weight", "f0_pred_module.feat_pred_fn.dense.bias", "energy_pred_module.bottleneck_layer.projection_fn.conv.bias", "energy_pred_module.bottleneck_layer.projection_fn.conv.weight_g", "energy_pred_module.bottleneck_layer.projection_fn.conv.weight_v", "energy_pred_module.feat_pred_fn.convolutions.0.bias", "energy_pred_module.feat_pred_fn.convolutions.0.weight_g", "energy_pred_module.feat_pred_fn.convolutions.0.weight_v", "energy_pred_module.feat_pred_fn.convolutions.1.bias", "energy_pred_module.feat_pred_fn.convolutions.1.weight_g", "energy_pred_module.feat_pred_fn.convolutions.1.weight_v", "energy_pred_module.feat_pred_fn.bilstm.weight_ih_l0", "energy_pred_module.feat_pred_fn.bilstm.bias_ih_l0", "energy_pred_module.feat_pred_fn.bilstm.bias_hh_l0", "energy_pred_module.feat_pred_fn.bilstm.weight_ih_l0_reverse", "energy_pred_module.feat_pred_fn.bilstm.bias_ih_l0_reverse", "energy_pred_module.feat_pred_fn.bilstm.bias_hh_l0_reverse", "energy_pred_module.feat_pred_fn.bilstm.weight_hh_l0_orig", "energy_pred_module.feat_pred_fn.bilstm.weight_hh_l0_reverse_orig", "energy_pred_module.feat_pred_fn.bilstm.weight_hh_l0_u", "energy_pred_module.feat_pred_fn.bilstm.weight_hh_l0_v", "energy_pred_module.feat_pred_fn.bilstm.weight_hh_l0_reverse_u", "energy_pred_module.feat_pred_fn.bilstm.weight_hh_l0_reverse_v", "energy_pred_module.feat_pred_fn.dense.weight", "energy_pred_module.feat_pred_fn.dense.bias".
But when I re-ran this command using the model I trained in step 1 instead of the pretrained model, it worked. So I think I've answered the question I just asked here...
do I pass in radtts' pretrained model for the warmstart arg, or do I pass in the finetuned model I made in step 1 as the warmstart arg?
...with the answer, "the finetuned model I made in step 1".
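The log above shows whole modules (dur_pred_layer, f0_pred_module, energy_pred_module) present in the pretrained checkpoint but absent from the model built from config_ljs_dap.json, whose printed model_config has include_modules set to 'decatnvpred'. When a long "Unexpected key(s)" list like this appears, grouping the keys by top-level module name makes the mismatch easier to read. A minimal sketch in plain Python; summarize_key_mismatch is a hypothetical helper, and the torch usage in the comment assumes the checkpoint stores its weights under a "state_dict" entry, which you should confirm by inspecting the loaded object.

```python
from collections import Counter


def summarize_key_mismatch(checkpoint_keys, model_keys):
    """Count unexpected and missing state_dict keys per top-level module."""
    ckpt_keys, mdl_keys = set(checkpoint_keys), set(model_keys)
    # Keys the checkpoint has but the model does not ("unexpected").
    unexpected = Counter(k.split(".", 1)[0] for k in ckpt_keys - mdl_keys)
    # Keys the model expects but the checkpoint lacks ("missing").
    missing = Counter(k.split(".", 1)[0] for k in mdl_keys - ckpt_keys)
    return dict(unexpected), dict(missing)

# Hypothetical usage with PyTorch (the "state_dict" entry is an assumption):
#   ckpt = torch.load("radtts_pretrained_dap_model.pt", map_location="cpu")
#   unexpected, missing = summarize_key_mismatch(
#       ckpt["state_dict"].keys(), model.state_dict().keys())
```

For the error above, the unexpected side would collapse to three entries, one per predictor module, which points straight at an include_modules mismatch rather than at corrupted weights.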
Training RADTTS (without pitch and energy conditioning):
1. Train the decoder: python train.py -c config_ljs_radtts.json -p train_config.output_directory=outdir
2. Further train with the duration predictor: python train.py -c config_ljs_radtts.json -p train_config.output_directory=outdir_dir train_config.warmstart_checkpoint_path=model_path.pt model_config.include_modules="decatndpm"
The original command for the second step is: python train.py -c config_ljs_radtts.json -p train_config.output_directory=outdir_dir train_config.warmstart_checkpoint_path=model_path.pt model_config.include_modules="decatndur"
We should change model_config.include_modules="decatndur" in the original command to model_config.include_modules="decatndpm".
At inference time, the "include_modules" parameter in the configuration file should also be "decatndpm".
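Editing include_modules by hand works, but when juggling several configs it can help to script it. A minimal sketch, assuming the JSON layout that train.py prints in this thread (include_modules nested under model_config); patch_include_modules is a hypothetical helper, and writing to a separate out_path keeps the original config intact.

```python
import json
from pathlib import Path


def patch_include_modules(config_path, modules="decatndpm", out_path=None):
    """Set model_config.include_modules in a RADTTS-style JSON config.

    Writes to out_path if given, otherwise overwrites config_path.
    """
    path = Path(config_path)
    cfg = json.loads(path.read_text())
    cfg["model_config"]["include_modules"] = modules
    Path(out_path or path).write_text(json.dumps(cfg, indent=4))
    return cfg
```

Pointing inference.py's -c at the patched copy keeps the model built at inference time in line with the modules that were actually trained.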
Did you run inference without pitch and energy conditioning? I was having a bit of trouble understanding the arguments.
Yes, I did. When running inference without pitch and energy conditioning, you need to change the "include_modules" parameter in the configuration file from 'decatn' to 'decatndpm', as shown in the following figure:
Alternatively, when running inference without pitch and energy conditioning, pass the config.json file saved in the folder where the model checkpoints are stored as the -c argument of the inference command. The file path is shown in the figure below.
Example inference command: python inference.py -c outdir_dir/config.json -r RADTTS_PATH -v HG_PATH -k HG_CONFIG_PATH -t TEXT_PATH -s ljs --speaker_attributes ljs --speaker_text ljs -o results/
Sorry if my explanation is unclear; I hope the description above helps.
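A minimal sketch that assembles the example inference command so that -c always points at the config.json saved next to the trained checkpoints, as suggested above. The flags are copied from the example command; build_inference_cmd is a hypothetical helper, and the interpreter name ("python" vs "python3") depends on your environment.

```python
import os
import shlex


def build_inference_cmd(train_outdir, radtts_ckpt, vocoder_ckpt, vocoder_config,
                        text_path, speaker="ljs", results_dir="results/"):
    """Assemble the inference.py argument list from the thread's example."""
    # Reuse the config.json written to the training output directory, so
    # include_modules matches the modules the checkpoint actually contains.
    config_path = os.path.join(train_outdir, "config.json")
    return [
        "python", "inference.py",
        "-c", config_path,
        "-r", radtts_ckpt,
        "-v", vocoder_ckpt,
        "-k", vocoder_config,
        "-t", text_path,
        "-s", speaker,
        "--speaker_attributes", speaker,
        "--speaker_text", speaker,
        "-o", results_dir,
    ]


if __name__ == "__main__":
    cmd = build_inference_cmd("outdir_dir", "RADTTS_PATH", "HG_PATH",
                              "HG_CONFIG_PATH", "TEXT_PATH")
    print(shlex.join(cmd))  # inspect before running
```

Printing the joined command before running it (for example with subprocess.run(cmd, check=True)) makes it easy to confirm that -c points at the config saved with the trained model.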
Thanks, mate; I was able to run inference successfully using the changes you mentioned. You explained everything clearly.
I get an error when I run inference on text with the pretrained model.