NVIDIA / DeepLearningExamples

State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.

Do you have pre-trained models to continue training? #319

Closed maloyan closed 3 years ago

maloyan commented 4 years ago

I'm working on Tacotron 2

I've tried to continue training from the provided checkpoints JoC_Tacotron2_FP32_PyT_20190306 and JoC_WaveGlow_FP32_PyT_20190306, but it didn't work out.

:::NVLOGv0.2.2 Tacotron2_PyT 1574863255.828569651 (/workspace/tacotron2/dllogger/logger.py:279) run_start
:::NVLOGv0.2.2 Tacotron2_PyT 1574863255.837591887 (/workspace/tacotron2/dllogger/logger.py:251) cpu_info: {"num": 16, "name": "Intel(R) Xeon(R) CPU @ 2.00GHz"}
:::NVLOGv0.2.2 Tacotron2_PyT 1574863255.845489264 (/workspace/tacotron2/dllogger/logger.py:251) mem_info: {"ram": "102G"}
:::NVLOGv0.2.2 Tacotron2_PyT 1574863255.917979240 (/workspace/tacotron2/dllogger/logger.py:251) gpu_info: {"driver_version": "418.87.00", "num": 1, "name": ["Tesla P100-PCIE-16GB"], "mem": ["16280 MiB"]}
:::NVLOGv0.2.2 Tacotron2_PyT 1574863255.921807289 (/workspace/tacotron2/dllogger/logger.py:251) args: {"output_directory": "./output/", "dataset_path": "./", "model_name": "Tacotron2", "log_file": "./output/nvlog.json", "anneal_steps": ["500", "1000", "1500"], "anneal_factor": 0.1, "epochs": 1501, "epochs_per_checkpoint": 50, "checkpoint_path": "./JoC_Tacotron2_FP32_PyT_20190306", "seed": 1234, "dynamic_loss_scaling": true, "amp_run": true, "cudnn_enabled": true, "cudnn_benchmark": false, "disable_uniform_initialize_bn_weight": false, "use_saved_learning_rate": false, "learning_rate": 0.001, "weight_decay": 1e-06, "grad_clip_thresh": 1.0, "batch_size": 128, "grad_clip": 5.0, "load_mel_from_disk": false, "training_files": "filelists/ljs_audio_text_train_filelist.txt", "validation_files": "filelists/ljs_audio_text_val_filelist.txt", "text_cleaners": ["english_cleaners"], "max_wav_value": 32768.0, "sampling_rate": 22050, "filter_length": 1024, "hop_length": 256, "win_length": 1024, "mel_fmin": 0.0, "mel_fmax": 8000.0, "rank": 0, "world_size": 1, "dist_url": "tcp://localhost:23456", "group_name": "group_name", "dist_backend": "nccl", "mask_padding": false, "n_mel_channels": 80, "n_symbols": 148, "symbols_embedding_dim": 512, "encoder_kernel_size": 5, "encoder_n_convolutions": 3, "encoder_embedding_dim": 512, "n_frames_per_step": 1, "decoder_rnn_dim": 1024, "prenet_dim": 256, "max_decoder_steps": 2000, "gate_threshold": 0.5, "p_attention_dropout": 0.1, "p_decoder_dropout": 0.1, "decoder_no_early_stopping": false, "attention_rnn_dim": 1024, "attention_dim": 128, "attention_location_n_filters": 32, "attention_location_kernel_size": 31, "postnet_embedding_dim": 512, "postnet_kernel_size": 5, "postnet_n_convolutions": 5}
:::NVLOGv0.2.2 Tacotron2_PyT 1574863255.922522545 (/workspace/tacotron2/dllogger/logger.py:251) run_start
Selected optimization level O1:  Insert automatic casts around Pytorch functions and Tensor methods.

Defaults for this optimization level are:
enabled                : True
opt_level              : O1
cast_model_type        : None
patch_torch_functions  : True
keep_batchnorm_fp32    : None
master_weights         : None
loss_scale             : dynamic
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled                : True
opt_level              : O1
cast_model_type        : None
patch_torch_functions  : True
keep_batchnorm_fp32    : None
master_weights         : None
loss_scale             : dynamic
Traceback (most recent call last):
  File "train.py", line 501, in <module>
    main()
  File "train.py", line 350, in main
    args.amp_run, args.checkpoint_path)
  File "train.py", line 202, in load_checkpoint
    torch.cuda.set_rng_state_all(checkpoint['cuda_rng_state_all'])
KeyError: 'cuda_rng_state_all'

I guess those checkpoints were not made for continuing training.

Do you have pre-trained models that we can continue training from?
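
For reference, a quick way to see which keys a downloaded checkpoint actually contains (a sketch; map_location avoids needing a GPU just to inspect the file):

    import torch

    # List the checkpoint's top-level keys; if 'cuda_rng_state_all' is absent,
    # load_checkpoint in train.py raises the KeyError shown in the traceback.
    checkpoint = torch.load('./JoC_Tacotron2_FP32_PyT_20190306', map_location='cpu')
    print(sorted(checkpoint.keys()))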

ghost commented 4 years ago

Hi @maloyan, the newest pre-trained model available is https://ngc.nvidia.com/catalog/models/nvidia:tacotron2pyt_fp16/files?version=2. At the moment we don't have any pre-trained model, compatible with the latest training scripts, that would allow you to continue training.

jeffxtang commented 4 years ago

hi @GrzegorzKarchNV, any update or plan on supporting transfer learning from your pre-trained models? Or suggestions on how to implement this, if possible? Since training Tacotron-2 and WaveGlow on the LJSpeech dataset takes so long (at least 193 hours and 347 hours respectively on a single GPU), it'd be very desirable to have transfer learning supported.

Thanks!

CookiePPP commented 4 years ago

Have you tried replacing this block in train.py (Link)

    torch.cuda.set_rng_state_all(checkpoint['cuda_rng_state_all'])
    torch.random.set_rng_state(checkpoint['random_rng_state'])
    config = checkpoint['config']
    model.load_state_dict(checkpoint['state_dict'])
    optimizer.load_state_dict(checkpoint['optimizer'])

    if amp_run:
        amp.load_state_dict(checkpoint['amp'])

with

    if 'cuda_rng_state_all' in checkpoint: torch.cuda.set_rng_state_all(checkpoint['cuda_rng_state_all'])
    if 'random_rng_state' in checkpoint: torch.random.set_rng_state(checkpoint['random_rng_state'])
    if 'config' in checkpoint: config = checkpoint['config']
    model.load_state_dict(checkpoint['state_dict'])
    if 'optimizer' in checkpoint: optimizer.load_state_dict(checkpoint['optimizer'])

    if amp_run and 'amp' in checkpoint:
        amp.load_state_dict(checkpoint['amp'])

or something along those lines? The RNG state is only for reproducibility, and the optimizer state will be rebuilt as you train (warm up at a lower learning rate if rebuilding momentum is important to you).
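
As a sketch of that warmup idea (the schedule and values below are illustrative, not from the repo's train.py):

    import torch

    def warmup_lr(optimizer, step, base_lr=1e-3, warmup_steps=500):
        # Linearly ramp the learning rate over the first warmup_steps updates,
        # then hold it, so the optimizer statistics can rebuild gently.
        lr = base_lr * min(1.0, (step + 1) / warmup_steps)
        for group in optimizer.param_groups:
            group['lr'] = lr
        return lr

    # Example: call once per training step, before optimizer.step().
    optimizer = torch.optim.Adam([torch.nn.Parameter(torch.zeros(1))], lr=1e-3)
    for step in range(1000):
        warmup_lr(optimizer, step)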

jeffxtang commented 4 years ago

Thanks @CookiePPP. I made the change you suggested and it did get rid of the cuda_rng_state_all error, but now it fails with the following errors:

Traceback (most recent call last):
  File "train.py", line 511, in <module>
    main()
  File "train.py", line 359, in main
    args.amp_run, args.checkpoint_path)
  File "train.py", line 206, in load_checkpoint
    model.load_state_dict(checkpoint['state_dict'])
  File "/home/jeff/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 839, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Tacotron2:
    Missing key(s) in state_dict: "embedding.weight", "encoder.convolutions.0.0.conv.weight", "encoder.convolutions.0.0.conv.bias", "encoder.convolutions.0.1.weight", "encoder.convolutions.0.1.bias", "encoder.convolutions.0.1.running_mean", "encoder.convolutions.0.1.running_var", "encoder.convolutions.1.0.conv.weight", "encoder.convolutions.1.0.conv.bias", "encoder.convolutions.1.1.weight", "encoder.convolutions.1.1.bias", "encoder.convolutions.1.1.running_mean", "encoder.convolutions.1.1.running_var", "encoder.convolutions.2.0.conv.weight", "encoder.convolutions.2.0.conv.bias", "encoder.convolutions.2.1.weight", "encoder.convolutions.2.1.bias", "encoder.convolutions.2.1.running_mean", "encoder.convolutions.2.1.running_var", "encoder.lstm.weight_ih_l0", "encoder.lstm.weight_hh_l0", "encoder.lstm.bias_ih_l0", "encoder.lstm.bias_hh_l0", "encoder.lstm.weight_ih_l0_reverse", "encoder.lstm.weight_hh_l0_reverse", "encoder.lstm.bias_ih_l0_reverse", "encoder.lstm.bias_hh_l0_reverse", "decoder.prenet.layers.0.linear_layer.weight", "decoder.prenet.layers.1.linear_layer.weight", "decoder.attention_rnn.weight_ih", "decoder.attention_rnn.weight_hh", "decoder.attention_rnn.bias_ih", "decoder.attention_rnn.bias_hh", "decoder.attention_layer.query_layer.linear_layer.weight", "decoder.attention_layer.memory_layer.linear_layer.weight", "decoder.attention_layer.v.linear_layer.weight", "decoder.attention_layer.location_layer.location_conv.conv.weight", "decoder.attention_layer.location_layer.location_dense.linear_layer.weight", "decoder.decoder_rnn.weight_ih", "decoder.decoder_rnn.weight_hh", "decoder.decoder_rnn.bias_ih", "decoder.decoder_rnn.bias_hh", "decoder.linear_projection.linear_layer.weight", "decoder.linear_projection.linear_layer.bias", "decoder.gate_layer.linear_layer.weight", "decoder.gate_layer.linear_layer.bias", "postnet.convolutions.0.0.conv.weight", "postnet.convolutions.0.0.conv.bias", "postnet.convolutions.0.1.weight", "postnet.convolutions.0.1.bias", "postnet.convolutions.0.1.running_mean", "postnet.convolutions.0.1.running_var", "postnet.convolutions.1.0.conv.weight", "postnet.convolutions.1.0.conv.bias", "postnet.convolutions.1.1.weight", "postnet.convolutions.1.1.bias", "postnet.convolutions.1.1.running_mean", "postnet.convolutions.1.1.running_var", "postnet.convolutions.2.0.conv.weight", "postnet.convolutions.2.0.conv.bias", "postnet.convolutions.2.1.weight", "postnet.convolutions.2.1.bias", "postnet.convolutions.2.1.running_mean", "postnet.convolutions.2.1.running_var", "postnet.convolutions.3.0.conv.weight", "postnet.convolutions.3.0.conv.bias", "postnet.convolutions.3.1.weight", "postnet.convolutions.3.1.bias", "postnet.convolutions.3.1.running_mean", "postnet.convolutions.3.1.running_var", "postnet.convolutions.4.0.conv.weight", "postnet.convolutions.4.0.conv.bias", "postnet.convolutions.4.1.weight", "postnet.convolutions.4.1.bias", "postnet.convolutions.4.1.running_mean", "postnet.convolutions.4.1.running_var". 
    Unexpected key(s) in state_dict: "module.embedding.weight", "module.encoder.convolutions.0.0.conv.weight", "module.encoder.convolutions.0.0.conv.bias", "module.encoder.convolutions.0.1.weight", "module.encoder.convolutions.0.1.bias", "module.encoder.convolutions.0.1.running_mean", "module.encoder.convolutions.0.1.running_var", "module.encoder.convolutions.0.1.num_batches_tracked", "module.encoder.convolutions.1.0.conv.weight", "module.encoder.convolutions.1.0.conv.bias", "module.encoder.convolutions.1.1.weight", "module.encoder.convolutions.1.1.bias", "module.encoder.convolutions.1.1.running_mean", "module.encoder.convolutions.1.1.running_var", "module.encoder.convolutions.1.1.num_batches_tracked", "module.encoder.convolutions.2.0.conv.weight", "module.encoder.convolutions.2.0.conv.bias", "module.encoder.convolutions.2.1.weight", "module.encoder.convolutions.2.1.bias", "module.encoder.convolutions.2.1.running_mean", "module.encoder.convolutions.2.1.running_var", "module.encoder.convolutions.2.1.num_batches_tracked", "module.encoder.lstm.weight_ih_l0", "module.encoder.lstm.weight_hh_l0", "module.encoder.lstm.bias_ih_l0", "module.encoder.lstm.bias_hh_l0", "module.encoder.lstm.weight_ih_l0_reverse", "module.encoder.lstm.weight_hh_l0_reverse", "module.encoder.lstm.bias_ih_l0_reverse", "module.encoder.lstm.bias_hh_l0_reverse", "module.decoder.prenet.layers.0.linear_layer.weight", "module.decoder.prenet.layers.1.linear_layer.weight", "module.decoder.attention_rnn.weight_ih", "module.decoder.attention_rnn.weight_hh", "module.decoder.attention_rnn.bias_ih", "module.decoder.attention_rnn.bias_hh", "module.decoder.attention_layer.query_layer.linear_layer.weight", "module.decoder.attention_layer.memory_layer.linear_layer.weight", "module.decoder.attention_layer.v.linear_layer.weight", "module.decoder.attention_layer.location_layer.location_conv.conv.weight", "module.decoder.attention_layer.location_layer.location_dense.linear_layer.weight", "module.decoder.decoder_rnn.weight_ih", "module.decoder.decoder_rnn.weight_hh", "module.decoder.decoder_rnn.bias_ih", "module.decoder.decoder_rnn.bias_hh", "module.decoder.linear_projection.linear_layer.weight", "module.decoder.linear_projection.linear_layer.bias", "module.decoder.gate_layer.linear_layer.weight", "module.decoder.gate_layer.linear_layer.bias", "module.postnet.convolutions.0.0.conv.weight", "module.postnet.convolutions.0.0.conv.bias", "module.postnet.convolutions.0.1.weight", "module.postnet.convolutions.0.1.bias", "module.postnet.convolutions.0.1.running_mean", "module.postnet.convolutions.0.1.running_var", "module.postnet.convolutions.0.1.num_batches_tracked", "module.postnet.convolutions.1.0.conv.weight", "module.postnet.convolutions.1.0.conv.bias", "module.postnet.convolutions.1.1.weight", "module.postnet.convolutions.1.1.bias", "module.postnet.convolutions.1.1.running_mean", "module.postnet.convolutions.1.1.running_var", "module.postnet.convolutions.1.1.num_batches_tracked", "module.postnet.convolutions.2.0.conv.weight", "module.postnet.convolutions.2.0.conv.bias", "module.postnet.convolutions.2.1.weight", "module.postnet.convolutions.2.1.bias", "module.postnet.convolutions.2.1.running_mean", "module.postnet.convolutions.2.1.running_var", "module.postnet.convolutions.2.1.num_batches_tracked", "module.postnet.convolutions.3.0.conv.weight", "module.postnet.convolutions.3.0.conv.bias", "module.postnet.convolutions.3.1.weight", "module.postnet.convolutions.3.1.bias", "module.postnet.convolutions.3.1.running_mean", 
"module.postnet.convolutions.3.1.running_var", "module.postnet.convolutions.3.1.num_batches_tracked", "module.postnet.convolutions.4.0.conv.weight", "module.postnet.convolutions.4.0.conv.bias", "module.postnet.convolutions.4.1.weight", "module.postnet.convolutions.4.1.bias", "module.postnet.convolutions.4.1.running_mean", "module.postnet.convolutions.4.1.running_var", "module.postnet.convolutions.4.1.num_batches_tracked". 

If I use my own checkpoint trained on the LJSpeech dataset as the value for --checkpoint-path to resume training on my own dataset, it seems to work fine. I wonder what the difference is between the officially downloaded checkpoint JoC_Tacotron2_FP16_PyT_20190306 and one generated by the train script.
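
A quick way to inspect the difference (a sketch; 'my_checkpoint.pt' is a placeholder for a file saved by the train script):

    import torch

    # The "Unexpected key(s)" above show that the official file's keys carry a
    # 'module.' prefix (typical of a model saved while wrapped in DataParallel),
    # while a checkpoint saved by train.py does not.
    official = torch.load('JoC_Tacotron2_FP16_PyT_20190306', map_location='cpu')
    mine = torch.load('my_checkpoint.pt', map_location='cpu')  # placeholder path
    print(next(iter(official['state_dict'])))  # e.g. module.embedding.weight
    print(next(iter(mine['state_dict'])))      # e.g. embedding.weight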

CookiePPP commented 4 years ago

@jeffxtang

Link

https://discuss.pytorch.org/t/solved-keyerror-unexpected-key-module-encoder-embedding-weight-in-state-dict/1686/2 Looks like it's related to the way they originally saved it (the state_dict came from a DataParallel-wrapped model, so every key carries a module. prefix). It's a band-aid fix, but

    if 'cuda_rng_state_all' in checkpoint: torch.cuda.set_rng_state_all(checkpoint['cuda_rng_state_all'])
    if 'random_rng_state' in checkpoint: torch.random.set_rng_state(checkpoint['random_rng_state'])
    if 'config' in checkpoint: config = checkpoint['config']
    model.load_state_dict({k.replace('module.',''): v for k, v in checkpoint['state_dict'].items()})
    if 'optimizer' in checkpoint: optimizer.load_state_dict(checkpoint['optimizer'])

    if amp_run and 'amp' in checkpoint:
        amp.load_state_dict(checkpoint['amp'])

might work (it literally just replaces all occurrences of 'module.' with ''). EDIT: num_batches_tracked doesn't exist in the current code, so you might need to ignore those keys.

    if 'cuda_rng_state_all' in checkpoint: torch.cuda.set_rng_state_all(checkpoint['cuda_rng_state_all'])
    if 'random_rng_state' in checkpoint: torch.random.set_rng_state(checkpoint['random_rng_state'])
    if 'config' in checkpoint: config = checkpoint['config']
    model_dict = model.state_dict()
    # strip the 'module.' prefix before checking, so keys match the current model
    checkpoint_dict = {k.replace('module.', ''): v for k, v in checkpoint['state_dict'].items()}
    missing_items = {k: v for k, v in checkpoint_dict.items() if k not in model_dict}
    if missing_items: print(missing_items.keys(), 'do not exist in the current model and are being ignored')
    model.load_state_dict({k: v for k, v in checkpoint_dict.items() if k in model_dict})
    if 'optimizer' in checkpoint: optimizer.load_state_dict(checkpoint['optimizer'])

    if amp_run and 'amp' in checkpoint:
        amp.load_state_dict(checkpoint['amp'])

jeffxtang commented 4 years ago

@CookiePPP awesome, it works! Thanks so much! I haven't tried your edited change dealing with missing_items, but just changed the load_state_dict line, and the Tacotron-2 training runs with JoC_Tacotron2_FP16_PyT_20190306. Btw, have you done this kind of transfer learning on your own dataset? Should I expect good results after this training if my own dataset is about 2.5 hours of audio?

CookiePPP commented 4 years ago

@jeffxtang I've been trying (and mostly failing) with 48kHz models, so I don't have anything super recent that was transfer-learned.

This is an example of 22kHz from months ago, without any learning-rate decay. It was trained on the character "Twilight Sparkle" from a children's show, who has 2.5 hours of audio. Since this character's clips are very expressive and were taken from iTunes and YouTube, this probably represents the worst your model should perform.

jeffxtang commented 4 years ago

Thanks @CookiePPP for sharing. That example sounds pretty good. How many epochs did you train for, and how long did it take to run on how many GPUs to get such a result? Did you do transfer learning on both Tacotron2 and WaveGlow?

In addition to trying transfer learning on my own dataset, I'm also trying to first train on the Nancy dataset, which seems to sound better than LJSpeech, and then try the transfer learning. Will be happy to share results after I'm done.

And what do you mean by "mostly failing"? Have you tried any non-48kHz models, like 16kHz?

CookiePPP commented 4 years ago

How many epochs did you train for, and how long did it take to run on how many GPUs to get such a result?

21333 iterations with batch_size 16 on a single Google Colab P100. I'd also note, if everything is working it should take less than 1000 iterations to produce a result that can be identified as the target speaker (though quality after only 1000 iterations won't be great).

Did you do transfer learning on both Tacotron2 and WaveGlow?

Only Tacotron2.

And what do you mean by "mostly failing"?

I haven't been able to surpass the quality of the pretrained WaveGlow model at higher sampling rates. It's possible that I need to be more patient, or maybe VCTK is a bad dataset for training WaveGlow since it contains many speakers. I've asked the creator of 15.ai, but he says his model was trained using the default parameters from the WaveGlow paper (on his private implementation), so I'm unsure what the issue is.

jeffxtang commented 4 years ago

Thanks @CookiePPP for the answers! Why did you use VCTK to train WaveGlow? Was the pretrained WaveGlow checkpoint trained on the LJSpeech dataset?

Did you get the WaveGlow pretrained checkpoint from https://developer.nvidia.com/joc-waveglow-fp32-pyt-20190306 or https://developer.nvidia.com/joc-waveglow-fp16-pyt-20190306?

So basically, we can use the finetuned Tacotron2 model and the pretrained WaveGlow model to generate our own customized voice, right?

CookiePPP commented 4 years ago

You should only need to train Tacotron2 for less than 6 hours to replicate the clip I posted.

This is an example of 22kHz from months ago, without any learning-rate decay.

The pretrained 22kHz WaveGlow is very good. You could fine-tune WaveGlow on your own dataset to further improve quality.

Was the pretrained WaveGlow checkpoint trained on the LJSpeech dataset?

I used the model linked in the Mellotron repo for 22kHz.

Why did you use VCTK to train WaveGlow?

VCTK is only related to my attempt at creating a 48kHz WaveGlow. Samples 1 2 3. PS: I used VCTK because I don't know any other high-sampling-rate datasets.

jeffxtang commented 4 years ago

Cool, thanks for all the info! If you're interested, the Nancy dataset comes at high sampling rates:

wavn.tgz - Tarred archive of the prompts in the Nancy corpus, at 16KHz sampling rate, with each file name as used in the Nancy corpus.
96k_wav_part_[1-4].tar.bz2 - the original studio recordings at 96KHz, with each prompt separated by a pure tone, and named sequentially
44k_wav.tar.bz2 - the original studio recordings downsampled to 44.1KHz, with each prompt separated by a pure tone, and named sequentially.

CookiePPP commented 4 years ago

@jeffxtang Wow, Thanks!

jeffxtang commented 4 years ago

Btw, to be able to download the dataset you need to click the license link there, fill in a form, and then wait a few days to receive a password by email.

CookiePPP commented 4 years ago

@jeffxtang Yep, already submitted. EDIT: Yay! Got approved.

jeffxtang commented 4 years ago

hi @CookiePPP, I used the Nancy dataset (16kHz) and trained a Tacotron-2 model, hoping to use it as the pretrained model instead of the official LJSpeech-trained one, since Nancy is claimed to sound more natural than LJSpeech. But when I do transfer learning on my own dataset using the trained Nancy checkpoint, I get a new error (on the line if 'cuda_rng_state_all' in checkpoint: torch.cuda.set_rng_state_all(checkpoint['cuda_rng_state_all'])):

Traceback (most recent call last):
  File "train.py", line 513, in <module>
    main()
  File "train.py", line 361, in main
    args.amp_run, args.checkpoint_path)
  File "train.py", line 203, in load_checkpoint
    if 'cuda_rng_state_all' in checkpoint: torch.cuda.set_rng_state_all(checkpoint['cuda_rng_state_all'])
  File "/home/jeff/.local/lib/python3.6/site-packages/torch/cuda/random.py", line 71, in set_rng_state_all
    set_rng_state(state, i)
  File "/home/jeff/.local/lib/python3.6/site-packages/torch/cuda/random.py", line 62, in set_rng_state
    _lazy_call(cb)
  File "/home/jeff/.local/lib/python3.6/site-packages/torch/cuda/__init__.py", line 142, in _lazy_call
    callable()
  File "/home/jeff/.local/lib/python3.6/site-packages/torch/cuda/random.py", line 59, in cb
    default_generator = torch.cuda.default_generators[idx]
IndexError: tuple index out of range

This works when I use JoC_Tacotron2_FP16_PyT_20190306 for transfer learning. Any suggestions on how to fix this? Thanks!
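
The IndexError suggests the Nancy checkpoint saved CUDA RNG states for more GPUs than the current machine exposes. A hedged workaround (names are illustrative) is to restore only as many states as there are visible devices; the RNG state only matters for exact reproducibility:

    import torch

    def restore_cuda_rng_states(checkpoint):
        # Restore at most one saved RNG state per visible GPU; extra saved
        # states (from a machine with more GPUs) are skipped.
        states = checkpoint.get('cuda_rng_state_all', [])
        for device_id, state in enumerate(states[:torch.cuda.device_count()]):
            torch.cuda.set_rng_state(state, device_id)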

ghost commented 4 years ago

Hi All, we have published new checkpoints that are trained for 6000 epochs (Tacotron2) and 14000 epochs (WaveGlow, 256 residual channels); both checkpoints allow resuming.
https://ngc.nvidia.com/catalog/models/nvidia:tacotron2pyt_fp16/files?version=3
https://ngc.nvidia.com/catalog/models/nvidia:waveglow256pyt_fp16/files?version=2 (256 residual channels)
Let me know if they work with your setups!

CookiePPP commented 4 years ago

@GrzegorzKarchNV

14000 epochs

With that much spare compute, have you experimented with higher-sampling-rate WaveGlow models? I've got multiple 48kHz models at this point, but I can't get the artifacting down below the 22kHz models trained by you guys. I assume I need more than double the training time for 48kHz and to increase the WN layers, which I'm testing now (but it's going to take a while to see if it's working).

ghost commented 4 years ago

@CookiePPP yes, you would need to train the model for a longer time. Increasing the number of WN layers or the segment length might also help, but it will require some experimenting. Have you used a denoiser for inference? It's available in the latest repo: https://github.com/NVIDIA/DeepLearningExamples/blob/master/PyTorch/SpeechSynthesis/Tacotron2/inference.py#L231 It should remove some of the artifacts.
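
For reference, a sketch of how the linked inference.py applies it (assuming the Denoiser class shipped with this repo's WaveGlow code; the waveglow model and audio tensor come from the surrounding inference flow, and the strength value is illustrative):

    from waveglow.denoiser import Denoiser

    # Build the denoiser once from the loaded WaveGlow model, then run it over
    # the generated audio; higher strength removes more artifacts but can
    # slightly muffle the output.
    denoiser = Denoiser(waveglow).cuda()
    audio_denoised = denoiser(audio.float(), strength=0.01).squeeze(1)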

CookiePPP commented 4 years ago

@GrzegorzKarchNV https://drive.google.com/open?id=1-LJfMfUL8-7BGmNlqHpw-vPDXWp6_Ous It's hard to describe. Those are some samples from a 256-channel model.

I'm worried that the VCTK, Blizzard2011 (Nancy) and MLP datasets don't mix well. They're trimmed, and for each folder I adjust the volume of all the audio files by the same amount, to try and keep the variety but also get the datasets to work together.

Adjusting segment length and datasets affects the log-likelihood by a large amount without having a clear effect on quality. I recently added a testing phase to my code that will refresh the inverse_weights and then get the MSE (mean squared error) and MAE (mean absolute error) of the spectrogram for multiple window lengths (600, 1200 and 2400), and I hope that I can use the spectrogram error as a better metric for comparing models.
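
A sketch of that multi-window spectrogram test (assuming a recent PyTorch with complex STFT support; pred and target are 1-D audio tensors of equal length, and the hop length is an assumption):

    import torch

    def spec_errors(pred, target, win_lengths=(600, 1200, 2400)):
        # Compare STFT magnitudes at several window lengths, returning
        # (MSE, MAE) per window length.
        errors = {}
        for n_fft in win_lengths:
            window = torch.hann_window(n_fft)
            mag_p = torch.stft(pred, n_fft, hop_length=n_fft // 4,
                               window=window, return_complex=True).abs()
            mag_t = torch.stft(target, n_fft, hop_length=n_fft // 4,
                               window=window, return_complex=True).abs()
            diff = mag_p - mag_t
            errors[n_fft] = (diff.pow(2).mean().item(), diff.abs().mean().item())
        return errors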

Also, do you know what the effect would be of changing the n_group parameter? It's the main parameter I don't understand right now.

jeffxtang commented 4 years ago

Thanks @GrzegorzKarchNV for sharing. I'm downloading them and will try transfer learning on them - should the new checkpoints perform noticeably better than the old ones?

As @CookiePPP is concerned that some datasets may not mix well, I recently found that 30 minutes of Nancy data finetuned on JoC_Tacotron2_FP16_PyT_20190306 works pretty well, but 30 or 50 minutes of the CMU male speaker data leads to TTS results that are worse (with noticeable noise) than Nancy's. I tried a TTS dataset analysis tool on the Nancy and CMU datasets but can't see the cause of the difference.

Any tips on what datasets may or may not work well when finetuned on the pretrained checkpoints?

CookiePPP commented 4 years ago

@jeffxtang I'm referring to WaveGlow training specifically. My Tacotron2 model already performs well and has been finished for months.

RPrenger commented 4 years ago

@CookiePPP The n_group parameter is the number of samples that will be "squeezed" into a vector. So you can kind of look at it as the window size. A group size of 2 would correspond to using the even samples to affine transform the odd samples and vice versa on every other layer. The default of 8 corresponds to using samples (0,1,2,3), (8,9,10,11), (16,17,18,19) etc. to affine transform samples (4,5,6,7), (12,13,14,15), (20,21,22,23) etc.

We experimented with many different values, but relatively early in training to save time. IIRC it seemed like a lower group number preserved more high frequencies but had more squeaking artifacts early in training, whereas very large values sounded more muffled. With enough training time I don't think small values will sound all that different, but we haven't tried training all of them for a million iterations.

Also, intuitively it seems like you'd need to increase the n_channels parameter with a larger n_group as you're essentially doing convolutions over a shorter time window, but with larger vectors at each time.
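
A small sketch of that grouping (variable names and shapes are illustrative, not the repo's actual implementation): with n_group = 8, consecutive samples are folded into channel vectors, and the coupling layer splits each vector in half.

    import torch

    n_group = 8
    audio = torch.arange(24.).unsqueeze(0)       # (1, 24): samples 0..23
    groups = audio.unfold(1, n_group, n_group)   # (1, 3, 8): (0..7), (8..15), (16..23)
    squeezed = groups.permute(0, 2, 1)           # (1, 8, 3): channel = position in group
    audio_0, audio_1 = squeezed.chunk(2, dim=1)  # first half transforms the second
    print(audio_0[0, :, 0].tolist())             # [0.0, 1.0, 2.0, 3.0]
    print(audio_1[0, :, 0].tolist())             # [4.0, 5.0, 6.0, 7.0]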

CookiePPP commented 4 years ago

@RPrenger Hmm, would you suggest increasing n_group when training WaveGlow at higher sampling rates? I suppose, like you say, it should sound similar after enough training time.

edit: For anyone interested in the n_group parameter in the future, have a read of the SqueezeWave and WaveFlow papers for a jump start.

sadmoody commented 4 years ago

@CookiePPP

My Tacotron2 model already performs well and has been finished for months.

Your sample sounded great! What was your learning rate for transfer learning? Did you ignore any layers?

apthagowda97 commented 4 years ago

I am getting an error while doing inference with the new WaveGlow checkpoint from this link.

Hi All, we have published new checkpoints that are trained for 6000 epochs (Tacotron2) and 14000 epochs (WaveGlow, 256 residual channels); both checkpoints allow resuming. https://ngc.nvidia.com/catalog/models/nvidia:tacotron2pyt_fp16/files?version=3 https://ngc.nvidia.com/catalog/models/nvidia:waveglow256pyt_fp16/files?version=2 (256 residual channels) Let me know if they work with your setups!

Error:

RuntimeError: Error(s) in loading state_dict for WaveGlow__forward_is_infer:
    size mismatch for WN.0.in_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
    size mismatch for WN.0.in_layers.0.weight_g: copying a param with shape torch.Size([512, 1, 1]) from checkpoint, the shape in current model is torch.Size([1024, 1, 1]).
    size mismatch for WN.0.in_layers.0.weight_v: copying a param with shape torch.Size([512, 256, 3]) from checkpoint, the shape in current model is torch.Size([1024, 512, 3]).
    size mismatch for WN.0.in_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
............

cobolserverpages commented 4 years ago

I am getting an error while doing inference with the new WaveGlow checkpoint from this link.

Hi All, we have published new checkpoints that are trained for 6000 epochs (Tacotron2) and 14000 epochs (WaveGlow, 256 residual channels); both checkpoints allow resuming. https://ngc.nvidia.com/catalog/models/nvidia:tacotron2pyt_fp16/files?version=3 https://ngc.nvidia.com/catalog/models/nvidia:waveglow256pyt_fp16/files?version=2 (256 residual channels) Let me know if they work with your setups!

Error:

RuntimeError: Error(s) in loading state_dict for WaveGlow__forward_is_infer:
  size mismatch for WN.0.in_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
  size mismatch for WN.0.in_layers.0.weight_g: copying a param with shape torch.Size([512, 1, 1]) from checkpoint, the shape in current model is torch.Size([1024, 1, 1]).
  size mismatch for WN.0.in_layers.0.weight_v: copying a param with shape torch.Size([512, 256, 3]) from checkpoint, the shape in current model is torch.Size([1024, 512, 3]).
  size mismatch for WN.0.in_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).
............

@apthagowda97, changing your --wn-channels might help you. This inference worked for me for those checkpoints:

% python inference.py --tacotron2 output/tacotron2_1032590_6000_amp --waveglow output/waveglow_1076430_14000_amp --wn-channels 256 -o output/ --include-warmup -i phrases/phrase.txt --fp16

apthagowda97 commented 4 years ago

Just an update for those who wanted to continue training from pre-trained model weights.

Hi All, we have published new checkpoints that are trained for 6000 epochs (Tacotron2) and 14000 epochs (WaveGlow, 256 residual channels); both checkpoints allow resuming. https://ngc.nvidia.com/catalog/models/nvidia:tacotron2pyt_fp16/files?version=3 https://ngc.nvidia.com/catalog/models/nvidia:waveglow256pyt_fp16/files?version=2 (256 residual channels) Let me know if they work with your setups!

@GrzegorzKarchNV we're still required to use the code changes suggested by @CookiePPP.

Change this code block in train.py (updated code):

    torch.cuda.set_rng_state(checkpoint['cuda_rng_state_all'][device_id])
    torch.random.set_rng_state(checkpoint['random_rng_states_all'][device_id])
    config = checkpoint['config']
    model.load_state_dict(checkpoint['state_dict'])
    optimizer.load_state_dict(checkpoint['optimizer'])

    if amp_run:
        amp.load_state_dict(checkpoint['amp'])

with

    if 'cuda_rng_state_all' in checkpoint: torch.cuda.set_rng_state(checkpoint['cuda_rng_state_all'][device_id])
    if 'random_rng_states_all' in checkpoint: torch.random.set_rng_state(checkpoint['random_rng_states_all'][device_id])
    if 'config' in checkpoint: config = checkpoint['config']
    model.load_state_dict({k.replace('module.',''): v for k, v in checkpoint['state_dict'].items()})
    if 'optimizer' in checkpoint: optimizer.load_state_dict(checkpoint['optimizer'])

    if amp_run and 'amp' in checkpoint:
        amp.load_state_dict(checkpoint['amp'])

MuruganR96 commented 4 years ago

@CookiePPP @apthagowda97, I tried, but that is not working.

python -m multiproc train.py -m WaveGlow -o ./output/ -lr 1e-4 --epochs 1501 -bs 4 --segment-length 8000 --weight-decay 0 --grad-clip-thresh 3.4028234663852886e+38 --cudnn-enabled --cudnn-benchmark --log-file nvlog.json --training-files filelists/hindi_audio_text_train_filelist.txt --validation-files filelists/hindi_audio_text_val_filelist.txt --amp --checkpoint-path waveglow_1076430_14000_amp

Traceback (most recent call last):
  File "train.py", line 554, in <module>
    main()
  File "train.py", line 410, in main
    args.amp, args.checkpoint_path, local_rank)
  File "train.py", line 255, in load_checkpoint
    model.load_state_dict({k.replace('module.',''): v for k, v in checkpoint['state_dict'].items()})
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 879, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for WaveGlow:
    size mismatch for WN.0.in_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]).

apthagowda97 commented 4 years ago

@MuruganR96 Try the changes suggested by @cobolserverpages in the above comment.

ghost commented 3 years ago

@apthagowda97 @CookiePPP please check the updated Tacotron2 model - it has the suggested code changes for random_rng_state - thanks! https://github.com/NVIDIA/DeepLearningExamples/pull/759 For the checkpoint to work, only this change is needed. Let me know if you have any issues with it.

@MuruganR96 you will need to change --epochs to a number >14000, since the checkpoint was trained for that many epochs. Also it was trained with --segment-length 16000 (though setting it to another value shouldn't break the training).

MuruganR96 commented 3 years ago

@GrzegorzKarchNV sir, please help me. I am facing another issue. Please check the command below; if I am wrong, correct me.

python train.py -m WaveGlow -o ./ -lr 1e-4 --epochs 14500 -bs 10 --segment-length 16000 --weight-decay 0 --grad-clip-thresh 65504.0 --cudnn-enabled --cudnn-benchmark --log-file nvlog.json --training-files filelists/hindi_audio_text_train_filelist.txt --amp --validation-files filelists/hindi_audio_text_val_filelist.txt --wn-channels 256 --checkpoint-path backup/waveglow_1076430_14000_amp

Traceback (most recent call last):
  File "train.py", line 559, in <module>
    main()
  File "train.py", line 415, in main
    args.amp, args.checkpoint_path, local_rank)
  File "train.py", line 260, in load_checkpoint
    model.load_state_dict(checkpoint['state_dict'])
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 879, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for WaveGlow:
    Missing key(s) in state_dict: "upsample.weight", "upsample.bias", "WN.0.in_layers.0.bias", "WN.0.in_layers.0.weight_g", "WN.0.in_layers.0.weight_v", "WN.0.in_layers.1.bias", "WN.0.in_layers.1.weight_g", "WN.0.in_layers.1.weight_v", "WN.0.in_layers.2.bias", "WN.0.in_layers.2.weight_g", "WN.0.in_layers.2.weight_v", "WN.0.in_layers.3.bias", "WN.0.in_layers.3.weight_g", "WN.0.in_layers.3.weight_v", "WN.0.in_layers.4.bias", "WN.0.in_layers.4.weight_g", "WN.0.in_layers.4.weight_v", "WN.0.in_layers.5.bias", "WN.0.in_layers.5.weight_g", "WN.0.in_layers.5.weight_v", "WN.0.in_layers.6.bias", "WN.0.in_layers.6.weight_g", "WN.0.in_layers.6.weight_v", "WN.0.in_layers.7.bias", "WN.0.in_layers.7.weight_g", "WN.0.in_layers.7.weight_v", "WN.0.res_skip_layers.0.bias", "WN.0.res_skip_layers.0.weight_g", "WN.0.res_skip_layers.0.weight_v", "WN.0.res_skip_layers.1.bias", "WN.0.res_skip_layers.1.weight_g", "WN.0.res_skip_layers.1.weight_v", "WN.0.res_skip_layers.2.bias", "WN.0.res_skip_layers.2.weight_g", "WN.0.res_skip_layers.2.weight_v", "WN.0.res_skip_layers.3.bias", "WN.0.res_skip_layers.3.weight_g", "WN.0.res_skip_layers.3.weight_v", "WN.0.res_skip_layers.4.bias", "WN.0.res_skip_layers.4.weight_g", "WN.0.res_skip_layers.4.weight_v", "WN.0.res_skip_layers.5.bias", "WN.0.res_skip_layers.5.weight_g", "WN.0.res_skip_layers.5.weight_v", "WN.0.res_skip_layers.6.bias", "WN.0.res_skip_layers.6.weight_g", "WN.0.res_skip_layers.6.weight_v", "WN.0.res_skip_layers.7.bias", "WN.0.res_skip_layers.7.weight_g", "WN.0.res_skip_layers.7.weight_v", "WN.0.cond_layers.0.bias", "WN.0.cond_layers.0.weight_g", "WN.0.cond_layers.0.weight_v", "WN.0.cond_layers.1.bias", "WN.0.cond_layers.1.weight_g", "WN.0.cond_layers.1.weight_v", "WN.0.cond_layers.2.bias", "WN.0.cond_layers.2.weight_g", "WN.0.cond_layers.2.weight_v", "WN.0.cond_layers.3.bias", "WN.0.cond_layers.3.weight_g", "WN.0.cond_layers.3.weight_v", "WN.0.cond_layers.4.bias", "WN.0.cond_layers.4.weight_g", "WN.0.cond_layers.4.weight_v", "WN.0.cond_layers.5.bias", "WN.0.cond_layers.5.weight_g", "WN.0.cond_layers.5.weight_v", "WN.0.cond_layers.6.bias", "WN.0.cond_layers.6.weight_g", "WN.0.cond_layers.6.weight_v", "WN.0.cond_layers.7.bias", "WN.0.cond_layers.7.weight_g", "WN.0.cond_layers.7.weight_v", "WN.0.start.bias", "WN.0.start.weight_g", "WN.0.start.weight_v", "WN.0.end.weight", "WN.0.end.bias", "WN.1.in_layers.0.bias", "WN.1.in_layers.0.weight_g", "WN.1.in_layers.0.weight_v", "WN.1.in_layers.1.bias", "WN.1.in_layers.1.weight_g", "WN.1.in_layers.1.weight_v", "WN.1.in_layers.2.bias", "WN.1.in_layers.2.weight_g", "WN.1.in_layers.2.weight_v", "WN.1.in_layers.3.bias", "WN.1.in_layers.3.weight_g", "WN.1.in_layers.3.weight_v", "WN.1.in_layers.4.bias", "WN.1.in_layers.4.weight_g", "WN.1.in_layers.4.weight_v", "WN.1.in_layers.5.bias", "WN.1.in_layers.5.weight_g", "WN.1.in_layers.5.weight_v", "WN.1.in_layers.6.bias", "WN.1.in_layers.6.weight_g", "WN.1.in_layers.6.weight_v", "WN.1.in_layers.7.bias", "WN.1.in_layers.7.weight_g", "WN.1.in_layers.7.weight_v", "WN.1.res_skip_layers.0.bias", "WN.1.res_skip_layers.0.weight_g", "WN.1.res_skip_layers.0.weight_v", "WN.1.res_skip_layers.1.bias", "WN.1.res_skip_layers.1.weight_g", "WN.1.res_skip_layers.1.weight_v", "WN.1.res_skip_layers.2.bias", "WN.1.res_skip_layers.2.weight_g", "WN.1.res_skip_layers.2.weight_v", "WN.1.res_skip_layers.3.bias", "WN.1.res_skip_layers.3.weight_g", "WN.1.res_skip_layers.3.weight_v", "WN.1.res_skip_layers.4.bias", "WN.1.res_skip_layers.4.weight_g", "WN.1.res_skip_layers.4.weight_v", "WN.1.res_skip_layers.5.bias", 
"WN.1.res_skip_layers.5.weight_g", "WN.1.res_skip_layers.5.weight_v", "WN.1.res_skip_layers.6.bias", "WN.1.res_skip_layers.6.weight_g", "WN.1.res_skip_layers.6.weight_v", "WN.1.res_skip_layers.7.bias", "WN.1.res_skip_layers.7.weight_g", "WN.1.res_skip_layers.7.weight_v", "WN.1.cond_layers.0.bias", "WN.1.cond_layers.0.weight_g", "WN.1.cond_layers.0.weight_v", "WN.1.cond_layers.1.bias", "WN.1.cond_layers.1.weight_g", "WN.1.cond_layers.1.weight_v", "WN.1.cond_layers.2.bias", "WN.1.cond_layers.2.weight_g", "WN.1.cond_layers.2.weight_v", "WN.1.cond_layers.3.bias", "WN.1.cond_layers.3.weight_g", "WN.1.cond_layers.3.weight_v", "WN.1.cond_layers.4.bias", "WN.1.cond_layers.4.weight_g", "WN.1.cond_layers.4.weight_v", "WN.1.cond_layers.5.bias", "WN.1.cond_layers.5.weight_g", "WN.1.cond_layers.5.weight_v", "WN.1.cond_layers.6.bias", "WN.1.cond_layers.6.weight_g", "WN.1.cond_layers.6.weight_v", "WN.1.cond_layers.7.bias", "WN.1.cond_layers.7.weight_g", "WN.1.cond_layers.7.weight_v", "WN.1.start.bias", "WN.1.start.weight_g", "WN.1.start.weight_v", "WN.1.end.weight", "WN.1.end.bias", "WN.2.in_layers.0.bias", "WN.2.in_layers.0.weight_g", "WN.2.in_layers.0.weight_v", "WN.2.in_layers.1.bias", "WN.2.in_layers.1.weight_g", "WN.2.in_layers.1.weight_v", "WN.2.in_layers.2.bias", "WN.2.in_layers.2.weight_g", "WN.2.in_layers.2.weight_v", "WN.2.in_layers.3.bias", "WN.2.in_layers.3.weight_g", "WN.2.in_layers.3.weight_v", "WN.2.in_layers.4.bias", "WN.2.in_layers.4.weight_g", "WN.2.in_layers.4.weight_v", "WN.2.in_layers.5.bias", "WN.2.in_layers.5.weight_g", "WN.2.in_layers.5.weight_v", "WN.2.in_layers.6.bias", "WN.2.in_layers.6.weight_g", "WN.2.in_layers.6.weight_v", "WN.2.in_layers.7.bias", "WN.2.in_layers.7.weight_g", "WN.2.in_layers.7.weight_v", "WN.2.res_skip_layers.0.bias", "WN.2.res_skip_layers.0.weight_g", "WN.2.res_skip_layers.0.weight_v", "WN.2.res_skip_layers.1.bias", "WN.2.res_skip_layers.1.weight_g", "WN.2.res_skip_layers.1.weight_v", "WN.2.res_skip_layers.2.bias", "WN.2.res_skip_layers.2.weight_g", "WN.2.res_skip_layers.2.weight_v", "WN.2.res_skip_layers.3.bias", "WN.2.res_skip_layers.3.weight_g", "WN.2.res_skip_layers.3.weight_v", "WN.2.res_skip_layers.4.bias", "WN.2.res_skip_layers.4.weight_g", "WN.2.res_skip_layers.4.weight_v", "WN.2.res_skip_layers.5.bias", "WN.2.res_skip_layers.5.weight_g", "WN.2.res_skip_layers.5.weight_v", "WN.2.res_skip_layers.6.bias", "WN.2.res_skip_layers.6.weight_g", "WN.2.res_skip_layers.6.weight_v", "WN.2.res_skip_layers.7.bias", "WN.2.res_skip_layers.7.weight_g", "WN.2.res_skip_layers.7.weight_v", "WN.2.cond_layers.0.bias", "WN.2.cond_layers.0.weight_g", "WN.2.cond_layers.0.weight_v", "WN.2.cond_layers.1.bias", "WN.2.cond_layers.1.weight_g", "WN.2.cond_layers.1.weight_v", "WN.2.cond_layers.2.bias", "WN.2.cond_layers.2.weight_g", "WN.2.cond_layers.2.weight_v", "WN.2.cond_layers.3.bias", "WN.2.cond_layers.3.weight_g", "WN.2.cond_layers.3.weight_v", "WN.2.cond_layers.4.bias", "WN.2.cond_layers.4.weight_g", "WN.2.cond_layers.4.weight_v", "WN.2.cond_layers.5.bias", "WN.2.cond_layers.5.weight_g", "WN.2.cond_layers.5.weight_v", "WN.2.cond_layers.6.bias", "WN.2.cond_layers.6.weight_g", "WN.2.cond_layers.6.weight_v", "WN.2.cond_layers.7.bias", "WN.2.cond_layers.7.weight_g", "WN.2.cond_layers.7.weight_v", "WN.2.start.bias", "WN.2.start.weight_g", "WN.2.start.weight_v", "WN.2.end.weight", "WN.2.end.bias", "WN.3.in_layers.0.bias", "WN.3.in_layers.0.weight_g", "WN.3.in_layers.0.weight_v", "WN.3.in_layers.1.bias", "WN.3.in_layers.1.weight_g", "WN.3.in_layers.1.weight_v", 
"WN.3.in_layers.2.bias", "WN.3.in_layers.2.weight_g", "WN.3.in_layers.2.weight_v", "WN.3.in_layers.3.bias", "WN.3.in_layers.3.weight_g", "WN.3.in_layers.3.weight_v", "WN.3.in_layers.4.bias", "WN.3.in_layers.4.weight_g", "WN.3.in_layers.4.weight_v", "WN.3.in_layers.5.bias", "WN.3.in_layers.5.weight_g", "WN.3.in_layers.5.weight_v", "WN.3.in_layers.6.bias", "WN.3.in_layers.6.weight_g", "WN.3.in_layers.6.weight_v", "WN.3.in_layers.7.bias", "WN.3.in_layers.7.weight_g", "WN.3.in_layers.7.weight_v", "WN.3.res_skip_layers.0.bias", "WN.3.res_skip_layers.0.weight_g", "WN.3.res_skip_layers.0.weight_v", "WN.3.res_skip_layers.1.bias", "WN.3.res_skip_layers.1.weight_g", "WN.3.res_skip_layers.1.weight_v", "WN.3.res_skip_layers.2.bias", "WN.3.res_skip_layers.2.weight_g", "WN.3.res_skip_layers.2.weight_v", "WN.3.res_skip_layers.3.bias", "WN.3.res_skip_layers.3.weight_g", "WN.3.res_skip_layers.3.weight_v", "WN.3.res_skip_layers.4.bias", "WN.3.res_skip_layers.4.weight_g", "WN.3.res_skip_layers.4.weight_v", "WN.3.res_skip_layers.5.bias", "WN.3.res_skip_layers.5.weight_g", "WN.3.res_skip_layers.5.weight_v", "WN.3.res_skip_layers.6.bias", "WN.3.res_skip_layers.6.weight_g", "WN.3.res_skip_layers.6.weight_v", "WN.3.res_skip_layers.7.bias", "WN.3.res_skip_layers.7.weight_g", "WN.3.res_skip_layers.7.weight_v", "WN.3.cond_layers.0.bias", "WN.3.cond_layers.0.weight_g", "WN.3.cond_layers.0.weight_v", "WN.3.cond_layers.1.bias", "WN.3.cond_layers.1.weight_g", "WN.3.cond_layers.1.weight_v", "WN.3.cond_layers.2.bias", "WN.3.cond_layers.2.weight_g", "WN.3.cond_layers.2.weight_v", "WN.3.cond_layers.3.bias", "WN.3.cond_layers.3.weight_g", "WN.3.cond_layers.3.weight_v", "WN.3.cond_layers.4.bias", "WN.3.cond_layers.4.weight_g", "WN.3.cond_layers.4.weight_v", "WN.3.cond_layers.5.bias", "WN.3.cond_layers.5.weight_g", "WN.3.cond_layers.5.weight_v", "WN.3.cond_layers.6.bias", "WN.3.cond_layers.6.weight_g", "WN.3.cond_layers.6.weight_v", "WN.3.cond_layers.7.bias", "WN.3.cond_layers.7.weight_g", "WN.3.cond_layers.7.weight_v", "WN.3.start.bias", "WN.3.start.weight_g", "WN.3.start.weight_v", "WN.3.end.weight", "WN.3.end.bias", "WN.4.in_layers.0.bias", "WN.4.in_layers.0.weight_g", "WN.4.in_layers.0.weight_v", "WN.4.in_layers.1.bias", "WN.4.in_layers.1.weight_g", "WN.4.in_layers.1.weight_v", "WN.4.in_layers.2.bias", "WN.4.in_layers.2.weight_g", "WN.4.in_layers.2.weight_v", "WN.4.in_layers.3.bias", "WN.4.in_layers.3.weight_g", "WN.4.in_layers.3.weight_v", "WN.4.in_layers.4.bias", "WN.4.in_layers.4.weight_g", "WN.4.in_layers.4.weight_v", "WN.4.in_layers.5.bias", "WN.4.in_layers.5.weight_g", "WN.4.in_layers.5.weight_v", "WN.4.in_layers.6.bias", "WN.4.in_layers.6.weight_g", "WN.4.in_layers.6.weight_v", "WN.4.in_layers.7.bias", "WN.4.in_layers.7.weight_g", "WN.4.in_layers.7.weight_v", "WN.4.res_skip_layers.0.bias", "WN.4.res_skip_layers.0.weight_g", "WN.4.res_skip_layers.0.weight_v", "WN.4.res_skip_layers.1.bias", "WN.4.res_skip_layers.1.weight_g", "WN.4.res_skip_layers.1.weight_v", "WN.4.res_skip_layers.2.bias", "WN.4.res_skip_layers.2.weight_g", "WN.4.res_skip_layers.2.weight_v", "WN.4.res_skip_layers.3.bias", "WN.4.res_skip_layers.3.weight_g", "WN.4.res_skip_layers.3.weight_v", "WN.4.res_skip_layers.4.bias", "WN.4.res_skip_layers.4.weight_g", "WN.4.res_skip_layers.4.weight_v", "WN.4.res_skip_layers.5.bias", "WN.4.res_skip_layers.5.weight_g", "WN.4.res_skip_layers.5.weight_v", "WN.4.res_skip_layers.6.bias", "WN.4.res_skip_layers.6.weight_g", "WN.4.res_skip_layers.6.weight_v", "WN.4.res_skip_layers.7.bias", 
"WN.4.res_skip_layers.7.weight_g", "WN.4.res_skip_layers.7.weight_v", "WN.4.cond_layers.0.bias", "WN.4.cond_layers.0.weight_g", "WN.4.cond_layers.0.weight_v", "WN.4.cond_layers.1.bias", "WN.4.cond_layers.1.weight_g", "WN.4.cond_layers.1.weight_v", "WN.4.cond_layers.2.bias", "WN.4.cond_layers.2.weight_g", "WN.4.cond_layers.2.weight_v", "WN.4.cond_layers.3.bias", "WN.4.cond_layers.3.weight_g", "WN.4.cond_layers.3.weight_v", "WN.4.cond_layers.4.bias", "WN.4.cond_layers.4.weight_g", "WN.4.cond_layers.4.weight_v", "WN.4.cond_layers.5.bias", "WN.4.cond_layers.5.weight_g", "WN.4.cond_layers.5.weight_v", "WN.4.cond_layers.6.bias", "WN.4.cond_layers.6.weight_g", "WN.4.cond_layers.6.weight_v", "WN.4.cond_layers.7.bias", "WN.4.cond_layers.7.weight_g", "WN.4.cond_layers.7.weight_v", "WN.4.start.bias", "WN.4.start.weight_g", "WN.4.start.weight_v", "WN.4.end.weight", "WN.4.end.bias", "WN.5.in_layers.0.bias", "WN.5.in_layers.0.weight_g", "WN.5.in_layers.0.weight_v", "WN.5.in_layers.1.bias", "WN.5.in_layers.1.weight_g", "WN.5.in_layers.1.weight_v", "WN.5.in_layers.2.bias", "WN.5.in_layers.2.weight_g", "WN.5.in_layers.2.weight_v", "WN.5.in_layers.3.bias", "WN.5.in_layers.3.weight_g", "WN.5.in_layers.3.weight_v", "WN.5.in_layers.4.bias", "WN.5.in_layers.4.weight_g", "WN.5.in_layers.4.weight_v", "WN.5.in_layers.5.bias", "WN.5.in_layers.5.weight_g", "WN.5.in_layers.5.weight_v", "WN.5.in_layers.6.bias", "WN.5.in_layers.6.weight_g", "WN.5.in_layers.6.weight_v", "WN.5.in_layers.7.bias", "WN.5.in_layers.7.weight_g", "WN.5.in_layers.7.weight_v", "WN.5.res_skip_layers.0.bias", "WN.5.res_skip_layers.0.weight_g", "WN.5.res_skip_layers.0.weight_v", "WN.5.res_skip_layers.1.bias", "WN.5.res_skip_layers.1.weight_g", "WN.5.res_skip_layers.1.weight_v", "WN.5.res_skip_layers.2.bias", "WN.5.res_skip_layers.2.weight_g", "WN.5.res_skip_layers.2.weight_v", "WN.5.res_skip_layers.3.bias", "WN.5.res_skip_layers.3.weight_g", "WN.5.res_skip_layers.3.weight_v", "WN.5.res_skip_layers.4.bias", "WN.5.res_skip_layers.4.weight_g", "WN.5.res_skip_layers.4.weight_v", "WN.5.res_skip_layers.5.bias", "WN.5.res_skip_layers.5.weight_g", "WN.5.res_skip_layers.5.weight_v", "WN.5.res_skip_layers.6.bias", "WN.5.res_skip_layers.6.weight_g", "WN.5.res_skip_layers.6.weight_v", "WN.5.res_skip_layers.7.bias", "WN.5.res_skip_layers.7.weight_g", "WN.5.res_skip_layers.7.weight_v", "WN.5.cond_layers.0.bias", "WN.5.cond_layers.0.weight_g", "WN.5.cond_layers.0.weight_v", "WN.5.cond_layers.1.bias", "WN.5.cond_layers.1.weight_g", "WN.5.cond_layers.1.weight_v", "WN.5.cond_layers.2.bias", "WN.5.cond_layers.2.weight_g", "WN.5.cond_layers.2.weight_v", "WN.5.cond_layers.3.bias", "WN.5.cond_layers.3.weight_g", "WN.5.cond_layers.3.weight_v", "WN.5.cond_layers.4.bias", "WN.5.cond_layers.4.weight_g", "WN.5.cond_layers.4.weight_v", "WN.5.cond_layers.5.bias", "WN.5.cond_layers.5.weight_g", "WN.5.cond_layers.5.weight_v", "WN.5.cond_layers.6.bias", "WN.5.cond_layers.6.weight_g", "WN.5.cond_layers.6.weight_v", "WN.5.cond_layers.7.bias", "WN.5.cond_layers.7.weight_g", "WN.5.cond_layers.7.weight_v", "WN.5.start.bias", "WN.5.start.weight_g", "WN.5.start.weight_v", "WN.5.end.weight", "WN.5.end.bias", "WN.6.in_layers.0.bias", "WN.6.in_layers.0.weight_g", "WN.6.in_layers.0.weight_v", "WN.6.in_layers.1.bias", "WN.6.in_layers.1.weight_g", "WN.6.in_layers.1.weight_v", "WN.6.in_layers.2.bias", "WN.6.in_layers.2.weight_g", "WN.6.in_layers.2.weight_v", "WN.6.in_layers.3.bias", "WN.6.in_layers.3.weight_g", "WN.6.in_layers.3.weight_v", "WN.6.in_layers.4.bias", "WN.6.in_layers.4.weight_g", 
"WN.6.in_layers.4.weight_v", "WN.6.in_layers.5.bias", "WN.6.in_layers.5.weight_g", "WN.6.in_layers.5.weight_v", "WN.6.in_layers.6.bias", "WN.6.in_layers.6.weight_g", "WN.6.in_layers.6.weight_v", "WN.6.in_layers.7.bias", "WN.6.in_layers.7.weight_g", "WN.6.in_layers.7.weight_v", "WN.6.res_skip_layers.0.bias", "WN.6.res_skip_layers.0.weight_g", "WN.6.res_skip_layers.0.weight_v", "WN.6.res_skip_layers.1.bias", "WN.6.res_skip_layers.1.weight_g", "WN.6.res_skip_layers.1.weight_v", "WN.6.res_skip_layers.2.bias", "WN.6.res_skip_layers.2.weight_g", "WN.6.res_skip_layers.2.weight_v", "WN.6.res_skip_layers.3.bias", "WN.6.res_skip_layers.3.weight_g", "WN.6.res_skip_layers.3.weight_v", "WN.6.res_skip_layers.4.bias", "WN.6.res_skip_layers.4.weight_g", "WN.6.res_skip_layers.4.weight_v", "WN.6.res_skip_layers.5.bias", "WN.6.res_skip_layers.5.weight_g", "WN.6.res_skip_layers.5.weight_v", "WN.6.res_skip_layers.6.bias", "WN.6.res_skip_layers.6.weight_g", "WN.6.res_skip_layers.6.weight_v", "WN.6.res_skip_layers.7.bias", "WN.6.res_skip_layers.7.weight_g", "WN.6.res_skip_layers.7.weight_v", "WN.6.cond_layers.0.bias", "WN.6.cond_layers.0.weight_g", "WN.6.cond_layers.0.weight_v", "WN.6.cond_layers.1.bias", "WN.6.cond_layers.1.weight_g", "WN.6.cond_layers.1.weight_v", "WN.6.cond_layers.2.bias", "WN.6.cond_layers.2.weight_g", "WN.6.cond_layers.2.weight_v", "WN.6.cond_layers.3.bias", "WN.6.cond_layers.3.weight_g", "WN.6.cond_layers.3.weight_v", "WN.6.cond_layers.4.bias", "WN.6.cond_layers.4.weight_g", "WN.6.cond_layers.4.weight_v", "WN.6.cond_layers.5.bias", "WN.6.cond_layers.5.weight_g", "WN.6.cond_layers.5.weight_v", "WN.6.cond_layers.6.bias", "WN.6.cond_layers.6.weight_g", "WN.6.cond_layers.6.weight_v", "WN.6.cond_layers.7.bias", "WN.6.cond_layers.7.weight_g", "WN.6.cond_layers.7.weight_v", "WN.6.start.bias", "WN.6.start.weight_g", "WN.6.start.weight_v", "WN.6.end.weight", "WN.6.end.bias", "WN.7.in_layers.0.bias", "WN.7.in_layers.0.weight_g", "WN.7.in_layers.0.weight_v", "WN.7.in_layers.1.bias", "WN.7.in_layers.1.weight_g", "WN.7.in_layers.1.weight_v", "WN.7.in_layers.2.bias", "WN.7.in_layers.2.weight_g", "WN.7.in_layers.2.weight_v", "WN.7.in_layers.3.bias", "WN.7.in_layers.3.weight_g", "WN.7.in_layers.3.weight_v", "WN.7.in_layers.4.bias", "WN.7.in_layers.4.weight_g", "WN.7.in_layers.4.weight_v", "WN.7.in_layers.5.bias", "WN.7.in_layers.5.weight_g", "WN.7.in_layers.5.weight_v", "WN.7.in_layers.6.bias", "WN.7.in_layers.6.weight_g", "WN.7.in_layers.6.weight_v", "WN.7.in_layers.7.bias", "WN.7.in_layers.7.weight_g", "WN.7.in_layers.7.weight_v", "WN.7.res_skip_layers.0.bias", "WN.7.res_skip_layers.0.weight_g", "WN.7.res_skip_layers.0.weight_v", "WN.7.res_skip_layers.1.bias", "WN.7.res_skip_layers.1.weight_g", "WN.7.res_skip_layers.1.weight_v", "WN.7.res_skip_layers.2.bias", "WN.7.res_skip_layers.2.weight_g", "WN.7.res_skip_layers.2.weight_v", "WN.7.res_skip_layers.3.bias", "WN.7.res_skip_layers.3.weight_g", "WN.7.res_skip_layers.3.weight_v", "WN.7.res_skip_layers.4.bias", "WN.7.res_skip_layers.4.weight_g", "WN.7.res_skip_layers.4.weight_v", "WN.7.res_skip_layers.5.bias", "WN.7.res_skip_layers.5.weight_g", "WN.7.res_skip_layers.5.weight_v", "WN.7.res_skip_layers.6.bias", "WN.7.res_skip_layers.6.weight_g", "WN.7.res_skip_layers.6.weight_v", "WN.7.res_skip_layers.7.bias", "WN.7.res_skip_layers.7.weight_g", "WN.7.res_skip_layers.7.weight_v", "WN.7.cond_layers.0.bias", "WN.7.cond_layers.0.weight_g", "WN.7.cond_layers.0.weight_v", "WN.7.cond_layers.1.bias", "WN.7.cond_layers.1.weight_g", "WN.7.cond_layers.1.weight_v", 
"WN.7.cond_layers.2.bias", "WN.7.cond_layers.2.weight_g", "WN.7.cond_layers.2.weight_v", "WN.7.cond_layers.3.bias", "WN.7.cond_layers.3.weight_g", "WN.7.cond_layers.3.weight_v", "WN.7.cond_layers.4.bias", "WN.7.cond_layers.4.weight_g", "WN.7.cond_layers.4.weight_v", "WN.7.cond_layers.5.bias", "WN.7.cond_layers.5.weight_g", "WN.7.cond_layers.5.weight_v", "WN.7.cond_layers.6.bias", "WN.7.cond_layers.6.weight_g", "WN.7.cond_layers.6.weight_v", "WN.7.cond_layers.7.bias", "WN.7.cond_layers.7.weight_g", "WN.7.cond_layers.7.weight_v", "WN.7.start.bias", "WN.7.start.weight_g", "WN.7.start.weight_v", "WN.7.end.weight", "WN.7.end.bias", "WN.8.in_layers.0.bias", "WN.8.in_layers.0.weight_g", "WN.8.in_layers.0.weight_v", "WN.8.in_layers.1.bias", "WN.8.in_layers.1.weight_g", "WN.8.in_layers.1.weight_v", "WN.8.in_layers.2.bias", "WN.8.in_layers.2.weight_g", "WN.8.in_layers.2.weight_v", "WN.8.in_layers.3.bias", "WN.8.in_layers.3.weight_g", "WN.8.in_layers.3.weight_v", "WN.8.in_layers.4.bias", "WN.8.in_layers.4.weight_g", "WN.8.in_layers.4.weight_v", "WN.8.in_layers.5.bias", "WN.8.in_layers.5.weight_g", "WN.8.in_layers.5.weight_v", "WN.8.in_layers.6.bias", "WN.8.in_layers.6.weight_g", "WN.8.in_layers.6.weight_v", "WN.8.in_layers.7.bias", "WN.8.in_layers.7.weight_g", "WN.8.in_layers.7.weight_v", "WN.8.res_skip_layers.0.bias", "WN.8.res_skip_layers.0.weight_g", "WN.8.res_skip_layers.0.weight_v", "WN.8.res_skip_layers.1.bias", "WN.8.res_skip_layers.1.weight_g", "WN.8.res_skip_layers.1.weight_v", "WN.8.res_skip_layers.2.bias", "WN.8.res_skip_layers.2.weight_g", "WN.8.res_skip_layers.2.weight_v", "WN.8.res_skip_layers.3.bias", "WN.8.res_skip_layers.3.weight_g", "WN.8.res_skip_layers.3.weight_v", "WN.8.res_skip_layers.4.bias", "WN.8.res_skip_layers.4.weight_g", "WN.8.res_skip_layers.4.weight_v", "WN.8.res_skip_layers.5.bias", "WN.8.res_skip_layers.5.weight_g", "WN.8.res_skip_layers.5.weight_v", "WN.8.res_skip_layers.6.bias", "WN.8.res_skip_layers.6.weight_g", "WN.8.res_skip_layers.6.weight_v", "WN.8.res_skip_layers.7.bias", "WN.8.res_skip_layers.7.weight_g", "WN.8.res_skip_layers.7.weight_v", "WN.8.cond_layers.0.bias", "WN.8.cond_layers.0.weight_g", "WN.8.cond_layers.0.weight_v", "WN.8.cond_layers.1.bias", "WN.8.cond_layers.1.weight_g", "WN.8.cond_layers.1.weight_v", "WN.8.cond_layers.2.bias", "WN.8.cond_layers.2.weight_g", "WN.8.cond_layers.2.weight_v", "WN.8.cond_layers.3.bias", "WN.8.cond_layers.3.weight_g", "WN.8.cond_layers.3.weight_v", "WN.8.cond_layers.4.bias", "WN.8.cond_layers.4.weight_g", "WN.8.cond_layers.4.weight_v", "WN.8.cond_layers.5.bias", "WN.8.cond_layers.5.weight_g", "WN.8.cond_layers.5.weight_v", "WN.8.cond_layers.6.bias", "WN.8.cond_layers.6.weight_g", "WN.8.cond_layers.6.weight_v", "WN.8.cond_layers.7.bias", "WN.8.cond_layers.7.weight_g", "WN.8.cond_layers.7.weight_v", "WN.8.start.bias", "WN.8.start.weight_g", "WN.8.start.weight_v", "WN.8.end.weight", "WN.8.end.bias", "WN.9.in_layers.0.bias", "WN.9.in_layers.0.weight_g", "WN.9.in_layers.0.weight_v", "WN.9.in_layers.1.bias", "WN.9.in_layers.1.weight_g", "WN.9.in_layers.1.weight_v", "WN.9.in_layers.2.bias", "WN.9.in_layers.2.weight_g", "WN.9.in_layers.2.weight_v", "WN.9.in_layers.3.bias", "WN.9.in_layers.3.weight_g", "WN.9.in_layers.3.weight_v", "WN.9.in_layers.4.bias", "WN.9.in_layers.4.weight_g", "WN.9.in_layers.4.weight_v", "WN.9.in_layers.5.bias", "WN.9.in_layers.5.weight_g", "WN.9.in_layers.5.weight_v", "WN.9.in_layers.6.bias", "WN.9.in_layers.6.weight_g", "WN.9.in_layers.6.weight_v", "WN.9.in_layers.7.bias", "WN.9.in_layers.7.weight_g", 
"WN.9.in_layers.7.weight_v", "WN.9.res_skip_layers.0.bias", "WN.9.res_skip_layers.0.weight_g", "WN.9.res_skip_layers.0.weight_v", "WN.9.res_skip_layers.1.bias", "WN.9.res_skip_layers.1.weight_g", "WN.9.res_skip_layers.1.weight_v", "WN.9.res_skip_layers.2.bias", "WN.9.res_skip_layers.2.weight_g", "WN.9.res_skip_layers.2.weight_v", "WN.9.res_skip_layers.3.bias", "WN.9.res_skip_layers.3.weight_g", "WN.9.res_skip_layers.3.weight_v", "WN.9.res_skip_layers.4.bias", "WN.9.res_skip_layers.4.weight_g", "WN.9.res_skip_layers.4.weight_v", "WN.9.res_skip_layers.5.bias", "WN.9.res_skip_layers.5.weight_g", "WN.9.res_skip_layers.5.weight_v", "WN.9.res_skip_layers.6.bias", "WN.9.res_skip_layers.6.weight_g", "WN.9.res_skip_layers.6.weight_v", "WN.9.res_skip_layers.7.bias", "WN.9.res_skip_layers.7.weight_g", "WN.9.res_skip_layers.7.weight_v", "WN.9.cond_layers.0.bias", "WN.9.cond_layers.0.weight_g", "WN.9.cond_layers.0.weight_v", "WN.9.cond_layers.1.bias", "WN.9.cond_layers.1.weight_g", "WN.9.cond_layers.1.weight_v", "WN.9.cond_layers.2.bias", "WN.9.cond_layers.2.weight_g", "WN.9.cond_layers.2.weight_v", "WN.9.cond_layers.3.bias", "WN.9.cond_layers.3.weight_g", "WN.9.cond_layers.3.weight_v", "WN.9.cond_layers.4.bias", "WN.9.cond_layers.4.weight_g", "WN.9.cond_layers.4.weight_v", "WN.9.cond_layers.5.bias", "WN.9.cond_layers.5.weight_g", "WN.9.cond_layers.5.weight_v", "WN.9.cond_layers.6.bias", "WN.9.cond_layers.6.weight_g", "WN.9.cond_layers.6.weight_v", "WN.9.cond_layers.7.bias", "WN.9.cond_layers.7.weight_g", "WN.9.cond_layers.7.weight_v", "WN.9.start.bias", "WN.9.start.weight_g", "WN.9.start.weight_v", "WN.9.end.weight", "WN.9.end.bias", "WN.10.in_layers.0.bias", "WN.10.in_layers.0.weight_g", "WN.10.in_layers.0.weight_v", "WN.10.in_layers.1.bias", "WN.10.in_layers.1.weight_g", "WN.10.in_layers.1.weight_v", "WN.10.in_layers.2.bias", "WN.10.in_layers.2.weight_g", "WN.10.in_layers.2.weight_v", "WN.10.in_layers.3.bias", "WN.10.in_layers.3.weight_g", "WN.10.in_layers.3.weight_v", "WN.10.in_layers.4.bias", "WN.10.in_layers.4.weight_g", "WN.10.in_layers.4.weight_v", "WN.10.in_layers.5.bias", "WN.10.in_layers.5.weight_g", "WN.10.in_layers.5.weight_v", "WN.10.in_layers.6.bias", "WN.10.in_layers.6.weight_g", "WN.10.in_layers.6.weight_v", "WN.10.in_layers.7.bias", "WN.10.in_layers.7.weight_g", "WN.10.in_layers.7.weight_v", "WN.10.res_skip_layers.0.bias", "WN.10.res_skip_layers.0.weight_g", "WN.10.res_skip_layers.0.weight_v", "WN.10.res_skip_layers.1.bias", "WN.10.res_skip_layers.1.weight_g", "WN.10.res_skip_layers.1.weight_v", "WN.10.res_skip_layers.2.bias", "WN.10.res_skip_layers.2.weight_g", "WN.10.res_skip_layers.2.weight_v", "WN.10.res_skip_layers.3.bias", "WN.10.res_skip_layers.3.weight_g", "WN.10.res_skip_layers.3.weight_v", "WN.10.res_skip_layers.4.bias", "WN.10.res_skip_layers.4.weight_g", "WN.10.res_skip_layers.4.weight_v", "WN.10.res_skip_layers.5.bias", "WN.10.res_skip_layers.5.weight_g", "WN.10.res_skip_layers.5.weight_v", "WN.10.res_skip_layers.6.bias", "WN.10.res_skip_layers.6.weight_g", "WN.10.res_skip_layers.6.weight_v", "WN.10.res_skip_layers.7.bias", "WN.10.res_skip_layers.7.weight_g", "WN.10.res_skip_layers.7.weight_v", "WN.10.cond_layers.0.bias", "WN.10.cond_layers.0.weight_g", "WN.10.cond_layers.0.weight_v", "WN.10.cond_layers.1.bias", "WN.10.cond_layers.1.weight_g", "WN.10.cond_layers.1.weight_v", "WN.10.cond_layers.2.bias", "WN.10.cond_layers.2.weight_g", "WN.10.cond_layers.2.weight_v", "WN.10.cond_layers.3.bias", "WN.10.cond_layers.3.weight_g", "WN.10.cond_layers.3.weight_v", 
"WN.10.cond_layers.4.bias", "WN.10.cond_layers.4.weight_g", "WN.10.cond_layers.4.weight_v", "WN.10.cond_layers.5.bias", "WN.10.cond_layers.5.weight_g", "WN.10.cond_layers.5.weight_v", "WN.10.cond_layers.6.bias", "WN.10.cond_layers.6.weight_g", "WN.10.cond_layers.6.weight_v", "WN.10.cond_layers.7.bias", "WN.10.cond_layers.7.weight_g", "WN.10.cond_layers.7.weight_v", "WN.10.start.bias", "WN.10.start.weight_g", "WN.10.start.weight_v", "WN.10.end.weight", "WN.10.end.bias", "WN.11.in_layers.0.bias", "WN.11.in_layers.0.weight_g", "WN.11.in_layers.0.weight_v", "WN.11.in_layers.1.bias", "WN.11.in_layers.1.weight_g", "WN.11.in_layers.1.weight_v", "WN.11.in_layers.2.bias", "WN.11.in_layers.2.weight_g", "WN.11.in_layers.2.weight_v", "WN.11.in_layers.3.bias", "WN.11.in_layers.3.weight_g", "WN.11.in_layers.3.weight_v", "WN.11.in_layers.4.bias", "WN.11.in_layers.4.weight_g", "WN.11.in_layers.4.weight_v", "WN.11.in_layers.5.bias", "WN.11.in_layers.5.weight_g", "WN.11.in_layers.5.weight_v", "WN.11.in_layers.6.bias", "WN.11.in_layers.6.weight_g", "WN.11.in_layers.6.weight_v", "WN.11.in_layers.7.bias", "WN.11.in_layers.7.weight_g", "WN.11.in_layers.7.weight_v", "WN.11.res_skip_layers.0.bias", "WN.11.res_skip_layers.0.weight_g", "WN.11.res_skip_layers.0.weight_v", "WN.11.res_skip_layers.1.bias", "WN.11.res_skip_layers.1.weight_g", "WN.11.res_skip_layers.1.weight_v", "WN.11.res_skip_layers.2.bias", "WN.11.res_skip_layers.2.weight_g", "WN.11.res_skip_layers.2.weight_v", "WN.11.res_skip_layers.3.bias", "WN.11.res_skip_layers.3.weight_g", "WN.11.res_skip_layers.3.weight_v", "WN.11.res_skip_layers.4.bias", "WN.11.res_skip_layers.4.weight_g", "WN.11.res_skip_layers.4.weight_v", "WN.11.res_skip_layers.5.bias", "WN.11.res_skip_layers.5.weight_g", "WN.11.res_skip_layers.5.weight_v", "WN.11.res_skip_layers.6.bias", "WN.11.res_skip_layers.6.weight_g", "WN.11.res_skip_layers.6.weight_v", "WN.11.res_skip_layers.7.bias", "WN.11.res_skip_layers.7.weight_g", "WN.11.res_skip_layers.7.weight_v", "WN.11.cond_layers.0.bias", "WN.11.cond_layers.0.weight_g", "WN.11.cond_layers.0.weight_v", "WN.11.cond_layers.1.bias", "WN.11.cond_layers.1.weight_g", "WN.11.cond_layers.1.weight_v", "WN.11.cond_layers.2.bias", "WN.11.cond_layers.2.weight_g", "WN.11.cond_layers.2.weight_v", "WN.11.cond_layers.3.bias", "WN.11.cond_layers.3.weight_g", "WN.11.cond_layers.3.weight_v", "WN.11.cond_layers.4.bias", "WN.11.cond_layers.4.weight_g", "WN.11.cond_layers.4.weight_v", "WN.11.cond_layers.5.bias", "WN.11.cond_layers.5.weight_g", "WN.11.cond_layers.5.weight_v", "WN.11.cond_layers.6.bias", "WN.11.cond_layers.6.weight_g", "WN.11.cond_layers.6.weight_v", "WN.11.cond_layers.7.bias", "WN.11.cond_layers.7.weight_g", "WN.11.cond_layers.7.weight_v", "WN.11.start.bias", "WN.11.start.weight_g", "WN.11.start.weight_v", "WN.11.end.weight", "WN.11.end.bias", "convinv.0.conv.weight", "convinv.1.conv.weight", "convinv.2.conv.weight", "convinv.3.conv.weight", "convinv.4.conv.weight", "convinv.5.conv.weight", "convinv.6.conv.weight", "convinv.7.conv.weight", "convinv.8.conv.weight", "convinv.9.conv.weight", "convinv.10.conv.weight", "convinv.11.conv.weight". 
    Unexpected key(s) in state_dict: "module.upsample.weight", "module.upsample.bias", "module.WN.0.in_layers.0.bias", "module.WN.0.in_layers.0.weight_g", "module.WN.0.in_layers.0.weight_v", "module.WN.0.in_layers.1.bias", "module.WN.0.in_layers.1.weight_g", "module.WN.0.in_layers.1.weight_v", "module.WN.0.in_layers.2.bias", "module.WN.0.in_layers.2.weight_g", "module.WN.0.in_layers.2.weight_v", "module.WN.0.in_layers.3.bias", "module.WN.0.in_layers.3.weight_g", "module.WN.0.in_layers.3.weight_v", "module.WN.0.in_layers.4.bias", "module.WN.0.in_layers.4.weight_g", "module.WN.0.in_layers.4.weight_v", "module.WN.0.in_layers.5.bias", "module.WN.0.in_layers.5.weight_g", "module.WN.0.in_layers.5.weight_v", "module.WN.0.in_layers.6.bias", "module.WN.0.in_layers.6.weight_g", "module.WN.0.in_layers.6.weight_v", "module.WN.0.in_layers.7.bias", "module.WN.0.in_layers.7.weight_g", "module.WN.0.in_layers.7.weight_v", "module.WN.0.res_skip_layers.0.bias", "module.WN.0.res_skip_layers.0.weight_g", "module.WN.0.res_skip_layers.0.weight_v", "module.WN.0.res_skip_layers.1.bias", "module.WN.0.res_skip_layers.1.weight_g", "module.WN.0.res_skip_layers.1.weight_v", "module.WN.0.res_skip_layers.2.bias", "module.WN.0.res_skip_layers.2.weight_g", "module.WN.0.res_skip_layers.2.weight_v", "module.WN.0.res_skip_layers.3.bias", "module.WN.0.res_skip_layers.3.weight_g", "module.WN.0.res_skip_layers.3.weight_v", "module.WN.0.res_skip_layers.4.bias", "module.WN.0.res_skip_layers.4.weight_g", "module.WN.0.res_skip_layers.4.weight_v", "module.WN.0.res_skip_layers.5.bias", "module.WN.0.res_skip_layers.5.weight_g", "module.WN.0.res_skip_layers.5.weight_v", "module.WN.0.res_skip_layers.6.bias", "module.WN.0.res_skip_layers.6.weight_g", "module.WN.0.res_skip_layers.6.weight_v", "module.WN.0.res_skip_layers.7.bias", "module.WN.0.res_skip_layers.7.weight_g", "module.WN.0.res_skip_layers.7.weight_v", "module.WN.0.cond_layers.0.bias", "module.WN.0.cond_layers.0.weight_g", "module.WN.0.cond_layers.0.weight_v", "module.WN.0.cond_layers.1.bias", "module.WN.0.cond_layers.1.weight_g", "module.WN.0.cond_layers.1.weight_v", "module.WN.0.cond_layers.2.bias", "module.WN.0.cond_layers.2.weight_g", "module.WN.0.cond_layers.2.weight_v", "module.WN.0.cond_layers.3.bias", "module.WN.0.cond_layers.3.weight_g", "module.WN.0.cond_layers.3.weight_v", "module.WN.0.cond_layers.4.bias", "module.WN.0.cond_layers.4.weight_g", "module.WN.0.cond_layers.4.weight_v", "module.WN.0.cond_layers.5.bias", "module.WN.0.cond_layers.5.weight_g", "module.WN.0.cond_layers.5.weight_v", "module.WN.0.cond_layers.6.bias", "module.WN.0.cond_layers.6.weight_g", "module.WN.0.cond_layers.6.weight_v", "module.WN.0.cond_layers.7.bias", "module.WN.0.cond_layers.7.weight_g", "module.WN.0.cond_layers.7.weight_v", "module.WN.0.start.bias", "module.WN.0.start.weight_g", "module.WN.0.start.weight_v", "module.WN.0.end.weight", "module.WN.0.end.bias", "module.WN.1.in_layers.0.bias", "module.WN.1.in_layers.0.weight_g", "module.WN.1.in_layers.0.weight_v", "module.WN.1.in_layers.1.bias", "module.WN.1.in_layers.1.weight_g", "module.WN.1.in_layers.1.weight_v", "module.WN.1.in_layers.2.bias", "module.WN.1.in_layers.2.weight_g", "module.WN.1.in_layers.2.weight_v", "module.WN.1.in_layers.3.bias", "module.WN.1.in_layers.3.weight_g", "module.WN.1.in_layers.3.weight_v", "module.WN.1.in_layers.4.bias", "module.WN.1.in_layers.4.weight_g", "module.WN.1.in_layers.4.weight_v", "module.WN.1.in_layers.5.bias", "module.WN.1.in_layers.5.weight_g", "module.WN.1.in_layers.5.weight_v", 
"module.WN.1.in_layers.6.bias", "module.WN.1.in_layers.6.weight_g", "module.WN.1.in_layers.6.weight_v", "module.WN.1.in_layers.7.bias", "module.WN.1.in_layers.7.weight_g", "module.WN.1.in_layers.7.weight_v", "module.WN.1.res_skip_layers.0.bias", "module.WN.1.res_skip_layers.0.weight_g", "module.WN.1.res_skip_layers.0.weight_v", "module.WN.1.res_skip_layers.1.bias", "module.WN.1.res_skip_layers.1.weight_g", "module.WN.1.res_skip_layers.1.weight_v", "module.WN.1.res_skip_layers.2.bias", "module.WN.1.res_skip_layers.2.weight_g", "module.WN.1.res_skip_layers.2.weight_v", "module.WN.1.res_skip_layers.3.bias", "module.WN.1.res_skip_layers.3.weight_g", "module.WN.1.res_skip_layers.3.weight_v", "module.WN.1.res_skip_layers.4.bias", "module.WN.1.res_skip_layers.4.weight_g", "module.WN.1.res_skip_layers.4.weight_v", "module.WN.1.res_skip_layers.5.bias", "module.WN.1.res_skip_layers.5.weight_g", "module.WN.1.res_skip_layers.5.weight_v", "module.WN.1.res_skip_layers.6.bias", "module.WN.1.res_skip_layers.6.weight_g", "module.WN.1.res_skip_layers.6.weight_v", "module.WN.1.res_skip_layers.7.bias", "module.WN.1.res_skip_layers.7.weight_g", "module.WN.1.res_skip_layers.7.weight_v", "module.WN.1.cond_layers.0.bias", "module.WN.1.cond_layers.0.weight_g", "module.WN.1.cond_layers.0.weight_v", "module.WN.1.cond_layers.1.bias", "module.WN.1.cond_layers.1.weight_g", "module.WN.1.cond_layers.1.weight_v", "module.WN.1.cond_layers.2.bias", "module.WN.1.cond_layers.2.weight_g", "module.WN.1.cond_layers.2.weight_v", "module.WN.1.cond_layers.3.bias", "module.WN.1.cond_layers.3.weight_g", "module.WN.1.cond_layers.3.weight_v", "module.WN.1.cond_layers.4.bias", "module.WN.1.cond_layers.4.weight_g", "module.WN.1.cond_layers.4.weight_v", "module.WN.1.cond_layers.5.bias", "module.WN.1.cond_layers.5.weight_g", "module.WN.1.cond_layers.5.weight_v", "module.WN.1.cond_layers.6.bias", "module.WN.1.cond_layers.6.weight_g", "module.WN.1.cond_layers.6.weight_v", "module.WN.1.cond_layers.7.bias", "module.WN.1.cond_layers.7.weight_g", "module.WN.1.cond_layers.7.weight_v", "module.WN.1.start.bias", "module.WN.1.start.weight_g", "module.WN.1.start.weight_v", "module.WN.1.end.weight", "module.WN.1.end.bias", "module.WN.2.in_layers.0.bias", "module.WN.2.in_layers.0.weight_g", "module.WN.2.in_layers.0.weight_v", "module.WN.2.in_layers.1.bias", "module.WN.2.in_layers.1.weight_g", "module.WN.2.in_layers.1.weight_v", "module.WN.2.in_layers.2.bias", "module.WN.2.in_layers.2.weight_g", "module.WN.2.in_layers.2.weight_v", "module.WN.2.in_layers.3.bias", "module.WN.2.in_layers.3.weight_g", "module.WN.2.in_layers.3.weight_v", "module.WN.2.in_layers.4.bias", "module.WN.2.in_layers.4.weight_g", "module.WN.2.in_layers.4.weight_v", "module.WN.2.in_layers.5.bias", "module.WN.2.in_layers.5.weight_g", "module.WN.2.in_layers.5.weight_v", "module.WN.2.in_layers.6.bias", "module.WN.2.in_layers.6.weight_g", "module.WN.2.in_layers.6.weight_v", "module.WN.2.in_layers.7.bias", "module.WN.2.in_layers.7.weight_g", "module.WN.2.in_layers.7.weight_v", "module.WN.2.res_skip_layers.0.bias", "module.WN.2.res_skip_layers.0.weight_g", "module.WN.2.res_skip_layers.0.weight_v", "module.WN.2.res_skip_layers.1.bias", "module.WN.2.res_skip_layers.1.weight_g", "module.WN.2.res_skip_layers.1.weight_v", "module.WN.2.res_skip_layers.2.bias", "module.WN.2.res_skip_layers.2.weight_g", "module.WN.2.res_skip_layers.2.weight_v", "module.WN.2.res_skip_layers.3.bias", "module.WN.2.res_skip_layers.3.weight_g", "module.WN.2.res_skip_layers.3.weight_v", 
"module.WN.2.res_skip_layers.4.bias", "module.WN.2.res_skip_layers.4.weight_g", "module.WN.2.res_skip_layers.4.weight_v", "module.WN.2.res_skip_layers.5.bias", "module.WN.2.res_skip_layers.5.weight_g", "module.WN.2.res_skip_layers.5.weight_v", "module.WN.2.res_skip_layers.6.bias", "module.WN.2.res_skip_layers.6.weight_g", "module.WN.2.res_skip_layers.6.weight_v", "module.WN.2.res_skip_layers.7.bias", "module.WN.2.res_skip_layers.7.weight_g", "module.WN.2.res_skip_layers.7.weight_v", "module.WN.2.cond_layers.0.bias", "module.WN.2.cond_layers.0.weight_g", "module.WN.2.cond_layers.0.weight_v", "module.WN.2.cond_layers.1.bias", "module.WN.2.cond_layers.1.weight_g", "module.WN.2.cond_layers.1.weight_v", "module.WN.2.cond_layers.2.bias", "module.WN.2.cond_layers.2.weight_g", "module.WN.2.cond_layers.2.weight_v", "module.WN.2.cond_layers.3.bias", "module.WN.2.cond_layers.3.weight_g", "module.WN.2.cond_layers.3.weight_v", "module.WN.2.cond_layers.4.bias", "module.WN.2.cond_layers.4.weight_g", "module.WN.2.cond_layers.4.weight_v", "module.WN.2.cond_layers.5.bias", "module.WN.2.cond_layers.5.weight_g", "module.WN.2.cond_layers.5.weight_v", "module.WN.2.cond_layers.6.bias", "module.WN.2.cond_layers.6.weight_g", "module.WN.2.cond_layers.6.weight_v", "module.WN.2.cond_layers.7.bias", "module.WN.2.cond_layers.7.weight_g", "module.WN.2.cond_layers.7.weight_v", "module.WN.2.start.bias", "module.WN.2.start.weight_g", "module.WN.2.start.weight_v", "module.WN.2.end.weight", "module.WN.2.end.bias", "module.WN.3.in_layers.0.bias", "module.WN.3.in_layers.0.weight_g", "module.WN.3.in_layers.0.weight_v", "module.WN.3.in_layers.1.bias", "module.WN.3.in_layers.1.weight_g", "module.WN.3.in_layers.1.weight_v", "module.WN.3.in_layers.2.bias", "module.WN.3.in_layers.2.weight_g", "module.WN.3.in_layers.2.weight_v", "module.WN.3.in_layers.3.bias", "module.WN.3.in_layers.3.weight_g", "module.WN.3.in_layers.3.weight_v", "module.WN.3.in_layers.4.bias", "module.WN.3.in_layers.4.weight_g", "module.WN.3.in_layers.4.weight_v", "module.WN.3.in_layers.5.bias", "module.WN.3.in_layers.5.weight_g", "module.WN.3.in_layers.5.weight_v", "module.WN.3.in_layers.6.bias", "module.WN.3.in_layers.6.weight_g", "module.WN.3.in_layers.6.weight_v", "module.WN.3.in_layers.7.bias", "module.WN.3.in_layers.7.weight_g", "module.WN.3.in_layers.7.weight_v", "module.WN.3.res_skip_layers.0.bias", "module.WN.3.res_skip_layers.0.weight_g", "module.WN.3.res_skip_layers.0.weight_v", "module.WN.3.res_skip_layers.1.bias", "module.WN.3.res_skip_layers.1.weight_g", "module.WN.3.res_skip_layers.1.weight_v", "module.WN.3.res_skip_layers.2.bias", "module.WN.3.res_skip_layers.2.weight_g", "module.WN.3.res_skip_layers.2.weight_v", "module.WN.3.res_skip_layers.3.bias", "module.WN.3.res_skip_layers.3.weight_g", "module.WN.3.res_skip_layers.3.weight_v", "module.WN.3.res_skip_layers.4.bias", "module.WN.3.res_skip_layers.4.weight_g", "module.WN.3.res_skip_layers.4.weight_v", "module.WN.3.res_skip_layers.5.bias", "module.WN.3.res_skip_layers.5.weight_g", "module.WN.3.res_skip_layers.5.weight_v", "module.WN.3.res_skip_layers.6.bias", "module.WN.3.res_skip_layers.6.weight_g", "module.WN.3.res_skip_layers.6.weight_v", "module.WN.3.res_skip_layers.7.bias", "module.WN.3.res_skip_layers.7.weight_g", "module.WN.3.res_skip_layers.7.weight_v", "module.WN.3.cond_layers.0.bias", "module.WN.3.cond_layers.0.weight_g", "module.WN.3.cond_layers.0.weight_v", "module.WN.3.cond_layers.1.bias", "module.WN.3.cond_layers.1.weight_g", "module.WN.3.cond_layers.1.weight_v", 
"module.WN.3.cond_layers.2.bias", "module.WN.3.cond_layers.2.weight_g", "module.WN.3.cond_layers.2.weight_v", "module.WN.3.cond_layers.3.bias", "module.WN.3.cond_layers.3.weight_g", "module.WN.3.cond_layers.3.weight_v", "module.WN.3.cond_layers.4.bias", "module.WN.3.cond_layers.4.weight_g", "module.WN.3.cond_layers.4.weight_v", "module.WN.3.cond_layers.5.bias", "module.WN.3.cond_layers.5.weight_g", "module.WN.3.cond_layers.5.weight_v", "module.WN.3.cond_layers.6.bias", "module.WN.3.cond_layers.6.weight_g", "module.WN.3.cond_layers.6.weight_v", "module.WN.3.cond_layers.7.bias", "module.WN.3.cond_layers.7.weight_g", "module.WN.3.cond_layers.7.weight_v", "module.WN.3.start.bias", "module.WN.3.start.weight_g", "module.WN.3.start.weight_v", "module.WN.3.end.weight", "module.WN.3.end.bias", "module.WN.4.in_layers.0.bias", "module.WN.4.in_layers.0.weight_g", "module.WN.4.in_layers.0.weight_v", "module.WN.4.in_layers.1.bias", "module.WN.4.in_layers.1.weight_g", "module.WN.4.in_layers.1.weight_v", "module.WN.4.in_layers.2.bias", "module.WN.4.in_layers.2.weight_g", "module.WN.4.in_layers.2.weight_v", "module.WN.4.in_layers.3.bias", "module.WN.4.in_layers.3.weight_g", "module.WN.4.in_layers.3.weight_v", "module.WN.4.in_layers.4.bias", "module.WN.4.in_layers.4.weight_g", "module.WN.4.in_layers.4.weight_v", "module.WN.4.in_layers.5.bias", "module.WN.4.in_layers.5.weight_g", "module.WN.4.in_layers.5.weight_v", "module.WN.4.in_layers.6.bias", "module.WN.4.in_layers.6.weight_g", "module.WN.4.in_layers.6.weight_v", "module.WN.4.in_layers.7.bias", "module.WN.4.in_layers.7.weight_g", "module.WN.4.in_layers.7.weight_v", "module.WN.4.res_skip_layers.0.bias", "module.WN.4.res_skip_layers.0.weight_g", "module.WN.4.res_skip_layers.0.weight_v", "module.WN.4.res_skip_layers.1.bias", "module.WN.4.res_skip_layers.1.weight_g", "module.WN.4.res_skip_layers.1.weight_v", "module.WN.4.res_skip_layers.2.bias", "module.WN.4.res_skip_layers.2.weight_g", "module.WN.4.res_skip_layers.2.weight_v", "module.WN.4.res_skip_layers.3.bias", "module.WN.4.res_skip_layers.3.weight_g", "module.WN.4.res_skip_layers.3.weight_v", "module.WN.4.res_skip_layers.4.bias", "module.WN.4.res_skip_layers.4.weight_g", "module.WN.4.res_skip_layers.4.weight_v", "module.WN.4.res_skip_layers.5.bias", "module.WN.4.res_skip_layers.5.weight_g", "module.WN.4.res_skip_layers.5.weight_v", "module.WN.4.res_skip_layers.6.bias", "module.WN.4.res_skip_layers.6.weight_g", "module.WN.4.res_skip_layers.6.weight_v", "module.WN.4.res_skip_layers.7.bias", "module.WN.4.res_skip_layers.7.weight_g", "module.WN.4.res_skip_layers.7.weight_v", "module.WN.4.cond_layers.0.bias", "module.WN.4.cond_layers.0.weight_g", "module.WN.4.cond_layers.0.weight_v", "module.WN.4.cond_layers.1.bias", "module.WN.4.cond_layers.1.weight_g", "module.WN.4.cond_layers.1.weight_v", "module.WN.4.cond_layers.2.bias", "module.WN.4.cond_layers.2.weight_g", "module.WN.4.cond_layers.2.weight_v", "module.WN.4.cond_layers.3.bias", "module.WN.4.cond_layers.3.weight_g", "module.WN.4.cond_layers.3.weight_v", "module.WN.4.cond_layers.4.bias", "module.WN.4.cond_layers.4.weight_g", "module.WN.4.cond_layers.4.weight_v", "module.WN.4.cond_layers.5.bias", "module.WN.4.cond_layers.5.weight_g", "module.WN.4.cond_layers.5.weight_v", "module.WN.4.cond_layers.6.bias", "module.WN.4.cond_layers.6.weight_g", "module.WN.4.cond_layers.6.weight_v", "module.WN.4.cond_layers.7.bias", "module.WN.4.cond_layers.7.weight_g", "module.WN.4.cond_layers.7.weight_v", "module.WN.4.start.bias", "module.WN.4.start.weight_g", 
"module.WN.4.start.weight_v", "module.WN.4.end.weight", "module.WN.4.end.bias", "module.WN.5.in_layers.0.bias", "module.WN.5.in_layers.0.weight_g", "module.WN.5.in_layers.0.weight_v", "module.WN.5.in_layers.1.bias", "module.WN.5.in_layers.1.weight_g", "module.WN.5.in_layers.1.weight_v", "module.WN.5.in_layers.2.bias", "module.WN.5.in_layers.2.weight_g", "module.WN.5.in_layers.2.weight_v", "module.WN.5.in_layers.3.bias", "module.WN.5.in_layers.3.weight_g", "module.WN.5.in_layers.3.weight_v", "module.WN.5.in_layers.4.bias", "module.WN.5.in_layers.4.weight_g", "module.WN.5.in_layers.4.weight_v", "module.WN.5.in_layers.5.bias", "module.WN.5.in_layers.5.weight_g", "module.WN.5.in_layers.5.weight_v", "module.WN.5.in_layers.6.bias", "module.WN.5.in_layers.6.weight_g", "module.WN.5.in_layers.6.weight_v", "module.WN.5.in_layers.7.bias", "module.WN.5.in_layers.7.weight_g", "module.WN.5.in_layers.7.weight_v", "module.WN.5.res_skip_layers.0.bias", "module.WN.5.res_skip_layers.0.weight_g", "module.WN.5.res_skip_layers.0.weight_v", "module.WN.5.res_skip_layers.1.bias", "module.WN.5.res_skip_layers.1.weight_g", "module.WN.5.res_skip_layers.1.weight_v", "module.WN.5.res_skip_layers.2.bias", "module.WN.5.res_skip_layers.2.weight_g", "module.WN.5.res_skip_layers.2.weight_v", "module.WN.5.res_skip_layers.3.bias", "module.WN.5.res_skip_layers.3.weight_g", "module.WN.5.res_skip_layers.3.weight_v", "module.WN.5.res_skip_layers.4.bias", "module.WN.5.res_skip_layers.4.weight_g", "module.WN.5.res_skip_layers.4.weight_v", "module.WN.5.res_skip_layers.5.bias", "module.WN.5.res_skip_layers.5.weight_g", "module.WN.5.res_skip_layers.5.weight_v", "module.WN.5.res_skip_layers.6.bias", "module.WN.5.res_skip_layers.6.weight_g", "module.WN.5.res_skip_layers.6.weight_v", "module.WN.5.res_skip_layers.7.bias", "module.WN.5.res_skip_layers.7.weight_g", "module.WN.5.res_skip_layers.7.weight_v", "module.WN.5.cond_layers.0.bias", "module.WN.5.cond_layers.0.weight_g", "module.WN.5.cond_layers.0.weight_v", "module.WN.5.cond_layers.1.bias", "module.WN.5.cond_layers.1.weight_g", "module.WN.5.cond_layers.1.weight_v", "module.WN.5.cond_layers.2.bias", "module.WN.5.cond_layers.2.weight_g", "module.WN.5.cond_layers.2.weight_v", "module.WN.5.cond_layers.3.bias", "module.WN.5.cond_layers.3.weight_g", "module.WN.5.cond_layers.3.weight_v", "module.WN.5.cond_layers.4.bias", "module.WN.5.cond_layers.4.weight_g", "module.WN.5.cond_layers.4.weight_v", "module.WN.5.cond_layers.5.bias", "module.WN.5.cond_layers.5.weight_g", "module.WN.5.cond_layers.5.weight_v", "module.WN.5.cond_layers.6.bias", "module.WN.5.cond_layers.6.weight_g", "module.WN.5.cond_layers.6.weight_v", "module.WN.5.cond_layers.7.bias", "module.WN.5.cond_layers.7.weight_g", "module.WN.5.cond_layers.7.weight_v", "module.WN.5.start.bias", "module.WN.5.start.weight_g", "module.WN.5.start.weight_v", "module.WN.5.end.weight", "module.WN.5.end.bias", "module.WN.6.in_layers.0.bias", "module.WN.6.in_layers.0.weight_g", "module.WN.6.in_layers.0.weight_v", "module.WN.6.in_layers.1.bias", "module.WN.6.in_layers.1.weight_g", "module.WN.6.in_layers.1.weight_v", "module.WN.6.in_layers.2.bias", "module.WN.6.in_layers.2.weight_g", "module.WN.6.in_layers.2.weight_v", "module.WN.6.in_layers.3.bias", "module.WN.6.in_layers.3.weight_g", "module.WN.6.in_layers.3.weight_v", "module.WN.6.in_layers.4.bias", "module.WN.6.in_layers.4.weight_g", "module.WN.6.in_layers.4.weight_v", "module.WN.6.in_layers.5.bias", "module.WN.6.in_layers.5.weight_g", "module.WN.6.in_layers.5.weight_v", 
"module.WN.6.in_layers.6.bias", "module.WN.6.in_layers.6.weight_g", "module.WN.6.in_layers.6.weight_v", "module.WN.6.in_layers.7.bias", "module.WN.6.in_layers.7.weight_g", "module.WN.6.in_layers.7.weight_v", "module.WN.6.res_skip_layers.0.bias", "module.WN.6.res_skip_layers.0.weight_g", "module.WN.6.res_skip_layers.0.weight_v", "module.WN.6.res_skip_layers.1.bias", "module.WN.6.res_skip_layers.1.weight_g", "module.WN.6.res_skip_layers.1.weight_v", "module.WN.6.res_skip_layers.2.bias", "module.WN.6.res_skip_layers.2.weight_g", "module.WN.6.res_skip_layers.2.weight_v", "module.WN.6.res_skip_layers.3.bias", "module.WN.6.res_skip_layers.3.weight_g", "module.WN.6.res_skip_layers.3.weight_v", "module.WN.6.res_skip_layers.4.bias", "module.WN.6.res_skip_layers.4.weight_g", "module.WN.6.res_skip_layers.4.weight_v", "module.WN.6.res_skip_layers.5.bias", "module.WN.6.res_skip_layers.5.weight_g", "module.WN.6.res_skip_layers.5.weight_v", "module.WN.6.res_skip_layers.6.bias", "module.WN.6.res_skip_layers.6.weight_g", "module.WN.6.res_skip_layers.6.weight_v", "module.WN.6.res_skip_layers.7.bias", "module.WN.6.res_skip_layers.7.weight_g", "module.WN.6.res_skip_layers.7.weight_v", "module.WN.6.cond_layers.0.bias", "module.WN.6.cond_layers.0.weight_g", "module.WN.6.cond_layers.0.weight_v", "module.WN.6.cond_layers.1.bias", "module.WN.6.cond_layers.1.weight_g", "module.WN.6.cond_layers.1.weight_v", "module.WN.6.cond_layers.2.bias", "module.WN.6.cond_layers.2.weight_g", "module.WN.6.cond_layers.2.weight_v", "module.WN.6.cond_layers.3.bias", "module.WN.6.cond_layers.3.weight_g", "module.WN.6.cond_layers.3.weight_v", "module.WN.6.cond_layers.4.bias", "module.WN.6.cond_layers.4.weight_g", "module.WN.6.cond_layers.4.weight_v", "module.WN.6.cond_layers.5.bias", "module.WN.6.cond_layers.5.weight_g", "module.WN.6.cond_layers.5.weight_v", "module.WN.6.cond_layers.6.bias", "module.WN.6.cond_layers.6.weight_g", "module.WN.6.cond_layers.6.weight_v", "module.WN.6.cond_layers.7.bias", "module.WN.6.cond_layers.7.weight_g", "module.WN.6.cond_layers.7.weight_v", "module.WN.6.start.bias", "module.WN.6.start.weight_g", "module.WN.6.start.weight_v", "module.WN.6.end.weight", "module.WN.6.end.bias", "module.WN.7.in_layers.0.bias", "module.WN.7.in_layers.0.weight_g", "module.WN.7.in_layers.0.weight_v", "module.WN.7.in_layers.1.bias", "module.WN.7.in_layers.1.weight_g", "module.WN.7.in_layers.1.weight_v", "module.WN.7.in_layers.2.bias", "module.WN.7.in_layers.2.weight_g", "module.WN.7.in_layers.2.weight_v", "module.WN.7.in_layers.3.bias", "module.WN.7.in_layers.3.weight_g", "module.WN.7.in_layers.3.weight_v", "module.WN.7.in_layers.4.bias", "module.WN.7.in_layers.4.weight_g", "module.WN.7.in_layers.4.weight_v", "module.WN.7.in_layers.5.bias", "module.WN.7.in_layers.5.weight_g", "module.WN.7.in_layers.5.weight_v", "module.WN.7.in_layers.6.bias", "module.WN.7.in_layers.6.weight_g", "module.WN.7.in_layers.6.weight_v", "module.WN.7.in_layers.7.bias", "module.WN.7.in_layers.7.weight_g", "module.WN.7.in_layers.7.weight_v", "module.WN.7.res_skip_layers.0.bias", "module.WN.7.res_skip_layers.0.weight_g", "module.WN.7.res_skip_layers.0.weight_v", "module.WN.7.res_skip_layers.1.bias", "module.WN.7.res_skip_layers.1.weight_g", "module.WN.7.res_skip_layers.1.weight_v", "module.WN.7.res_skip_layers.2.bias", "module.WN.7.res_skip_layers.2.weight_g", "module.WN.7.res_skip_layers.2.weight_v", "module.WN.7.res_skip_layers.3.bias", "module.WN.7.res_skip_layers.3.weight_g", "module.WN.7.res_skip_layers.3.weight_v", 
"module.WN.7.res_skip_layers.4.bias", "module.WN.7.res_skip_layers.4.weight_g", "module.WN.7.res_skip_layers.4.weight_v", "module.WN.7.res_skip_layers.5.bias", "module.WN.7.res_skip_layers.5.weight_g", "module.WN.7.res_skip_layers.5.weight_v", "module.WN.7.res_skip_layers.6.bias", "module.WN.7.res_skip_layers.6.weight_g", "module.WN.7.res_skip_layers.6.weight_v", "module.WN.7.res_skip_layers.7.bias", "module.WN.7.res_skip_layers.7.weight_g", "module.WN.7.res_skip_layers.7.weight_v", "module.WN.7.cond_layers.0.bias", "module.WN.7.cond_layers.0.weight_g", "module.WN.7.cond_layers.0.weight_v", "module.WN.7.cond_layers.1.bias", "module.WN.7.cond_layers.1.weight_g", "module.WN.7.cond_layers.1.weight_v", "module.WN.7.cond_layers.2.bias", "module.WN.7.cond_layers.2.weight_g", "module.WN.7.cond_layers.2.weight_v", "module.WN.7.cond_layers.3.bias", "module.WN.7.cond_layers.3.weight_g", "module.WN.7.cond_layers.3.weight_v", "module.WN.7.cond_layers.4.bias", "module.WN.7.cond_layers.4.weight_g", "module.WN.7.cond_layers.4.weight_v", "module.WN.7.cond_layers.5.bias", "module.WN.7.cond_layers.5.weight_g", "module.WN.7.cond_layers.5.weight_v", "module.WN.7.cond_layers.6.bias", "module.WN.7.cond_layers.6.weight_g", "module.WN.7.cond_layers.6.weight_v", "module.WN.7.cond_layers.7.bias", "module.WN.7.cond_layers.7.weight_g", "module.WN.7.cond_layers.7.weight_v", "module.WN.7.start.bias", "module.WN.7.start.weight_g", "module.WN.7.start.weight_v", "module.WN.7.end.weight", "module.WN.7.end.bias", "module.WN.8.in_layers.0.bias", "module.WN.8.in_layers.0.weight_g", "module.WN.8.in_layers.0.weight_v", "module.WN.8.in_layers.1.bias", "module.WN.8.in_layers.1.weight_g", "module.WN.8.in_layers.1.weight_v", "module.WN.8.in_layers.2.bias", "module.WN.8.in_layers.2.weight_g", "module.WN.8.in_layers.2.weight_v", "module.WN.8.in_layers.3.bias", "module.WN.8.in_layers.3.weight_g", "module.WN.8.in_layers.3.weight_v", "module.WN.8.in_layers.4.bias", "module.WN.8.in_layers.4.weight_g", "module.WN.8.in_layers.4.weight_v", "module.WN.8.in_layers.5.bias", "module.WN.8.in_layers.5.weight_g", "module.WN.8.in_layers.5.weight_v", "module.WN.8.in_layers.6.bias", "module.WN.8.in_layers.6.weight_g", "module.WN.8.in_layers.6.weight_v", "module.WN.8.in_layers.7.bias", "module.WN.8.in_layers.7.weight_g", "module.WN.8.in_layers.7.weight_v", "module.WN.8.res_skip_layers.0.bias", "module.WN.8.res_skip_layers.0.weight_g", "module.WN.8.res_skip_layers.0.weight_v", "module.WN.8.res_skip_layers.1.bias", "module.WN.8.res_skip_layers.1.weight_g", "module.WN.8.res_skip_layers.1.weight_v", "module.WN.8.res_skip_layers.2.bias", "module.WN.8.res_skip_layers.2.weight_g", "module.WN.8.res_skip_layers.2.weight_v", "module.WN.8.res_skip_layers.3.bias", "module.WN.8.res_skip_layers.3.weight_g", "module.WN.8.res_skip_layers.3.weight_v", "module.WN.8.res_skip_layers.4.bias", "module.WN.8.res_skip_layers.4.weight_g", "module.WN.8.res_skip_layers.4.weight_v", "module.WN.8.res_skip_layers.5.bias", "module.WN.8.res_skip_layers.5.weight_g", "module.WN.8.res_skip_layers.5.weight_v", "module.WN.8.res_skip_layers.6.bias", "module.WN.8.res_skip_layers.6.weight_g", "module.WN.8.res_skip_layers.6.weight_v", "module.WN.8.res_skip_layers.7.bias", "module.WN.8.res_skip_layers.7.weight_g", "module.WN.8.res_skip_layers.7.weight_v", "module.WN.8.cond_layers.0.bias", "module.WN.8.cond_layers.0.weight_g", "module.WN.8.cond_layers.0.weight_v", "module.WN.8.cond_layers.1.bias", "module.WN.8.cond_layers.1.weight_g", "module.WN.8.cond_layers.1.weight_v", 
"module.WN.8.cond_layers.2.bias", "module.WN.8.cond_layers.2.weight_g", "module.WN.8.cond_layers.2.weight_v", "module.WN.8.cond_layers.3.bias", "module.WN.8.cond_layers.3.weight_g", "module.WN.8.cond_layers.3.weight_v", "module.WN.8.cond_layers.4.bias", "module.WN.8.cond_layers.4.weight_g", "module.WN.8.cond_layers.4.weight_v", "module.WN.8.cond_layers.5.bias", "module.WN.8.cond_layers.5.weight_g", "module.WN.8.cond_layers.5.weight_v", "module.WN.8.cond_layers.6.bias", "module.WN.8.cond_layers.6.weight_g", "module.WN.8.cond_layers.6.weight_v", "module.WN.8.cond_layers.7.bias", "module.WN.8.cond_layers.7.weight_g", "module.WN.8.cond_layers.7.weight_v", "module.WN.8.start.bias", "module.WN.8.start.weight_g", "module.WN.8.start.weight_v", "module.WN.8.end.weight", "module.WN.8.end.bias", "module.WN.9.in_layers.0.bias", "module.WN.9.in_layers.0.weight_g", "module.WN.9.in_layers.0.weight_v", "module.WN.9.in_layers.1.bias", "module.WN.9.in_layers.1.weight_g", "module.WN.9.in_layers.1.weight_v", "module.WN.9.in_layers.2.bias", "module.WN.9.in_layers.2.weight_g", "module.WN.9.in_layers.2.weight_v", "module.WN.9.in_layers.3.bias", "module.WN.9.in_layers.3.weight_g", "module.WN.9.in_layers.3.weight_v", "module.WN.9.in_layers.4.bias", "module.WN.9.in_layers.4.weight_g", "module.WN.9.in_layers.4.weight_v", "module.WN.9.in_layers.5.bias", "module.WN.9.in_layers.5.weight_g", "module.WN.9.in_layers.5.weight_v", "module.WN.9.in_layers.6.bias", "module.WN.9.in_layers.6.weight_g", "module.WN.9.in_layers.6.weight_v", "module.WN.9.in_layers.7.bias", "module.WN.9.in_layers.7.weight_g", "module.WN.9.in_layers.7.weight_v", "module.WN.9.res_skip_layers.0.bias", "module.WN.9.res_skip_layers.0.weight_g", "module.WN.9.res_skip_layers.0.weight_v", "module.WN.9.res_skip_layers.1.bias", "module.WN.9.res_skip_layers.1.weight_g", "module.WN.9.res_skip_layers.1.weight_v", "module.WN.9.res_skip_layers.2.bias", "module.WN.9.res_skip_layers.2.weight_g", "module.WN.9.res_skip_layers.2.weight_v", "module.WN.9.res_skip_layers.3.bias", "module.WN.9.res_skip_layers.3.weight_g", "module.WN.9.res_skip_layers.3.weight_v", "module.WN.9.res_skip_layers.4.bias", "module.WN.9.res_skip_layers.4.weight_g", "module.WN.9.res_skip_layers.4.weight_v", "module.WN.9.res_skip_layers.5.bias", "module.WN.9.res_skip_layers.5.weight_g", "module.WN.9.res_skip_layers.5.weight_v", "module.WN.9.res_skip_layers.6.bias", "module.WN.9.res_skip_layers.6.weight_g", "module.WN.9.res_skip_layers.6.weight_v", "module.WN.9.res_skip_layers.7.bias", "module.WN.9.res_skip_layers.7.weight_g", "module.WN.9.res_skip_layers.7.weight_v", "module.WN.9.cond_layers.0.bias", "module.WN.9.cond_layers.0.weight_g", "module.WN.9.cond_layers.0.weight_v", "module.WN.9.cond_layers.1.bias", "module.WN.9.cond_layers.1.weight_g", "module.WN.9.cond_layers.1.weight_v", "module.WN.9.cond_layers.2.bias", "module.WN.9.cond_layers.2.weight_g", "module.WN.9.cond_layers.2.weight_v", "module.WN.9.cond_layers.3.bias", "module.WN.9.cond_layers.3.weight_g", "module.WN.9.cond_layers.3.weight_v", "module.WN.9.cond_layers.4.bias", "module.WN.9.cond_layers.4.weight_g", "module.WN.9.cond_layers.4.weight_v", "module.WN.9.cond_layers.5.bias", "module.WN.9.cond_layers.5.weight_g", "module.WN.9.cond_layers.5.weight_v", "module.WN.9.cond_layers.6.bias", "module.WN.9.cond_layers.6.weight_g", "module.WN.9.cond_layers.6.weight_v", "module.WN.9.cond_layers.7.bias", "module.WN.9.cond_layers.7.weight_g", "module.WN.9.cond_layers.7.weight_v", "module.WN.9.start.bias", "module.WN.9.start.weight_g", 
"module.WN.9.start.weight_v", "module.WN.9.end.weight", "module.WN.9.end.bias", "module.WN.10.in_layers.0.bias", "module.WN.10.in_layers.0.weight_g", "module.WN.10.in_layers.0.weight_v", "module.WN.10.in_layers.1.bias", "module.WN.10.in_layers.1.weight_g", "module.WN.10.in_layers.1.weight_v", "module.WN.10.in_layers.2.bias", "module.WN.10.in_layers.2.weight_g", "module.WN.10.in_layers.2.weight_v", "module.WN.10.in_layers.3.bias", "module.WN.10.in_layers.3.weight_g", "module.WN.10.in_layers.3.weight_v", "module.WN.10.in_layers.4.bias", "module.WN.10.in_layers.4.weight_g", "module.WN.10.in_layers.4.weight_v", "module.WN.10.in_layers.5.bias", "module.WN.10.in_layers.5.weight_g", "module.WN.10.in_layers.5.weight_v", "module.WN.10.in_layers.6.bias", "module.WN.10.in_layers.6.weight_g", "module.WN.10.in_layers.6.weight_v", "module.WN.10.in_layers.7.bias", "module.WN.10.in_layers.7.weight_g", "module.WN.10.in_layers.7.weight_v", "module.WN.10.res_skip_layers.0.bias", "module.WN.10.res_skip_layers.0.weight_g", "module.WN.10.res_skip_layers.0.weight_v", "module.WN.10.res_skip_layers.1.bias", "module.WN.10.res_skip_layers.1.weight_g", "module.WN.10.res_skip_layers.1.weight_v", "module.WN.10.res_skip_layers.2.bias", "module.WN.10.res_skip_layers.2.weight_g", "module.WN.10.res_skip_layers.2.weight_v", "module.WN.10.res_skip_layers.3.bias", "module.WN.10.res_skip_layers.3.weight_g", "module.WN.10.res_skip_layers.3.weight_v", "module.WN.10.res_skip_layers.4.bias", "module.WN.10.res_skip_layers.4.weight_g", "module.WN.10.res_skip_layers.4.weight_v", "module.WN.10.res_skip_layers.5.bias", "module.WN.10.res_skip_layers.5.weight_g", "module.WN.10.res_skip_layers.5.weight_v", "module.WN.10.res_skip_layers.6.bias", "module.WN.10.res_skip_layers.6.weight_g", "module.WN.10.res_skip_layers.6.weight_v", "module.WN.10.res_skip_layers.7.bias", "module.WN.10.res_skip_layers.7.weight_g", "module.WN.10.res_skip_layers.7.weight_v", "module.WN.10.cond_layers.0.bias", "module.WN.10.cond_layers.0.weight_g", "module.WN.10.cond_layers.0.weight_v", "module.WN.10.cond_layers.1.bias", "module.WN.10.cond_layers.1.weight_g", "module.WN.10.cond_layers.1.weight_v", "module.WN.10.cond_layers.2.bias", "module.WN.10.cond_layers.2.weight_g", "module.WN.10.cond_layers.2.weight_v", "module.WN.10.cond_layers.3.bias", "module.WN.10.cond_layers.3.weight_g", "module.WN.10.cond_layers.3.weight_v", "module.WN.10.cond_layers.4.bias", "module.WN.10.cond_layers.4.weight_g", "module.WN.10.cond_layers.4.weight_v", "module.WN.10.cond_layers.5.bias", "module.WN.10.cond_layers.5.weight_g", "module.WN.10.cond_layers.5.weight_v", "module.WN.10.cond_layers.6.bias", "module.WN.10.cond_layers.6.weight_g", "module.WN.10.cond_layers.6.weight_v", "module.WN.10.cond_layers.7.bias", "module.WN.10.cond_layers.7.weight_g", "module.WN.10.cond_layers.7.weight_v", "module.WN.10.start.bias", "module.WN.10.start.weight_g", "module.WN.10.start.weight_v", "module.WN.10.end.weight", "module.WN.10.end.bias", "module.WN.11.in_layers.0.bias", "module.WN.11.in_layers.0.weight_g", "module.WN.11.in_layers.0.weight_v", "module.WN.11.in_layers.1.bias", "module.WN.11.in_layers.1.weight_g", "module.WN.11.in_layers.1.weight_v", "module.WN.11.in_layers.2.bias", "module.WN.11.in_layers.2.weight_g", "module.WN.11.in_layers.2.weight_v", "module.WN.11.in_layers.3.bias", "module.WN.11.in_layers.3.weight_g", "module.WN.11.in_layers.3.weight_v", "module.WN.11.in_layers.4.bias", "module.WN.11.in_layers.4.weight_g", "module.WN.11.in_layers.4.weight_v", "module.WN.11.in_layers.5.bias", 
"module.WN.11.in_layers.5.weight_g", "module.WN.11.in_layers.5.weight_v", "module.WN.11.in_layers.6.bias", "module.WN.11.in_layers.6.weight_g", "module.WN.11.in_layers.6.weight_v", "module.WN.11.in_layers.7.bias", "module.WN.11.in_layers.7.weight_g", "module.WN.11.in_layers.7.weight_v", "module.WN.11.res_skip_layers.0.bias", "module.WN.11.res_skip_layers.0.weight_g", "module.WN.11.res_skip_layers.0.weight_v", "module.WN.11.res_skip_layers.1.bias", "module.WN.11.res_skip_layers.1.weight_g", "module.WN.11.res_skip_layers.1.weight_v", "module.WN.11.res_skip_layers.2.bias", "module.WN.11.res_skip_layers.2.weight_g", "module.WN.11.res_skip_layers.2.weight_v", "module.WN.11.res_skip_layers.3.bias", "module.WN.11.res_skip_layers.3.weight_g", "module.WN.11.res_skip_layers.3.weight_v", "module.WN.11.res_skip_layers.4.bias", "module.WN.11.res_skip_layers.4.weight_g", "module.WN.11.res_skip_layers.4.weight_v", "module.WN.11.res_skip_layers.5.bias", "module.WN.11.res_skip_layers.5.weight_g", "module.WN.11.res_skip_layers.5.weight_v", "module.WN.11.res_skip_layers.6.bias", "module.WN.11.res_skip_layers.6.weight_g", "module.WN.11.res_skip_layers.6.weight_v", "module.WN.11.res_skip_layers.7.bias", "module.WN.11.res_skip_layers.7.weight_g", "module.WN.11.res_skip_layers.7.weight_v", "module.WN.11.cond_layers.0.bias", "module.WN.11.cond_layers.0.weight_g", "module.WN.11.cond_layers.0.weight_v", "module.WN.11.cond_layers.1.bias", "module.WN.11.cond_layers.1.weight_g", "module.WN.11.cond_layers.1.weight_v", "module.WN.11.cond_layers.2.bias", "module.WN.11.cond_layers.2.weight_g", "module.WN.11.cond_layers.2.weight_v", "module.WN.11.cond_layers.3.bias", "module.WN.11.cond_layers.3.weight_g", "module.WN.11.cond_layers.3.weight_v", "module.WN.11.cond_layers.4.bias", "module.WN.11.cond_layers.4.weight_g", "module.WN.11.cond_layers.4.weight_v", "module.WN.11.cond_layers.5.bias", "module.WN.11.cond_layers.5.weight_g", "module.WN.11.cond_layers.5.weight_v", "module.WN.11.cond_layers.6.bias", "module.WN.11.cond_layers.6.weight_g", "module.WN.11.cond_layers.6.weight_v", "module.WN.11.cond_layers.7.bias", "module.WN.11.cond_layers.7.weight_g", "module.WN.11.cond_layers.7.weight_v", "module.WN.11.start.bias", "module.WN.11.start.weight_g", "module.WN.11.start.weight_v", "module.WN.11.end.weight", "module.WN.11.end.bias", "module.convinv.0.conv.weight", "module.convinv.1.conv.weight", "module.convinv.2.conv.weight", "module.convinv.3.conv.weight", "module.convinv.4.conv.weight", "module.convinv.5.conv.weight", "module.convinv.6.conv.weight", "module.convinv.7.conv.weight", "module.convinv.8.conv.weight", "module.convinv.9.conv.weight", "module.convinv.10.conv.weight", "module.convinv.11.conv.weight".
CookiePPP commented 3 years ago

@MuruganR96 The comment right above (https://github.com/NVIDIA/DeepLearningExamples/issues/319#issuecomment-595454132) may be useful.

Specifically, change the

    model.load_state_dict(checkpoint['state_dict'])
    optimizer.load_state_dict(checkpoint['optimizer'])

lines to

    # Strip the DataParallel 'module.' prefix (first occurrence only).
    model.load_state_dict({k.replace('module.', '', 1): v for k, v in checkpoint['state_dict'].items()})
    if 'optimizer' in checkpoint:  # release checkpoints may lack optimizer state
        optimizer.load_state_dict(checkpoint['optimizer'])

and the checkpoint should load.
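For anyone resuming from the release checkpoints, the same fix as a self-contained helper may be handy. This is a minimal sketch, not part of the repo's train.py: the name `load_pretrained` and the epoch fallback are illustrative, and the `'state_dict'`/`'optimizer'` keys are the ones shown in this thread.

    import torch

    def load_pretrained(model, optimizer, checkpoint_path):
        """Load a checkpoint saved with or without a DataParallel wrapper."""
        checkpoint = torch.load(checkpoint_path, map_location='cpu')
        # Strip the 'module.' prefix only where it is actually present, so the
        # same helper works for both wrapped and unwrapped checkpoints.
        state_dict = {(k[len('module.'):] if k.startswith('module.') else k): v
                      for k, v in checkpoint['state_dict'].items()}
        model.load_state_dict(state_dict)
        # Release checkpoints may ship without optimizer state.
        if optimizer is not None and 'optimizer' in checkpoint:
            optimizer.load_state_dict(checkpoint['optimizer'])
        return checkpoint.get('epoch', 0)  # assumes an 'epoch' key; else start at 0

An alternative is to wrap the model itself in `DataParallel` before loading, which makes the key names line up, but stripping the prefix keeps single-GPU fine-tuning simple.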

MuruganR96 commented 3 years ago

@CookiePPP sir, I am really happy. It is working fine. Thank you so much, sir.

@CookiePPP sir, I made the changes you mentioned above, and I also updated the parameters @GrzegorzKarchNV sir mentioned.

The command I tried:

    python train.py -m WaveGlow -o ./ -lr 1e-4 --epochs 14500 -bs 10 --segment-length 16000 \
        --weight-decay 0 --grad-clip-thresh 65504.0 --cudnn-enabled --cudnn-benchmark --log-file nvlog.json \
        --training-files filelists/hindi_audio_text_train_filelist.txt \
        --amp --validation-files filelists/hindi_audio_text_val_filelist.txt \
        --wn-channels 256 --checkpoint-path backup/waveglow_1076430_14000_amp

Thank you so much @CookiePPP sir, @GrzegorzKarchNV sir. :)