Edresson / YourTTS

YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone

Finetuning released model (Exp 4) for new language #32

Open PS-AI opened 1 year ago

PS-AI commented 1 year ago

Hi,

Thanks for your amazing contribution.

I am trying to fine-tune the Exp 4 model [https://drive.google.com/drive/folders/15G-QS5tYQPkqiXfAdialJjmuqZV0azQV] for a new language, Malayalam, using the IndicTTS dataset and following the steps mentioned in the Issue #12 and Issue #8 threads.

I passed the path to best_model_pth.tar as RESTORE_PATH in the training arguments.

When I run train_tts.py with config.json, I get the following messages in the logs, and the model is restored from step 0 (i.e. it is trained from scratch rather than fine-tuned).

Could you let me know what the problem is and how I could fix it?

 Using CUDA: True
 Number of GPUs: 1
 Restoring from best_model_latest.pth.tar ...
 Restoring Model...
 Partial model initialization...
 | > Layer missing in the model definition: speaker_encoder.conv1.weight
 | > Layer missing in the model definition: speaker_encoder.conv1.bias
 | > Layer missing in the model definition: speaker_encoder.bn1.weight
 | > Layer missing in the model definition: speaker_encoder.bn1.bias
 | > Layer missing in the model definition: speaker_encoder.bn1.running_mean
 | > Layer missing in the model definition: speaker_encoder.bn1.running_var
 | > Layer missing in the model definition: speaker_encoder.bn1.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer1.0.conv1.weight
 | > Layer missing in the model definition: speaker_encoder.layer1.0.bn1.weight
 | > Layer missing in the model definition: speaker_encoder.layer1.0.bn1.bias
 | > Layer missing in the model definition: speaker_encoder.layer1.0.bn1.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer1.0.bn1.running_var
 | > Layer missing in the model definition: speaker_encoder.layer1.0.bn1.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer1.0.conv2.weight
 | > Layer missing in the model definition: speaker_encoder.layer1.0.bn2.weight
 | > Layer missing in the model definition: speaker_encoder.layer1.0.bn2.bias
 | > Layer missing in the model definition: speaker_encoder.layer1.0.bn2.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer1.0.bn2.running_var
 | > Layer missing in the model definition: speaker_encoder.layer1.0.bn2.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer1.0.se.fc.0.weight
 | > Layer missing in the model definition: speaker_encoder.layer1.0.se.fc.0.bias
 | > Layer missing in the model definition: speaker_encoder.layer1.0.se.fc.2.weight
 | > Layer missing in the model definition: speaker_encoder.layer1.0.se.fc.2.bias
 | > Layer missing in the model definition: speaker_encoder.layer1.1.conv1.weight
 | > Layer missing in the model definition: speaker_encoder.layer1.1.bn1.weight
 | > Layer missing in the model definition: speaker_encoder.layer1.1.bn1.bias
 | > Layer missing in the model definition: speaker_encoder.layer1.1.bn1.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer1.1.bn1.running_var
 | > Layer missing in the model definition: speaker_encoder.layer1.1.bn1.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer1.1.conv2.weight
 | > Layer missing in the model definition: speaker_encoder.layer1.1.bn2.weight
 | > Layer missing in the model definition: speaker_encoder.layer1.1.bn2.bias
 | > Layer missing in the model definition: speaker_encoder.layer1.1.bn2.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer1.1.bn2.running_var
 | > Layer missing in the model definition: speaker_encoder.layer1.1.bn2.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer1.1.se.fc.0.weight
 | > Layer missing in the model definition: speaker_encoder.layer1.1.se.fc.0.bias
 | > Layer missing in the model definition: speaker_encoder.layer1.1.se.fc.2.weight
 | > Layer missing in the model definition: speaker_encoder.layer1.1.se.fc.2.bias
 | > Layer missing in the model definition: speaker_encoder.layer1.2.conv1.weight
 | > Layer missing in the model definition: speaker_encoder.layer1.2.bn1.weight
 | > Layer missing in the model definition: speaker_encoder.layer1.2.bn1.bias
 | > Layer missing in the model definition: speaker_encoder.layer1.2.bn1.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer1.2.bn1.running_var
 | > Layer missing in the model definition: speaker_encoder.layer1.2.bn1.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer1.2.conv2.weight
 | > Layer missing in the model definition: speaker_encoder.layer1.2.bn2.weight
 | > Layer missing in the model definition: speaker_encoder.layer1.2.bn2.bias
 | > Layer missing in the model definition: speaker_encoder.layer1.2.bn2.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer1.2.bn2.running_var
 | > Layer missing in the model definition: speaker_encoder.layer1.2.bn2.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer1.2.se.fc.0.weight
 | > Layer missing in the model definition: speaker_encoder.layer1.2.se.fc.0.bias
 | > Layer missing in the model definition: speaker_encoder.layer1.2.se.fc.2.weight
 | > Layer missing in the model definition: speaker_encoder.layer1.2.se.fc.2.bias
 | > Layer missing in the model definition: speaker_encoder.layer2.0.conv1.weight
 | > Layer missing in the model definition: speaker_encoder.layer2.0.bn1.weight
 | > Layer missing in the model definition: speaker_encoder.layer2.0.bn1.bias
 | > Layer missing in the model definition: speaker_encoder.layer2.0.bn1.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer2.0.bn1.running_var
 | > Layer missing in the model definition: speaker_encoder.layer2.0.bn1.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer2.0.conv2.weight
 | > Layer missing in the model definition: speaker_encoder.layer2.0.bn2.weight
 | > Layer missing in the model definition: speaker_encoder.layer2.0.bn2.bias
 | > Layer missing in the model definition: speaker_encoder.layer2.0.bn2.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer2.0.bn2.running_var
 | > Layer missing in the model definition: speaker_encoder.layer2.0.bn2.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer2.0.se.fc.0.weight
 | > Layer missing in the model definition: speaker_encoder.layer2.0.se.fc.0.bias
 | > Layer missing in the model definition: speaker_encoder.layer2.0.se.fc.2.weight
 | > Layer missing in the model definition: speaker_encoder.layer2.0.se.fc.2.bias
 | > Layer missing in the model definition: speaker_encoder.layer2.0.downsample.0.weight
 | > Layer missing in the model definition: speaker_encoder.layer2.0.downsample.1.weight
 | > Layer missing in the model definition: speaker_encoder.layer2.0.downsample.1.bias
 | > Layer missing in the model definition: speaker_encoder.layer2.0.downsample.1.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer2.0.downsample.1.running_var
 | > Layer missing in the model definition: speaker_encoder.layer2.0.downsample.1.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer2.1.conv1.weight
 | > Layer missing in the model definition: speaker_encoder.layer2.1.bn1.weight
 | > Layer missing in the model definition: speaker_encoder.layer2.1.bn1.bias
 | > Layer missing in the model definition: speaker_encoder.layer2.1.bn1.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer2.1.bn1.running_var
 | > Layer missing in the model definition: speaker_encoder.layer2.1.bn1.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer2.1.conv2.weight
 | > Layer missing in the model definition: speaker_encoder.layer2.1.bn2.weight
 | > Layer missing in the model definition: speaker_encoder.layer2.1.bn2.bias
 | > Layer missing in the model definition: speaker_encoder.layer2.1.bn2.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer2.1.bn2.running_var
 | > Layer missing in the model definition: speaker_encoder.layer2.1.bn2.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer2.1.se.fc.0.weight
 | > Layer missing in the model definition: speaker_encoder.layer2.1.se.fc.0.bias
 | > Layer missing in the model definition: speaker_encoder.layer2.1.se.fc.2.weight
 | > Layer missing in the model definition: speaker_encoder.layer2.1.se.fc.2.bias
 | > Layer missing in the model definition: speaker_encoder.layer2.2.conv1.weight
 | > Layer missing in the model definition: speaker_encoder.layer2.2.bn1.weight
 | > Layer missing in the model definition: speaker_encoder.layer2.2.bn1.bias
 | > Layer missing in the model definition: speaker_encoder.layer2.2.bn1.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer2.2.bn1.running_var
 | > Layer missing in the model definition: speaker_encoder.layer2.2.bn1.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer2.2.conv2.weight
 | > Layer missing in the model definition: speaker_encoder.layer2.2.bn2.weight
 | > Layer missing in the model definition: speaker_encoder.layer2.2.bn2.bias
 | > Layer missing in the model definition: speaker_encoder.layer2.2.bn2.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer2.2.bn2.running_var
 | > Layer missing in the model definition: speaker_encoder.layer2.2.bn2.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer2.2.se.fc.0.weight
 | > Layer missing in the model definition: speaker_encoder.layer2.2.se.fc.0.bias
 | > Layer missing in the model definition: speaker_encoder.layer2.2.se.fc.2.weight
 | > Layer missing in the model definition: speaker_encoder.layer2.2.se.fc.2.bias
 | > Layer missing in the model definition: speaker_encoder.layer2.3.conv1.weight
 | > Layer missing in the model definition: speaker_encoder.layer2.3.bn1.weight
 | > Layer missing in the model definition: speaker_encoder.layer2.3.bn1.bias
 | > Layer missing in the model definition: speaker_encoder.layer2.3.bn1.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer2.3.bn1.running_var
 | > Layer missing in the model definition: speaker_encoder.layer2.3.bn1.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer2.3.conv2.weight
 | > Layer missing in the model definition: speaker_encoder.layer2.3.bn2.weight
 | > Layer missing in the model definition: speaker_encoder.layer2.3.bn2.bias
 | > Layer missing in the model definition: speaker_encoder.layer2.3.bn2.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer2.3.bn2.running_var
 | > Layer missing in the model definition: speaker_encoder.layer2.3.bn2.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer2.3.se.fc.0.weight
 | > Layer missing in the model definition: speaker_encoder.layer2.3.se.fc.0.bias
 | > Layer missing in the model definition: speaker_encoder.layer2.3.se.fc.2.weight
 | > Layer missing in the model definition: speaker_encoder.layer2.3.se.fc.2.bias
 | > Layer missing in the model definition: speaker_encoder.layer3.0.conv1.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.0.bn1.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.0.bn1.bias
 | > Layer missing in the model definition: speaker_encoder.layer3.0.bn1.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer3.0.bn1.running_var
 | > Layer missing in the model definition: speaker_encoder.layer3.0.bn1.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer3.0.conv2.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.0.bn2.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.0.bn2.bias
 | > Layer missing in the model definition: speaker_encoder.layer3.0.bn2.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer3.0.bn2.running_var
 | > Layer missing in the model definition: speaker_encoder.layer3.0.bn2.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer3.0.se.fc.0.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.0.se.fc.0.bias
 | > Layer missing in the model definition: speaker_encoder.layer3.0.se.fc.2.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.0.se.fc.2.bias
 | > Layer missing in the model definition: speaker_encoder.layer3.0.downsample.0.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.0.downsample.1.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.0.downsample.1.bias
 | > Layer missing in the model definition: speaker_encoder.layer3.0.downsample.1.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer3.0.downsample.1.running_var
 | > Layer missing in the model definition: speaker_encoder.layer3.0.downsample.1.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer3.1.conv1.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.1.bn1.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.1.bn1.bias
 | > Layer missing in the model definition: speaker_encoder.layer3.1.bn1.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer3.1.bn1.running_var
 | > Layer missing in the model definition: speaker_encoder.layer3.1.bn1.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer3.1.conv2.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.1.bn2.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.1.bn2.bias
 | > Layer missing in the model definition: speaker_encoder.layer3.1.bn2.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer3.1.bn2.running_var
 | > Layer missing in the model definition: speaker_encoder.layer3.1.bn2.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer3.1.se.fc.0.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.1.se.fc.0.bias
 | > Layer missing in the model definition: speaker_encoder.layer3.1.se.fc.2.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.1.se.fc.2.bias
 | > Layer missing in the model definition: speaker_encoder.layer3.2.conv1.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.2.bn1.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.2.bn1.bias
 | > Layer missing in the model definition: speaker_encoder.layer3.2.bn1.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer3.2.bn1.running_var
 | > Layer missing in the model definition: speaker_encoder.layer3.2.bn1.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer3.2.conv2.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.2.bn2.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.2.bn2.bias
 | > Layer missing in the model definition: speaker_encoder.layer3.2.bn2.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer3.2.bn2.running_var
 | > Layer missing in the model definition: speaker_encoder.layer3.2.bn2.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer3.2.se.fc.0.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.2.se.fc.0.bias
 | > Layer missing in the model definition: speaker_encoder.layer3.2.se.fc.2.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.2.se.fc.2.bias
 | > Layer missing in the model definition: speaker_encoder.layer3.3.conv1.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.3.bn1.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.3.bn1.bias
 | > Layer missing in the model definition: speaker_encoder.layer3.3.bn1.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer3.3.bn1.running_var
 | > Layer missing in the model definition: speaker_encoder.layer3.3.bn1.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer3.3.conv2.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.3.bn2.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.3.bn2.bias
 | > Layer missing in the model definition: speaker_encoder.layer3.3.bn2.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer3.3.bn2.running_var
 | > Layer missing in the model definition: speaker_encoder.layer3.3.bn2.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer3.3.se.fc.0.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.3.se.fc.0.bias
 | > Layer missing in the model definition: speaker_encoder.layer3.3.se.fc.2.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.3.se.fc.2.bias
 | > Layer missing in the model definition: speaker_encoder.layer3.4.conv1.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.4.bn1.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.4.bn1.bias
 | > Layer missing in the model definition: speaker_encoder.layer3.4.bn1.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer3.4.bn1.running_var
 | > Layer missing in the model definition: speaker_encoder.layer3.4.bn1.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer3.4.conv2.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.4.bn2.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.4.bn2.bias
 | > Layer missing in the model definition: speaker_encoder.layer3.4.bn2.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer3.4.bn2.running_var
 | > Layer missing in the model definition: speaker_encoder.layer3.4.bn2.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer3.4.se.fc.0.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.4.se.fc.0.bias
 | > Layer missing in the model definition: speaker_encoder.layer3.4.se.fc.2.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.4.se.fc.2.bias
 | > Layer missing in the model definition: speaker_encoder.layer3.5.conv1.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.5.bn1.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.5.bn1.bias
 | > Layer missing in the model definition: speaker_encoder.layer3.5.bn1.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer3.5.bn1.running_var
 | > Layer missing in the model definition: speaker_encoder.layer3.5.bn1.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer3.5.conv2.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.5.bn2.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.5.bn2.bias
 | > Layer missing in the model definition: speaker_encoder.layer3.5.bn2.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer3.5.bn2.running_var
 | > Layer missing in the model definition: speaker_encoder.layer3.5.bn2.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer3.5.se.fc.0.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.5.se.fc.0.bias
 | > Layer missing in the model definition: speaker_encoder.layer3.5.se.fc.2.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.5.se.fc.2.bias
 | > Layer missing in the model definition: speaker_encoder.layer4.0.conv1.weight
 | > Layer missing in the model definition: speaker_encoder.layer4.0.bn1.weight
 | > Layer missing in the model definition: speaker_encoder.layer4.0.bn1.bias
 | > Layer missing in the model definition: speaker_encoder.layer4.0.bn1.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer4.0.bn1.running_var
 | > Layer missing in the model definition: speaker_encoder.layer4.0.bn1.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer4.0.conv2.weight
 | > Layer missing in the model definition: speaker_encoder.layer4.0.bn2.weight
 | > Layer missing in the model definition: speaker_encoder.layer4.0.bn2.bias
 | > Layer missing in the model definition: speaker_encoder.layer4.0.bn2.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer4.0.bn2.running_var
 | > Layer missing in the model definition: speaker_encoder.layer4.0.bn2.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer4.0.se.fc.0.weight
 | > Layer missing in the model definition: speaker_encoder.layer4.0.se.fc.0.bias
 | > Layer missing in the model definition: speaker_encoder.layer4.0.se.fc.2.weight
 | > Layer missing in the model definition: speaker_encoder.layer4.0.se.fc.2.bias
 | > Layer missing in the model definition: speaker_encoder.layer4.0.downsample.0.weight
 | > Layer missing in the model definition: speaker_encoder.layer4.0.downsample.1.weight
 | > Layer missing in the model definition: speaker_encoder.layer4.0.downsample.1.bias
 | > Layer missing in the model definition: speaker_encoder.layer4.0.downsample.1.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer4.0.downsample.1.running_var
 | > Layer missing in the model definition: speaker_encoder.layer4.0.downsample.1.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer4.1.conv1.weight
 | > Layer missing in the model definition: speaker_encoder.layer4.1.bn1.weight
 | > Layer missing in the model definition: speaker_encoder.layer4.1.bn1.bias
 | > Layer missing in the model definition: speaker_encoder.layer4.1.bn1.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer4.1.bn1.running_var
 | > Layer missing in the model definition: speaker_encoder.layer4.1.bn1.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer4.1.conv2.weight
 | > Layer missing in the model definition: speaker_encoder.layer4.1.bn2.weight
 | > Layer missing in the model definition: speaker_encoder.layer4.1.bn2.bias
 | > Layer missing in the model definition: speaker_encoder.layer4.1.bn2.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer4.1.bn2.running_var
 | > Layer missing in the model definition: speaker_encoder.layer4.1.bn2.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer4.1.se.fc.0.weight
 | > Layer missing in the model definition: speaker_encoder.layer4.1.se.fc.0.bias
 | > Layer missing in the model definition: speaker_encoder.layer4.1.se.fc.2.weight
 | > Layer missing in the model definition: speaker_encoder.layer4.1.se.fc.2.bias
 | > Layer missing in the model definition: speaker_encoder.layer4.2.conv1.weight
 | > Layer missing in the model definition: speaker_encoder.layer4.2.bn1.weight
 | > Layer missing in the model definition: speaker_encoder.layer4.2.bn1.bias
 | > Layer missing in the model definition: speaker_encoder.layer4.2.bn1.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer4.2.bn1.running_var
 | > Layer missing in the model definition: speaker_encoder.layer4.2.bn1.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer4.2.conv2.weight
 | > Layer missing in the model definition: speaker_encoder.layer4.2.bn2.weight
 | > Layer missing in the model definition: speaker_encoder.layer4.2.bn2.bias
 | > Layer missing in the model definition: speaker_encoder.layer4.2.bn2.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer4.2.bn2.running_var
 | > Layer missing in the model definition: speaker_encoder.layer4.2.bn2.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer4.2.se.fc.0.weight
 | > Layer missing in the model definition: speaker_encoder.layer4.2.se.fc.0.bias
 | > Layer missing in the model definition: speaker_encoder.layer4.2.se.fc.2.weight
 | > Layer missing in the model definition: speaker_encoder.layer4.2.se.fc.2.bias
 | > Layer missing in the model definition: speaker_encoder.torch_spec.0.filter
 | > Layer missing in the model definition: speaker_encoder.torch_spec.1.spectrogram.window
 | > Layer missing in the model definition: speaker_encoder.torch_spec.1.mel_scale.fb
 | > Layer missing in the model definition: speaker_encoder.attention.0.weight
 | > Layer missing in the model definition: speaker_encoder.attention.0.bias
 | > Layer missing in the model definition: speaker_encoder.attention.2.weight
 | > Layer missing in the model definition: speaker_encoder.attention.2.bias
 | > Layer missing in the model definition: speaker_encoder.attention.2.running_mean
 | > Layer missing in the model definition: speaker_encoder.attention.2.running_var
 | > Layer missing in the model definition: speaker_encoder.attention.2.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.attention.3.weight
 | > Layer missing in the model definition: speaker_encoder.attention.3.bias
 | > Layer missing in the model definition: speaker_encoder.fc.weight
 | > Layer missing in the model definition: speaker_encoder.fc.bias
 | > Layer missing in the model definition: emb_l.weight
 | > Layer missing in the model definition: duration_predictor.cond_lang.weight
 | > Layer missing in the model definition: duration_predictor.cond_lang.bias
 | > Layer missing in the model definition: waveform_decoder.resblocks.0.convs.0.bias
 | > Layer missing in the model definition: waveform_decoder.resblocks.0.convs.0.weight_g
 | > Layer missing in the model definition: waveform_decoder.resblocks.0.convs.0.weight_v
 | > Layer missing in the model definition: waveform_decoder.resblocks.0.convs.1.bias
 | > Layer missing in the model definition: waveform_decoder.resblocks.0.convs.1.weight_g
 | > Layer missing in the model definition: waveform_decoder.resblocks.0.convs.1.weight_v
 | > Layer missing in the model definition: waveform_decoder.resblocks.1.convs.0.bias
 | > Layer missing in the model definition: waveform_decoder.resblocks.1.convs.0.weight_g
 | > Layer missing in the model definition: waveform_decoder.resblocks.1.convs.0.weight_v
 | > Layer missing in the model definition: waveform_decoder.resblocks.1.convs.1.bias
 | > Layer missing in the model definition: waveform_decoder.resblocks.1.convs.1.weight_g
 | > Layer missing in the model definition: waveform_decoder.resblocks.1.convs.1.weight_v
 | > Layer missing in the model definition: waveform_decoder.resblocks.2.convs.0.bias
 | > Layer missing in the model definition: waveform_decoder.resblocks.2.convs.0.weight_g
 | > Layer missing in the model definition: waveform_decoder.resblocks.2.convs.0.weight_v
 | > Layer missing in the model definition: waveform_decoder.resblocks.2.convs.1.bias
 | > Layer missing in the model definition: waveform_decoder.resblocks.2.convs.1.weight_g
 | > Layer missing in the model definition: waveform_decoder.resblocks.2.convs.1.weight_v
 | > Layer missing in the model definition: waveform_decoder.resblocks.3.convs.0.bias
 | > Layer missing in the model definition: waveform_decoder.resblocks.3.convs.0.weight_g
 | > Layer missing in the model definition: waveform_decoder.resblocks.3.convs.0.weight_v
 | > Layer missing in the model definition: waveform_decoder.resblocks.3.convs.1.bias
 | > Layer missing in the model definition: waveform_decoder.resblocks.3.convs.1.weight_g
 | > Layer missing in the model definition: waveform_decoder.resblocks.3.convs.1.weight_v
 | > Layer missing in the model definition: waveform_decoder.resblocks.4.convs.0.bias
 | > Layer missing in the model definition: waveform_decoder.resblocks.4.convs.0.weight_g
 | > Layer missing in the model definition: waveform_decoder.resblocks.4.convs.0.weight_v
 | > Layer missing in the model definition: waveform_decoder.resblocks.4.convs.1.bias
 | > Layer missing in the model definition: waveform_decoder.resblocks.4.convs.1.weight_g
 | > Layer missing in the model definition: waveform_decoder.resblocks.4.convs.1.weight_v
 | > Layer missing in the model definition: waveform_decoder.resblocks.5.convs.0.bias
 | > Layer missing in the model definition: waveform_decoder.resblocks.5.convs.0.weight_g
 | > Layer missing in the model definition: waveform_decoder.resblocks.5.convs.0.weight_v
 | > Layer missing in the model definition: waveform_decoder.resblocks.5.convs.1.bias
 | > Layer missing in the model definition: waveform_decoder.resblocks.5.convs.1.weight_g
 | > Layer missing in the model definition: waveform_decoder.resblocks.5.convs.1.weight_v
 | > Layer missing in the model definition: waveform_decoder.resblocks.6.convs.0.bias
 | > Layer missing in the model definition: waveform_decoder.resblocks.6.convs.0.weight_g
 | > Layer missing in the model definition: waveform_decoder.resblocks.6.convs.0.weight_v
 | > Layer missing in the model definition: waveform_decoder.resblocks.6.convs.1.bias
 | > Layer missing in the model definition: waveform_decoder.resblocks.6.convs.1.weight_g
 | > Layer missing in the model definition: waveform_decoder.resblocks.6.convs.1.weight_v
 | > Layer missing in the model definition: waveform_decoder.resblocks.7.convs.0.bias
 | > Layer missing in the model definition: waveform_decoder.resblocks.7.convs.0.weight_g
 | > Layer missing in the model definition: waveform_decoder.resblocks.7.convs.0.weight_v
 | > Layer missing in the model definition: waveform_decoder.resblocks.7.convs.1.bias
 | > Layer missing in the model definition: waveform_decoder.resblocks.7.convs.1.weight_g
 | > Layer missing in the model definition: waveform_decoder.resblocks.7.convs.1.weight_v
 | > Layer missing in the model definition: waveform_decoder.resblocks.8.convs.0.bias
 | > Layer missing in the model definition: waveform_decoder.resblocks.8.convs.0.weight_g
 | > Layer missing in the model definition: waveform_decoder.resblocks.8.convs.0.weight_v
 | > Layer missing in the model definition: waveform_decoder.resblocks.8.convs.1.bias
 | > Layer missing in the model definition: waveform_decoder.resblocks.8.convs.1.weight_g
 | > Layer missing in the model definition: waveform_decoder.resblocks.8.convs.1.weight_v
 | > Layer missing in the model definition: waveform_decoder.resblocks.9.convs.0.bias
 | > Layer missing in the model definition: waveform_decoder.resblocks.9.convs.0.weight_g
 | > Layer missing in the model definition: waveform_decoder.resblocks.9.convs.0.weight_v
 | > Layer missing in the model definition: waveform_decoder.resblocks.9.convs.1.bias
 | > Layer missing in the model definition: waveform_decoder.resblocks.9.convs.1.weight_g
 | > Layer missing in the model definition: waveform_decoder.resblocks.9.convs.1.weight_v
 | > Layer missing in the model definition: waveform_decoder.resblocks.10.convs.0.bias
 | > Layer missing in the model definition: waveform_decoder.resblocks.10.convs.0.weight_g
 | > Layer missing in the model definition: waveform_decoder.resblocks.10.convs.0.weight_v
 | > Layer missing in the model definition: waveform_decoder.resblocks.10.convs.1.bias
 | > Layer missing in the model definition: waveform_decoder.resblocks.10.convs.1.weight_g
 | > Layer missing in the model definition: waveform_decoder.resblocks.10.convs.1.weight_v
 | > Layer missing in the model definition: waveform_decoder.resblocks.11.convs.0.bias
 | > Layer missing in the model definition: waveform_decoder.resblocks.11.convs.0.weight_g
 | > Layer missing in the model definition: waveform_decoder.resblocks.11.convs.0.weight_v
 | > Layer missing in the model definition: waveform_decoder.resblocks.11.convs.1.bias
 | > Layer missing in the model definition: waveform_decoder.resblocks.11.convs.1.weight_g
 | > Layer missing in the model definition: waveform_decoder.resblocks.11.convs.1.weight_v
 | > 651 / 1040 layers are restored.
 Model restored from step 0

Model has 93900268 parameters
Restoring best loss from ...
Starting with loaded last best loss inf
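With several hundred "Layer missing" entries, it can help to see which sub-modules they belong to before reading them one by one. The following is a minimal sketch, assuming the log has been saved as plain text; `summarize_missing` is a hypothetical helper written for this illustration and is not part of the training code, and the excerpt is a shortened stand-in for the full log.

```python
import re
from collections import Counter

# Matches one "Layer missing" entry and captures the parameter name.
# \s+ between words tolerates entries that were wrapped across lines.
MISSING = re.compile(
    r"Layer\s+missing\s+in\s+the\s+model\s+definition:\s+(\S+)"
)


def summarize_missing(log_text: str) -> Counter:
    """Count 'Layer missing' entries per top-level module name."""
    names = MISSING.findall(log_text)
    return Counter(name.split(".", 1)[0] for name in names)


# A short excerpt in the same shape as the log above (not the full log).
excerpt = (
    "| > Layer missing in the model definition: speaker_encoder.conv1.weight "
    "| > Layer missing in the model definition: speaker_encoder.conv1.bias "
    "| > Layer missing in the model definition: "
    "waveform_decoder.resblocks.11.convs.1.weight_v "
    "| > 651 / 1040 layers are restored."
)

summary = summarize_missing(excerpt)
for module, count in summary.most_common():
    print(f"{module}: {count} missing")
```

Run against the full log, the per-module counts show at a glance whether whole components (for example `speaker_encoder` or `waveform_decoder`) are absent from the model built from the config.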

anjalyjayakrishnan commented 1 year ago

@PS-AI, I am getting the same error while fine-tuning it for Hindi. Did you find a solution? If so, could you please share it?

tomsun3 commented 1 year ago

@PS-AI @anjalyjayakrishnan I'm getting the same thing! Maybe there is a problem with a newer version of one of the requirements?

tomsun3 commented 1 year ago

Here is my output. The numbers are not the same, but the speaker_encoder layers are also missing:

 > Restoring from model_file.pth.tar ...
 > Restoring Model...
 > Partial model initialization...
 | > Layer missing in the model definition: speaker_encoder.conv1.weight
 | > Layer missing in the model definition: speaker_encoder.conv1.bias
 | > Layer missing in the model definition: speaker_encoder.bn1.weight
 | > Layer missing in the model definition: speaker_encoder.bn1.bias
 | > Layer missing in the model definition: speaker_encoder.bn1.running_mean
 | > Layer missing in the model definition: speaker_encoder.bn1.running_var
 | > Layer missing in the model definition: speaker_encoder.bn1.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer1.0.conv1.weight
 | > Layer missing in the model definition: speaker_encoder.layer1.0.bn1.weight
 | > Layer missing in the model definition: speaker_encoder.layer1.0.bn1.bias
 | > Layer missing in the model definition: speaker_encoder.layer1.0.bn1.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer1.0.bn1.running_var
 | > Layer missing in the model definition: speaker_encoder.layer1.0.bn1.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer1.0.conv2.weight
 | > Layer missing in the model definition: speaker_encoder.layer1.0.bn2.weight
 | > Layer missing in the model definition: speaker_encoder.layer1.0.bn2.bias
 | > Layer missing in the model definition: speaker_encoder.layer1.0.bn2.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer1.0.bn2.running_var
 | > Layer missing in the model definition: speaker_encoder.layer1.0.bn2.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer1.0.se.fc.0.weight
 | > Layer missing in the model definition: speaker_encoder.layer1.0.se.fc.0.bias
 | > Layer missing in the model definition: speaker_encoder.layer1.0.se.fc.2.weight
 | > Layer missing in the model definition: speaker_encoder.layer1.0.se.fc.2.bias
 | > Layer missing in the model definition: speaker_encoder.layer1.1.conv1.weight
 | > Layer missing in the model definition: speaker_encoder.layer1.1.bn1.weight
 | > Layer missing in the model definition: speaker_encoder.layer1.1.bn1.bias
 | > Layer missing in the model definition: speaker_encoder.layer1.1.bn1.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer1.1.bn1.running_var
 | > Layer missing in the model definition: speaker_encoder.layer1.1.bn1.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer1.1.conv2.weight
 | > Layer missing in the model definition: speaker_encoder.layer1.1.bn2.weight
 | > Layer missing in the model definition: speaker_encoder.layer1.1.bn2.bias
 | > Layer missing in the model definition: speaker_encoder.layer1.1.bn2.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer1.1.bn2.running_var
 | > Layer missing in the model definition: speaker_encoder.layer1.1.bn2.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer1.1.se.fc.0.weight
 | > Layer missing in the model definition: speaker_encoder.layer1.1.se.fc.0.bias
 | > Layer missing in the model definition: speaker_encoder.layer1.1.se.fc.2.weight
 | > Layer missing in the model definition: speaker_encoder.layer1.1.se.fc.2.bias
 | > Layer missing in the model definition: speaker_encoder.layer1.2.conv1.weight
 | > Layer missing in the model definition: speaker_encoder.layer1.2.bn1.weight
 | > Layer missing in the model definition: speaker_encoder.layer1.2.bn1.bias
 | > Layer missing in the model definition: speaker_encoder.layer1.2.bn1.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer1.2.bn1.running_var
 | > Layer missing in the model definition: speaker_encoder.layer1.2.bn1.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer1.2.conv2.weight
 | > Layer missing in the model definition: speaker_encoder.layer1.2.bn2.weight
 | > Layer missing in the model definition: speaker_encoder.layer1.2.bn2.bias
 | > Layer missing in the model definition: speaker_encoder.layer1.2.bn2.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer1.2.bn2.running_var
 | > Layer missing in the model definition: speaker_encoder.layer1.2.bn2.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer1.2.se.fc.0.weight
 | > Layer missing in the model definition: speaker_encoder.layer1.2.se.fc.0.bias
 | > Layer missing in the model definition: speaker_encoder.layer1.2.se.fc.2.weight
 | > Layer missing in the model definition: speaker_encoder.layer1.2.se.fc.2.bias
 | > Layer missing in the model definition: speaker_encoder.layer2.0.conv1.weight
 | > Layer missing in the model definition: speaker_encoder.layer2.0.bn1.weight
 | > Layer missing in the model definition: speaker_encoder.layer2.0.bn1.bias
 | > Layer missing in the model definition: speaker_encoder.layer2.0.bn1.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer2.0.bn1.running_var
 | > Layer missing in the model definition: speaker_encoder.layer2.0.bn1.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer2.0.conv2.weight
 | > Layer missing in the model definition: speaker_encoder.layer2.0.bn2.weight
 | > Layer missing in the model definition: speaker_encoder.layer2.0.bn2.bias
 | > Layer missing in the model definition: speaker_encoder.layer2.0.bn2.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer2.0.bn2.running_var
 | > Layer missing in the model definition: speaker_encoder.layer2.0.bn2.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer2.0.se.fc.0.weight
 | > Layer missing in the model definition: speaker_encoder.layer2.0.se.fc.0.bias
 | > Layer missing in the model definition: speaker_encoder.layer2.0.se.fc.2.weight
 | > Layer missing in the model definition: speaker_encoder.layer2.0.se.fc.2.bias
 | > Layer missing in the model definition: speaker_encoder.layer2.0.downsample.0.weight
 | > Layer missing in the model definition: speaker_encoder.layer2.0.downsample.1.weight
 | > Layer missing in the model definition: speaker_encoder.layer2.0.downsample.1.bias
 | > Layer missing in the model definition: speaker_encoder.layer2.0.downsample.1.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer2.0.downsample.1.running_var
 | > Layer missing in the model definition: speaker_encoder.layer2.0.downsample.1.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer2.1.conv1.weight
 | > Layer missing in the model definition: speaker_encoder.layer2.1.bn1.weight
 | > Layer missing in the model definition: speaker_encoder.layer2.1.bn1.bias
 | > Layer missing in the model definition: speaker_encoder.layer2.1.bn1.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer2.1.bn1.running_var
 | > Layer missing in the model definition: speaker_encoder.layer2.1.bn1.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer2.1.conv2.weight
 | > Layer missing in the model definition: speaker_encoder.layer2.1.bn2.weight
 | > Layer missing in the model definition: speaker_encoder.layer2.1.bn2.bias
 | > Layer missing in the model definition: speaker_encoder.layer2.1.bn2.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer2.1.bn2.running_var
 | > Layer missing in the model definition: speaker_encoder.layer2.1.bn2.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer2.1.se.fc.0.weight
 | > Layer missing in the model definition: speaker_encoder.layer2.1.se.fc.0.bias
 | > Layer missing in the model definition: speaker_encoder.layer2.1.se.fc.2.weight
 | > Layer missing in the model definition: speaker_encoder.layer2.1.se.fc.2.bias
 | > Layer missing in the model definition: speaker_encoder.layer2.2.conv1.weight
 | > Layer missing in the model definition: speaker_encoder.layer2.2.bn1.weight
 | > Layer missing in the model definition: speaker_encoder.layer2.2.bn1.bias
 | > Layer missing in the model definition: speaker_encoder.layer2.2.bn1.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer2.2.bn1.running_var
 | > Layer missing in the model definition: speaker_encoder.layer2.2.bn1.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer2.2.conv2.weight
 | > Layer missing in the model definition: speaker_encoder.layer2.2.bn2.weight
 | > Layer missing in the model definition: speaker_encoder.layer2.2.bn2.bias
 | > Layer missing in the model definition: speaker_encoder.layer2.2.bn2.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer2.2.bn2.running_var
 | > Layer missing in the model definition: speaker_encoder.layer2.2.bn2.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer2.2.se.fc.0.weight
 | > Layer missing in the model definition: speaker_encoder.layer2.2.se.fc.0.bias
 | > Layer missing in the model definition: speaker_encoder.layer2.2.se.fc.2.weight
 | > Layer missing in the model definition: speaker_encoder.layer2.2.se.fc.2.bias
 | > Layer missing in the model definition: speaker_encoder.layer2.3.conv1.weight
 | > Layer missing in the model definition: speaker_encoder.layer2.3.bn1.weight
 | > Layer missing in the model definition: speaker_encoder.layer2.3.bn1.bias
 | > Layer missing in the model definition: speaker_encoder.layer2.3.bn1.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer2.3.bn1.running_var
 | > Layer missing in the model definition: speaker_encoder.layer2.3.bn1.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer2.3.conv2.weight
 | > Layer missing in the model definition: speaker_encoder.layer2.3.bn2.weight
 | > Layer missing in the model definition: speaker_encoder.layer2.3.bn2.bias
 | > Layer missing in the model definition: speaker_encoder.layer2.3.bn2.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer2.3.bn2.running_var
 | > Layer missing in the model definition: speaker_encoder.layer2.3.bn2.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer2.3.se.fc.0.weight
 | > Layer missing in the model definition: speaker_encoder.layer2.3.se.fc.0.bias
 | > Layer missing in the model definition: speaker_encoder.layer2.3.se.fc.2.weight
 | > Layer missing in the model definition: speaker_encoder.layer2.3.se.fc.2.bias
 | > Layer missing in the model definition: speaker_encoder.layer3.0.conv1.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.0.bn1.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.0.bn1.bias
 | > Layer missing in the model definition: speaker_encoder.layer3.0.bn1.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer3.0.bn1.running_var
 | > Layer missing in the model definition: speaker_encoder.layer3.0.bn1.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer3.0.conv2.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.0.bn2.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.0.bn2.bias
 | > Layer missing in the model definition: speaker_encoder.layer3.0.bn2.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer3.0.bn2.running_var
 | > Layer missing in the model definition: speaker_encoder.layer3.0.bn2.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer3.0.se.fc.0.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.0.se.fc.0.bias
 | > Layer missing in the model definition: speaker_encoder.layer3.0.se.fc.2.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.0.se.fc.2.bias
 | > Layer missing in the model definition: speaker_encoder.layer3.0.downsample.0.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.0.downsample.1.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.0.downsample.1.bias
 | > Layer missing in the model definition: speaker_encoder.layer3.0.downsample.1.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer3.0.downsample.1.running_var
 | > Layer missing in the model definition: speaker_encoder.layer3.0.downsample.1.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer3.1.conv1.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.1.bn1.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.1.bn1.bias
 | > Layer missing in the model definition: speaker_encoder.layer3.1.bn1.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer3.1.bn1.running_var
 | > Layer missing in the model definition: speaker_encoder.layer3.1.bn1.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer3.1.conv2.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.1.bn2.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.1.bn2.bias
 | > Layer missing in the model definition: speaker_encoder.layer3.1.bn2.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer3.1.bn2.running_var
 | > Layer missing in the model definition: speaker_encoder.layer3.1.bn2.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer3.1.se.fc.0.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.1.se.fc.0.bias
 | > Layer missing in the model definition: speaker_encoder.layer3.1.se.fc.2.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.1.se.fc.2.bias
 | > Layer missing in the model definition: speaker_encoder.layer3.2.conv1.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.2.bn1.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.2.bn1.bias
 | > Layer missing in the model definition: speaker_encoder.layer3.2.bn1.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer3.2.bn1.running_var
 | > Layer missing in the model definition: speaker_encoder.layer3.2.bn1.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer3.2.conv2.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.2.bn2.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.2.bn2.bias
 | > Layer missing in the model definition: speaker_encoder.layer3.2.bn2.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer3.2.bn2.running_var
 | > Layer missing in the model definition: speaker_encoder.layer3.2.bn2.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer3.2.se.fc.0.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.2.se.fc.0.bias
 | > Layer missing in the model definition: speaker_encoder.layer3.2.se.fc.2.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.2.se.fc.2.bias
 | > Layer missing in the model definition: speaker_encoder.layer3.3.conv1.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.3.bn1.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.3.bn1.bias
 | > Layer missing in the model definition: speaker_encoder.layer3.3.bn1.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer3.3.bn1.running_var
 | > Layer missing in the model definition: speaker_encoder.layer3.3.bn1.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer3.3.conv2.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.3.bn2.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.3.bn2.bias
 | > Layer missing in the model definition: speaker_encoder.layer3.3.bn2.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer3.3.bn2.running_var
 | > Layer missing in the model definition: speaker_encoder.layer3.3.bn2.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer3.3.se.fc.0.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.3.se.fc.0.bias
 | > Layer missing in the model definition: speaker_encoder.layer3.3.se.fc.2.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.3.se.fc.2.bias
 | > Layer missing in the model definition: speaker_encoder.layer3.4.conv1.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.4.bn1.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.4.bn1.bias
 | > Layer missing in the model definition: speaker_encoder.layer3.4.bn1.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer3.4.bn1.running_var
 | > Layer missing in the model definition: speaker_encoder.layer3.4.bn1.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer3.4.conv2.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.4.bn2.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.4.bn2.bias
 | > Layer missing in the model definition: speaker_encoder.layer3.4.bn2.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer3.4.bn2.running_var
 | > Layer missing in the model definition: speaker_encoder.layer3.4.bn2.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer3.4.se.fc.0.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.4.se.fc.0.bias
 | > Layer missing in the model definition: speaker_encoder.layer3.4.se.fc.2.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.4.se.fc.2.bias
 | > Layer missing in the model definition: speaker_encoder.layer3.5.conv1.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.5.bn1.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.5.bn1.bias
 | > Layer missing in the model definition: speaker_encoder.layer3.5.bn1.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer3.5.bn1.running_var
 | > Layer missing in the model definition: speaker_encoder.layer3.5.bn1.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer3.5.conv2.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.5.bn2.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.5.bn2.bias
 | > Layer missing in the model definition: speaker_encoder.layer3.5.bn2.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer3.5.bn2.running_var
 | > Layer missing in the model definition: speaker_encoder.layer3.5.bn2.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer3.5.se.fc.0.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.5.se.fc.0.bias
 | > Layer missing in the model definition: speaker_encoder.layer3.5.se.fc.2.weight
 | > Layer missing in the model definition: speaker_encoder.layer3.5.se.fc.2.bias
 | > Layer missing in the model definition: speaker_encoder.layer4.0.conv1.weight
 | > Layer missing in the model definition: speaker_encoder.layer4.0.bn1.weight
 | > Layer missing in the model definition: speaker_encoder.layer4.0.bn1.bias
 | > Layer missing in the model definition: speaker_encoder.layer4.0.bn1.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer4.0.bn1.running_var
 | > Layer missing in the model definition: speaker_encoder.layer4.0.bn1.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer4.0.conv2.weight
 | > Layer missing in the model definition: speaker_encoder.layer4.0.bn2.weight
 | > Layer missing in the model definition: speaker_encoder.layer4.0.bn2.bias
 | > Layer missing in the model definition: speaker_encoder.layer4.0.bn2.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer4.0.bn2.running_var
 | > Layer missing in the model definition: speaker_encoder.layer4.0.bn2.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer4.0.se.fc.0.weight
 | > Layer missing in the model definition: speaker_encoder.layer4.0.se.fc.0.bias
 | > Layer missing in the model definition: speaker_encoder.layer4.0.se.fc.2.weight
 | > Layer missing in the model definition: speaker_encoder.layer4.0.se.fc.2.bias
 | > Layer missing in the model definition: speaker_encoder.layer4.0.downsample.0.weight
 | > Layer missing in the model definition: speaker_encoder.layer4.0.downsample.1.weight
 | > Layer missing in the model definition: speaker_encoder.layer4.0.downsample.1.bias
 | > Layer missing in the model definition: speaker_encoder.layer4.0.downsample.1.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer4.0.downsample.1.running_var
 | > Layer missing in the model definition: speaker_encoder.layer4.0.downsample.1.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer4.1.conv1.weight
 | > Layer missing in the model definition: speaker_encoder.layer4.1.bn1.weight
 | > Layer missing in the model definition: speaker_encoder.layer4.1.bn1.bias
 | > Layer missing in the model definition: speaker_encoder.layer4.1.bn1.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer4.1.bn1.running_var
 | > Layer missing in the model definition: speaker_encoder.layer4.1.bn1.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer4.1.conv2.weight
 | > Layer missing in the model definition: speaker_encoder.layer4.1.bn2.weight
 | > Layer missing in the model definition: speaker_encoder.layer4.1.bn2.bias
 | > Layer missing in the model definition: speaker_encoder.layer4.1.bn2.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer4.1.bn2.running_var
 | > Layer missing in the model definition: speaker_encoder.layer4.1.bn2.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer4.1.se.fc.0.weight
 | > Layer missing in the model definition: speaker_encoder.layer4.1.se.fc.0.bias
 | > Layer missing in the model definition: speaker_encoder.layer4.1.se.fc.2.weight
 | > Layer missing in the model definition: speaker_encoder.layer4.1.se.fc.2.bias
 | > Layer missing in the model definition: speaker_encoder.layer4.2.conv1.weight
 | > Layer missing in the model definition: speaker_encoder.layer4.2.bn1.weight
 | > Layer missing in the model definition: speaker_encoder.layer4.2.bn1.bias
 | > Layer missing in the model definition: speaker_encoder.layer4.2.bn1.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer4.2.bn1.running_var
 | > Layer missing in the model definition: speaker_encoder.layer4.2.bn1.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer4.2.conv2.weight
 | > Layer missing in the model definition: speaker_encoder.layer4.2.bn2.weight
 | > Layer missing in the model definition: speaker_encoder.layer4.2.bn2.bias
 | > Layer missing in the model definition: speaker_encoder.layer4.2.bn2.running_mean
 | > Layer missing in the model definition: speaker_encoder.layer4.2.bn2.running_var
 | > Layer missing in the model definition: speaker_encoder.layer4.2.bn2.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.layer4.2.se.fc.0.weight
 | > Layer missing in the model definition: speaker_encoder.layer4.2.se.fc.0.bias
 | > Layer missing in the model definition: speaker_encoder.layer4.2.se.fc.2.weight
 | > Layer missing in the model definition: speaker_encoder.layer4.2.se.fc.2.bias
 | > Layer missing in the model definition: speaker_encoder.torch_spec.0.filter
 | > Layer missing in the model definition: speaker_encoder.torch_spec.1.spectrogram.window
 | > Layer missing in the model definition: speaker_encoder.torch_spec.1.mel_scale.fb
 | > Layer missing in the model definition: speaker_encoder.attention.0.weight
 | > Layer missing in the model definition: speaker_encoder.attention.0.bias
 | > Layer missing in the model definition: speaker_encoder.attention.2.weight
 | > Layer missing in the model definition: speaker_encoder.attention.2.bias
 | > Layer missing in the model definition: speaker_encoder.attention.2.running_mean
 | > Layer missing in the model definition: speaker_encoder.attention.2.running_var
 | > Layer missing in the model definition: speaker_encoder.attention.2.num_batches_tracked
 | > Layer missing in the model definition: speaker_encoder.attention.3.weight
 | > Layer missing in the model definition: speaker_encoder.attention.3.bias
 | > Layer missing in the model definition: speaker_encoder.fc.weight
 | > Layer missing in the model definition: speaker_encoder.fc.bias
 | > Layer dimention missmatch between model definition and checkpoint: emb_l.weight
 | > Layer dimention missmatch between model definition and checkpoint: text_encoder.emb.weight
 | > 897 / 899 layers are restored.
 > Model restored from step 0

 > Model has 86802636 parameters
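The two kinds of message in this log, "Layer missing in the model definition" and "Layer dimention missmatch", are what a partial restore typically reports: a checkpoint entry is skipped either because the model has no parameter with that name, or because the name exists but the shape differs. A mismatch on `emb_l.weight` and `text_encoder.emb.weight` would be consistent with the new config declaring a different number of languages or characters than the checkpoint was trained with, though that is an inference and is not confirmed in this thread. Below is a minimal sketch of the filtering logic, not the training script's actual code; `plan_partial_restore` and every shape in the example are hypothetical.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def plan_partial_restore(
    checkpoint_shapes: Dict[str, Shape],
    model_shapes: Dict[str, Shape],
) -> Tuple[List[str], List[str], List[str]]:
    """Sort checkpoint entries into restorable, missing, and mismatched."""
    restorable: List[str] = []
    missing_in_model: List[str] = []
    shape_mismatch: List[str] = []
    for name, shape in checkpoint_shapes.items():
        if name not in model_shapes:
            # The model built from the config has no such parameter.
            missing_in_model.append(name)
        elif model_shapes[name] != shape:
            # Same name, different size: cannot be copied as-is.
            shape_mismatch.append(name)
        else:
            restorable.append(name)
    return restorable, missing_in_model, shape_mismatch


# Hypothetical shapes, invented for illustration only.
checkpoint = {
    "speaker_encoder.fc.weight": (512, 1024),
    "emb_l.weight": (3, 4),
    "text_encoder.emb.weight": (165, 192),
    "text_encoder.proj.weight": (384, 196, 1),
}
model = {
    "emb_l.weight": (4, 4),
    "text_encoder.emb.weight": (80, 192),
    "text_encoder.proj.weight": (384, 196, 1),
}

restorable, missing, mismatch = plan_partial_restore(checkpoint, model)
for name in missing:
    print(f"missing in model definition: {name}")
for name in mismatch:
    print(f"dimension mismatch: {name}")
print(f"{len(restorable)} / {len(checkpoint)} checkpoint entries can be copied")
```

With real PyTorch objects, the two shape dictionaries could be built as `{k: tuple(v.shape) for k, v in state_dict.items()}`, one from whichever entry of the checkpoint holds the weights and one from `model.state_dict()`.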

souvikg544 commented 1 year ago

Hello, did anyone find a solution? For folks here working on Indian languages, let's connect: https://www.linkedin.com/in/souvik-ghosh-3b8b411b2/

harshvardhan-truefan commented 1 year ago

Hi @souvikg544, I hope you are doing well. Did you find a solution to your problem, and do you have any advice on training YourTTS on a custom dataset?

Look forward to hearing from you!

Regards, Harsh