CorentinJ / Real-Time-Voice-Cloning

Clone a voice in 5 seconds to generate arbitrary speech in real-time

help me Error(s) in loading state_dict for Tacotron: #1159

Open xjsdn opened 1 year ago

xjsdn commented 1 year ago

```
Loaded encoder "encoder.pt" trained to step 1564501
Synthesizer using device: cuda
Trainable Parameters: 30.892M
Traceback (most recent call last):
  File "H:\tts4\Real-Time-Voice\toolbox\__init__.py", line 114, in <lambda>
    func = lambda: self.synthesize() or self.vocode()
  File "H:\tts4\Real-Time-Voice\toolbox\__init__.py", line 217, in synthesize
    specs = self.synthesizer.synthesize_spectrograms(texts, embeds)
  File "H:\tts4\Real-Time-Voice\synthesizer\inference.py", line 86, in synthesize_spectrograms
    self.load()
  File "H:\tts4\Real-Time-Voice\synthesizer\inference.py", line 64, in load
    self._model.load(self.model_fpath)
  File "H:\tts4\Real-Time-Voice\synthesizer\models\tacotron.py", line 497, in load
    self.load_state_dict(checkpoint["model_state"])
  File "H:\tts4\Real-Time-Voice\venv\lib\site-packages\torch\nn\modules\module.py", line 1407, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Tacotron:
	Unexpected key(s) in state_dict: "gst.encoder.convs.0.weight", "gst.encoder.convs.0.bias", "gst.encoder.convs.1.weight", "gst.encoder.convs.1.bias", "gst.encoder.convs.2.weight", "gst.encoder.convs.2.bias", "gst.encoder.convs.3.weight", "gst.encoder.convs.3.bias", "gst.encoder.convs.4.weight", "gst.encoder.convs.4.bias", "gst.encoder.convs.5.weight", "gst.encoder.convs.5.bias", "gst.encoder.bns.0.weight", "gst.encoder.bns.0.bias", "gst.encoder.bns.0.running_mean", "gst.encoder.bns.0.running_var", "gst.encoder.bns.0.num_batches_tracked", "gst.encoder.bns.1.weight", "gst.encoder.bns.1.bias", "gst.encoder.bns.1.running_mean", "gst.encoder.bns.1.running_var", "gst.encoder.bns.1.num_batches_tracked", "gst.encoder.bns.2.weight", "gst.encoder.bns.2.bias", "gst.encoder.bns.2.running_mean", "gst.encoder.bns.2.running_var", "gst.encoder.bns.2.num_batches_tracked", "gst.encoder.bns.3.weight", "gst.encoder.bns.3.bias", "gst.encoder.bns.3.running_mean", "gst.encoder.bns.3.running_var", "gst.encoder.bns.3.num_batches_tracked", "gst.encoder.bns.4.weight", "gst.encoder.bns.4.bias", "gst.encoder.bns.4.running_mean", "gst.encoder.bns.4.running_var", "gst.encoder.bns.4.num_batches_tracked", "gst.encoder.bns.5.weight", "gst.encoder.bns.5.bias", "gst.encoder.bns.5.running_mean", "gst.encoder.bns.5.running_var", "gst.encoder.bns.5.num_batches_tracked", "gst.encoder.gru.weight_ih_l0", "gst.encoder.gru.weight_hh_l0", "gst.encoder.gru.bias_ih_l0", "gst.encoder.gru.bias_hh_l0", "gst.stl.embed", "gst.stl.attention.W_query.weight", "gst.stl.attention.W_key.weight", "gst.stl.attention.W_value.weight".
	size mismatch for encoder_proj.weight: copying a param with shape torch.Size([128, 1024]) from checkpoint, the shape in current model is torch.Size([128, 512]).
	size mismatch for decoder.attn_rnn.weight_ih: copying a param with shape torch.Size([384, 1280]) from checkpoint, the shape in current model is torch.Size([384, 768]).
	size mismatch for decoder.rnn_input.weight: copying a param with shape torch.Size([1024, 1152]) from checkpoint, the shape in current model is torch.Size([1024, 640]).
	size mismatch for decoder.stop_proj.weight: copying a param with shape torch.Size([1, 2048]) from checkpoint, the shape in current model is torch.Size([1, 1536]).
```
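The unexpected `gst.*` keys suggest the checkpoint was saved from a Tacotron variant that includes a GST (global style token) module, while the model being constructed here does not; the size mismatches in `encoder_proj`, `decoder.attn_rnn`, etc. follow from the extra style embedding widening those layers. As a rough diagnostic, the GST entries can be stripped from the state dict to see what remains — a minimal sketch with a hypothetical `drop_gst_keys` helper and a toy dict standing in for `checkpoint["model_state"]` (note that filtering alone cannot cure the size mismatches; the model hyperparameters must actually match the checkpoint, e.g. by using the matching fork/branch that defines the GST model):

```python
def drop_gst_keys(state_dict):
    """Return a copy of the state dict without GST-related entries.

    Hypothetical helper: removes keys contributed by a global-style-token
    module so the remaining keys can be compared against the current model.
    Layers whose shapes changed because of GST still require matching
    hyperparameters and cannot be fixed by filtering.
    """
    return {k: v for k, v in state_dict.items() if not k.startswith("gst.")}


# Toy stand-in for checkpoint["model_state"] (values would be tensors):
state = {
    "encoder_proj.weight": "tensor",
    "gst.stl.embed": "tensor",
    "gst.encoder.gru.weight_ih_l0": "tensor",
}
filtered = drop_gst_keys(state)
print(sorted(filtered))  # ['encoder_proj.weight']
```

With a real checkpoint, the same comparison can be made by diffing `checkpoint["model_state"].keys()` against `model.state_dict().keys()` to confirm the two architectures disagree.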