babysor / MockingBird

🚀 AI voice cloning: Clone a voice in 5 seconds to generate arbitrary speech in real-time

Running the model provided here raises this RuntimeError: Error(s) in loading state_dict for Tacotron: size mismatch for encoder.embedding.weight: copying a param with shape torch.Size([70, 512]) from checkpoint, the shape in current model is torch.Size([75, 512]). #37

Closed wangkewk closed 3 years ago

wangkewk commented 3 years ago


wangkewk commented 3 years ago


babysor commented 3 years ago

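This is an incompatibility caused by a recent fix of mine. You can work around it by changing line 11 of the file under synthesizer/utils/ to: _characters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz12340!\'(),-.:;? ' Please don't close this issue for now. If too many people run into it, I'll add a compatibility fix.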

wangkewk commented 3 years ago



soft-di commented 3 years ago


FuryMartin commented 3 years ago



ALSYLY commented 3 years ago



sanhuafeiluo commented 3 years ago



vc815 commented 3 years ago


duolanda commented 3 years ago



zhangxiaozhier commented 3 years ago


yukikawas commented 3 years ago


diyanqi commented 3 years ago


skygongque commented 3 years ago


Puwong commented 3 years ago

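That did fix the problem, but the voice quality is not as good as in the Bilibili demo. I deliberately used a recording of a novel being read aloud, so I'm not sure what is wrong. If I want the output to sound very close to a specific person's voice, how can I improve it?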

betsyalan commented 3 years ago


utmcontent commented 3 years ago


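If I use a model I trained myself, do I need to change this back for it to work, or is it fine to leave it as is? I tried a newly trained model and got no sound (which may also be a problem with my own training), but the provided model works normally.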

babysor commented 3 years ago


Changing it back gives slightly better results, but it also works without the change.

JeffCheung85 commented 3 years ago


xugaoxiang commented 3 years ago


QiYYZH commented 3 years ago


babysor commented 3 years ago



wa008 commented 3 years ago


chenyv118 commented 3 years ago


chenyv118 commented 3 years ago



babysor commented 3 years ago




Jackxwb commented 3 years ago


I have already modified synthesizer/utils/, but the error still occurs:
Synthesizer using device: cuda
Trainable Parameters: 32.735M
Traceback (most recent call last):
  File "D:\AI\sv2tts_china\MockingBird\toolbox\", line 123, in <lambda>
    func = lambda: self.synthesize() or self.vocode()
  File "D:\AI\sv2tts_china\MockingBird\toolbox\", line 238, in synthesize
    specs = self.synthesizer.synthesize_spectrograms(texts, embeds, style_idx=int(self.ui.style_slider.value()), min_stop_token=min_token)
  File "D:\AI\sv2tts_china\MockingBird\synthesizer\", line 87, in synthesize_spectrograms
    self.load()
  File "D:\AI\sv2tts_china\MockingBird\synthesizer\", line 65, in load
    self._model.load(self.model_fpath)
  File "D:\AI\sv2tts_china\MockingBird\synthesizer\models\", line 525, in load
    self.load_state_dict(checkpoint["model_state"], strict=False)
  File "D:\ProgramData\Anaconda3\envs\Real-Time-Voice-Cloning\lib\site-packages\torch\nn\modules\", line 1483, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Tacotron:
        size mismatch for encoder_proj.weight: copying a param with shape torch.Size([128, 512]) from checkpoint, the shape in current model is torch.Size([128, 1024]).
        size mismatch for decoder.attn_rnn.weight_ih: copying a param with shape torch.Size([384, 768]) from checkpoint, the shape in current model is torch.Size([384, 1280]).
        size mismatch for decoder.rnn_input.weight: copying a param with shape torch.Size([1024, 640]) from checkpoint, the shape in current model is torch.Size([1024, 1152]).
        size mismatch for decoder.stop_proj.weight: copying a param with shape torch.Size([1, 1536]) from checkpoint, the shape in current model is torch.Size([1, 2048]).
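
This second error is a different mismatch from the one in the issue title: in all four layers the checkpoint's input width is exactly 512 smaller than what the current model expects, which suggests the checkpoint and the checked-out code describe different network architectures rather than different symbol tables. Note also that `strict=False` in `load_state_dict` only tolerates missing or unexpected keys; a parameter that exists on both sides with a different shape still raises. As a hedged sketch, the helper below (`find_shape_mismatches` is a hypothetical name, not part of MockingBird or PyTorch) compares two name-to-shape mappings before loading, using the shapes from the traceback above as sample data.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def find_shape_mismatches(checkpoint_shapes: Dict[str, Shape],
                          model_shapes: Dict[str, Shape]) -> List[str]:
    """Describe every parameter present in both mappings whose shapes differ."""
    messages = []
    for name in sorted(checkpoint_shapes.keys() & model_shapes.keys()):
        ckpt, cur = checkpoint_shapes[name], model_shapes[name]
        if ckpt != cur:
            messages.append(f"{name}: checkpoint {ckpt} vs model {cur}")
    return messages


# Shapes copied from the error message in this comment.
checkpoint_shapes = {
    "encoder_proj.weight": (128, 512),
    "decoder.attn_rnn.weight_ih": (384, 768),
    "decoder.rnn_input.weight": (1024, 640),
    "decoder.stop_proj.weight": (1, 1536),
}
model_shapes = {
    "encoder_proj.weight": (128, 1024),
    "decoder.attn_rnn.weight_ih": (384, 1280),
    "decoder.rnn_input.weight": (1024, 1152),
    "decoder.stop_proj.weight": (1, 2048),
}

for line in find_shape_mismatches(checkpoint_shapes, model_shapes):
    print(line)
```

With PyTorch, the two mappings could be built as `{k: tuple(v.shape) for k, v in state.items()}` from `checkpoint["model_state"]` and from `model.state_dict()` respectively, so the mismatch can be reported before `load_state_dict` is called.
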
Wu-Pretend commented 3 years ago



Me too

babysor commented 3 years ago

Try this model: Link: Extraction code: om7f

Jackxwb commented 3 years ago



Wu-Pretend commented 3 years ago


This model works fine. Just change _characters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz12340!\'(),-.:;? ' back to the original value.

System-BXV commented 3 years ago



babysor commented 3 years ago




babysor commented 3 years ago

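To use ceshi's model, switch the code to a commit from around October 20 and then apply the change described in issue #37. To use the author's model, switch the code to a commit from around October 20 and use it as is.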

KQDtianxiaK commented 2 years ago



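The model was trained for only a small number of steps and training can be continued to 100k+. Does that mean the output voice keeps getting better the more I click "Synthesize only"?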

zzh666123321 commented 2 years ago



Icey-lin commented 2 years ago




Mr-MoNET commented 2 years ago


Mr-MoNET commented 2 years ago


babysor commented 2 years ago

That means no dataset was recognized. If you are not training, you can ignore it.

Mr-MoNET commented 2 years ago


utmcontent commented 2 years ago



changwei0708 commented 2 years ago

You can follow the quickstart directly: switch to that model and the related code needs no changes. Environment: 3.7.11

tom-uu commented 2 years ago

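I have read the whole thread. I still get raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(. Everything it generates is noise. I changed the code as described and it still fails, and switching models does not help either.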

tianming937 commented 2 years ago

Same error here: copying a param with shape torch.Size([128, 512]). The output audio is nothing but noise.

ChunMengXin commented 2 years ago

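Complete beginner here. How do I switch to tag 0.01? I don't understand it at all. I trained the 75k model on the target voice for a while, but the imitation still doesn't sound like the speaker, so I'd like to switch to this model and try training again.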

fengxiangyun commented 2 years ago



I'm getting the same error. Did you get yours working?

Mr-MoNET commented 2 years ago



KasuganoSora-desu commented 2 years ago


babysor commented 2 years ago


