babysor / MockingBird

🚀 AI voice cloning: Clone a voice in 5 seconds to generate arbitrary speech in real-time
Other
34.62k stars 5.16k forks

Running the models from here raises this RuntimeError: Error(s) in loading state_dict for Tacotron: size mismatch for encoder.embedding.weight: copying a param with shape torch.Size([70, 512]) from checkpoint, the shape in current model is torch.Size([75, 512]). #37

Closed wangkewk closed 2 years ago

wangkewk commented 2 years ago

Can anyone fix this?

wangkewk commented 2 years ago

[screenshot]

babysor commented 2 years ago

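This incompatibility was caused by one of my recent fixes. To work around it, change line 11 of synthesizer/utils/symbols.py to: _characters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz12340!\'(),-.:;? ' and that's it. Please keep this issue open for now; if a lot of people run into it, I'll add backward compatibility.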
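
The 70 vs 75 in the error message is simply the size of the symbol table, because the encoder embedding has one row per symbol. Below is a minimal sketch, assuming `symbols.py` builds its table as `[_pad, _eos] + list(_characters)`; the `_pad`/`_eos` values and the ten-digit `NEWER_CHARACTERS` string are illustrative assumptions, not copied from the repository. Under that assumption the character string from the fix above yields exactly the 70 rows stored in the checkpoint, while a set containing all ten digits yields 75.

```python
# Sketch: how the character set determines encoder.embedding.weight's row count.
# ASSUMPTION: symbols.py builds symbols = [_pad, _eos] + list(_characters);
# the _pad/_eos values and NEWER_CHARACTERS are illustrative guesses.
_pad = "_"  # assumed padding symbol
_eos = "~"  # assumed end-of-sequence symbol

# Character set from the fix above (the one the released checkpoint expects).
CHECKPOINT_CHARACTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz12340!\'(),-.:;? '
# Hypothetical newer set with all ten digits.
NEWER_CHARACTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890!\'(),-.:;? '


def build_symbols(characters: str) -> list:
    """Return the symbol table; its length is the embedding's first dimension."""
    return [_pad, _eos] + list(characters)


if __name__ == "__main__":
    for name, chars in [("checkpoint", CHECKPOINT_CHARACTERS), ("newer", NEWER_CHARACTERS)]:
        print(name, len(build_symbols(chars)))
    # checkpoint 70
    # newer 75
```

26 + 26 letters, 5 digits and 11 punctuation/space characters give 68, plus the two special symbols gives 70; five extra digits account for the difference of 5.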

wangkewk commented 2 years ago

Rest assured.

soft-di commented 2 years ago

Same problem!

FuryMartin commented 2 years ago

Thanks, that works. After the change, what used to be pure noise turned into normal speech.

ALSYLY commented 2 years ago

Thanks, problem solved.

sanhuafeiluo commented 2 years ago

Thanks, it's fixed.

vc815 commented 2 years ago

Same problem.

duolanda commented 2 years ago

Thanks! The problem is solved.

zhangxiaozhier commented 2 years ago

Same here!

yukikawas commented 2 years ago

Works perfectly after the change, thanks~

diyanqi commented 2 years ago

+1

skygongque commented 2 years ago

Works now after the change, thanks.

Puwong commented 2 years ago

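The problem is indeed fixed, but the voice quality isn't as good as in the Bilibili demo. I specifically found a recording of a novel being read aloud, and I don't know what's wrong. If I want the output to sound very much like a particular person, how can I improve it?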

betsyalan commented 2 years ago

Same problem.

utmcontent commented 2 years ago

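If I use a model I trained myself, do I have to change this back for it to work, or is it fine either way? I tried a newly trained model and got no sound (possibly a problem with my own training), but the provided model works fine.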

babysor commented 2 years ago

Changing it back gives somewhat better results, but it also works without the change.
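
Text is only encoded correctly when inference uses the same symbol table the model was trained with. A minimal sketch, assuming symbols are built as `[_pad, _eos] + list(_characters)` (two special symbols), that picks a character set by comparing its size with the checkpoint's `encoder.embedding.weight` row count. `pick_characters` and the candidate strings are hypothetical; the `"model_state"` key in the usage comment is the one `tacotron.py` reads in the tracebacks in this thread.

```python
# Sketch: choose the _characters string whose symbol table matches a checkpoint.
# ASSUMPTION: symbols = [_pad, _eos] + list(_characters), i.e. 2 extra symbols.
NUM_SPECIAL_SYMBOLS = 2

# Hypothetical candidates; fill in the strings your code versions actually use.
CANDIDATE_CHARACTERS = {
    "issue-37 fix": 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz12340!\'(),-.:;? ',
    "ten digits": 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890!\'(),-.:;? ',
}


def pick_characters(embedding_rows: int, candidates: dict = CANDIDATE_CHARACTERS) -> str:
    """Return the name of the candidate whose symbol count equals embedding_rows.

    A size match is necessary but not sufficient: two sets of equal length in a
    different order would load without error but map characters to wrong rows.
    """
    matches = [name for name, chars in candidates.items()
               if len(chars) + NUM_SPECIAL_SYMBOLS == embedding_rows]
    if not matches:
        raise ValueError(f"no candidate character set yields {embedding_rows} symbols")
    if len(matches) > 1:
        raise ValueError(f"ambiguous: {matches} all yield {embedding_rows} symbols")
    return matches[0]


# Usage (requires PyTorch; the path is a placeholder):
#   import torch
#   state = torch.load("path/to/synthesizer.pt", map_location="cpu")["model_state"]
#   print(pick_characters(state["encoder.embedding.weight"].shape[0]))
```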

JeffCheung85 commented 2 years ago

Finally works. I struggled with this problem for ages and thought it was an issue with my local environment.

xugaoxiang commented 2 years ago

+1

QiYYZH commented 2 years ago

The output sounds robotic. Is that because results differ between computer environments? Do I need to retrain the model myself?

babysor commented 2 years ago

No. It is probably caused by the vocoder or by differences in the input audio.

wa008 commented 2 years ago

+1

chenyv118 commented 2 years ago

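Sigh, still nowhere near the quality in the video. It sounds like the awkward, broken Mandarin of a foreigner who has just arrived in China.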

chenyv118 commented 2 years ago

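I'm also using the Bilibili uploader's model, but I don't get the results shown on Bilibili. Mine has the intonation of 伏拉夫 (Vlad) and doesn't sound like Chinese at all.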

babysor commented 2 years ago

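If the recording is clear and the speech has a flat intonation, timbre cloning should work reasonably well. Could something not have run correctly?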
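
Since a clean reference recording matters, a quick pre-flight check on the input clip can rule out obvious problems before blaming the model. This is a minimal sketch using only the standard library; all thresholds and function names are hypothetical, and it does not reproduce any preprocessing MockingBird itself performs. It expects a 16-bit mono PCM WAV.

```python
import array
import math
import wave

# Illustrative thresholds (assumptions, not MockingBird settings).
MIN_DURATION_S = 5.0          # the project tagline mentions ~5 s of audio
CLIP_LEVEL = 0.99             # |sample| at or above this counts as clipped
MAX_CLIPPED_FRACTION = 0.001
MIN_RMS = 0.01                # about -40 dBFS


def load_wav_mono16(path):
    """Read a 16-bit PCM mono WAV file into floats in [-1.0, 1.0)."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM")
        rate = wav.getframerate()
        # wave returns 16-bit frames in the machine's native byte order.
        frames = wav.readframes(wav.getnframes())
    data = array.array("h", frames)
    return [s / 32768.0 for s in data], rate


def recording_report(samples, sample_rate):
    """Return basic level statistics plus human-readable warnings."""
    if not samples:
        raise ValueError("empty recording")
    duration = len(samples) / sample_rate
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    clipped = sum(1 for s in samples if abs(s) >= CLIP_LEVEL) / len(samples)
    warnings = []
    if duration < MIN_DURATION_S:
        warnings.append(f"only {duration:.1f} s of audio")
    if clipped > MAX_CLIPPED_FRACTION:
        warnings.append("clipping detected; lower the recording gain")
    if rms < MIN_RMS:
        warnings.append("very quiet; background noise may dominate")
    return {"duration_s": duration, "peak": peak, "rms": rms,
            "clipped_fraction": clipped, "warnings": warnings}
```

Usage: `samples, rate = load_wav_mono16("my_voice.wav")` (a placeholder file name), then inspect `recording_report(samples, rate)["warnings"]`. An empty list does not guarantee good cloning; it only means none of these coarse problems were detected.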

Jackxwb commented 2 years ago

I've edited synthesizer/utils/symbols.py, but I still get an error:

Synthesizer using device: cuda
Trainable Parameters: 32.735M
Traceback (most recent call last):
  File "D:\AI\sv2tts_china\MockingBird\toolbox\__init__.py", line 123, in <lambda>
    func = lambda: self.synthesize() or self.vocode()
  File "D:\AI\sv2tts_china\MockingBird\toolbox\__init__.py", line 238, in synthesize
    specs = self.synthesizer.synthesize_spectrograms(texts, embeds, style_idx=int(self.ui.style_slider.value()), min_stop_token=min_token)
  File "D:\AI\sv2tts_china\MockingBird\synthesizer\inference.py", line 87, in synthesize_spectrograms
    self.load()
  File "D:\AI\sv2tts_china\MockingBird\synthesizer\inference.py", line 65, in load
    self._model.load(self.model_fpath)
  File "D:\AI\sv2tts_china\MockingBird\synthesizer\models\tacotron.py", line 525, in load
    self.load_state_dict(checkpoint["model_state"], strict=False)
  File "D:\ProgramData\Anaconda3\envs\Real-Time-Voice-Cloning\lib\site-packages\torch\nn\modules\module.py", line 1483, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Tacotron:
        size mismatch for encoder_proj.weight: copying a param with shape torch.Size([128, 512]) from checkpoint, the shape in current model is torch.Size([128, 1024]).
        size mismatch for decoder.attn_rnn.weight_ih: copying a param with shape torch.Size([384, 768]) from checkpoint, the shape in current model is torch.Size([384, 1280]).
        size mismatch for decoder.rnn_input.weight: copying a param with shape torch.Size([1024, 640]) from checkpoint, the shape in current model is torch.Size([1024, 1152]).
        size mismatch for decoder.stop_proj.weight: copying a param with shape torch.Size([1, 1536]) from checkpoint, the shape in current model is torch.Size([1, 2048]).
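
Unlike the original 70-vs-75 embedding error, these mismatches are in `encoder_proj`, `decoder.attn_rnn`, `decoder.rnn_input` and `decoder.stop_proj`, so editing `_characters` cannot fix them: the checkpoint and the code disagree on the model architecture itself. Note that `strict=False` in `tacotron.py` only tolerates missing or unexpected keys; PyTorch's `load_state_dict` still raises on shape mismatches. The minimal sketch below (function name hypothetical) lists mismatched parameters and, using the shapes copied from this traceback, shows that every affected input dimension is exactly 512 larger in the current model. That is consistent with the current code concatenating an extra 512-dimensional vector onto the encoder output; the `style_idx` argument in the traceback hints at a style embedding, but that is an inference, not something stated in the thread.

```python
# Sketch: compare parameter shapes before calling load_state_dict.
# Shapes below are copied from the traceback above.


def find_shape_mismatches(checkpoint_shapes, model_shapes):
    """Return (name, checkpoint_shape, model_shape) for shared keys whose shapes differ."""
    mismatches = []
    for name, ckpt_shape in checkpoint_shapes.items():
        if name in model_shapes and tuple(model_shapes[name]) != tuple(ckpt_shape):
            mismatches.append((name, tuple(ckpt_shape), tuple(model_shapes[name])))
    return mismatches


CHECKPOINT = {
    "encoder_proj.weight": (128, 512),
    "decoder.attn_rnn.weight_ih": (384, 768),
    "decoder.rnn_input.weight": (1024, 640),
    "decoder.stop_proj.weight": (1, 1536),
}
CURRENT_MODEL = {
    "encoder_proj.weight": (128, 1024),
    "decoder.attn_rnn.weight_ih": (384, 1280),
    "decoder.rnn_input.weight": (1024, 1152),
    "decoder.stop_proj.weight": (1, 2048),
}

if __name__ == "__main__":
    for name, old, new in find_shape_mismatches(CHECKPOINT, CURRENT_MODEL):
        # Output rows match; only the input (last) dimension grew.
        print(f"{name}: {old} -> {new}, input dim +{new[-1] - old[-1]}")

# With real objects (requires PyTorch; `model` is the Tacotron instance):
#   ckpt = torch.load(path, map_location="cpu")["model_state"]
#   find_shape_mismatches({k: v.shape for k, v in ckpt.items()},
#                         {k: v.shape for k, v in model.state_dict().items()})
```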

Wu-Pretend commented 2 years ago

Me too

babysor commented 2 years ago

Try this model: link: https://pan.baidu.com/s/1fMh9IlgKJlL2PIiRTYDUvw, extraction code: om7f

Jackxwb commented 2 years ago

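It runs now, but only the first half of each generated sentence is actually spoken; the second half is all noise. Generating a few more times sometimes helps and sometimes regresses, and the generated voice doesn't sound like the source audio at all, it's pretty far off, haha.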

Wu-Pretend commented 2 years ago

This model works fine. Just change _characters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz12340!\'(),-.:;? ' back to the original.

System-BXV commented 2 years ago

That fixed it, but when I test with the demo audio, the output is much worse, emmm

babysor commented 2 years ago

That model was trained for very few steps; you can continue training it to 100k+ steps.

babysor commented 2 years ago

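The ceshi model works once you switch the code to the commit from around October 20 and then apply the change from issue #37, while the author's model needs the code switched to the commit from around October 20 before use.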
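
For readers unsure how to "switch the code to a commit", this is a git checkout. A minimal Python sketch that shells out to git: `git rev-list -1 --before=<date> HEAD` prints the newest commit older than a date, and `git checkout <ref>` accepts either a commit hash or a tag. The helper names are hypothetical, and the year 2021 in the usage lines is an assumption, since the comment only says "around October 20".

```python
import subprocess

# Hypothetical helpers; only the git subcommands themselves are real.


def rev_list_before_cmd(date, branch="HEAD"):
    """git command printing the newest commit on `branch` older than `date`."""
    return ["git", "rev-list", "-1", f"--before={date}", branch]


def checkout_cmd(ref):
    """git command switching the working tree to a commit hash or tag (detached HEAD)."""
    return ["git", "checkout", ref]


def run_git(cmd, repo_dir):
    """Run a command inside repo_dir and return its stripped stdout."""
    result = subprocess.run(cmd, cwd=repo_dir, check=True,
                            capture_output=True, text=True)
    return result.stdout.strip()


# Usage (year and the "MockingBird" checkout directory are assumptions):
#   sha = run_git(rev_list_before_cmd("2021-10-21"), "MockingBird")
#   run_git(checkout_cmd(sha), "MockingBird")   # then apply the #37 edit if needed
#   run_git(checkout_cmd("<tag>"), "MockingBird")  # a tag name works the same way
# git refuses to switch if uncommitted edits (e.g. to symbols.py) would be
# overwritten; run `git stash` first in that case.
```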

KQDtianxiaK commented 2 years ago

Regarding "you can continue training it to 100k+": does that mean the output voice keeps getting better if I keep clicking "synthesize only"?

zzh666123321 commented 2 years ago

Still doesn't work after the change... could you take another look?

Icey-lin commented 2 years ago

I suspect you copied the model into your program at the very start. Re-extract the program's archive and start over, and it should work.

Mr-MoNET commented 2 years ago

Why is my source audio section blacked out? Does anyone know?

Mr-MoNET commented 2 years ago

The Dataset and Speaker options under source audio are all blacked out and can't be selected?

babysor commented 2 years ago

No dataset has been recognized. If you aren't training, you can ignore this.

Mr-MoNET commented 2 years ago

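Hi, if I want to clone my own voice, do I have to train on my own recordings rather than directly using the models provided by the community? Yesterday I used the provided models (including the synthesizer and vector) to clone my own recording, and the resulting mel spectrogram was a mess: just a lot of electrical hum and noise. Could someone point out what I did wrong?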

utmcontent commented 2 years ago

Same for me. The program logic has an error related to the audio frequency.

changwei0708 commented 2 years ago

You can switch to this model directly by following the Quick Start guide (https://github.com/babysor/MockingBird/wiki/Quick-Start-(Newbie)); no code changes are needed. Environment: Python 3.7.11.

tom-uu commented 2 years ago

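I've read the whole thread. I get raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(. Everything generated is noise; I changed the code as described and it didn't help. Switching models didn't help either...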

tianming937 commented 2 years ago

Same error: copying a param with shape torch.Size([128, 512]). The output audio is all noise.

ChunMengXin commented 2 years ago

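Total newbie here: how do I switch to tag 0.01? I don't get it at all. I took the 75k model and trained it on my target voice for a while, but the imitation still doesn't sound like the target, so I'd like to try training from this model instead.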

fengxiangyun commented 2 years ago

I get the same error. Did yours get fixed?

Mr-MoNET commented 2 years ago

Not yet xd

KasuganoSora-desu commented 2 years ago

Only noise comes out. I made the changes from the comments and it's still the same.

babysor commented 2 years ago

Switch versions first, then apply the #37 fix.