Incorrect phrase breaks and noise - Githubissues

morphr5466 commented 4 years ago

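Hello, I trained this model on the Tsinghua dataset. The alignment looks good, but the synthesized audio has incorrect phrase breaks and noise, and I don't understand what causes these errors. The Tsinghua dataset contains a lot of additive noise. Did you denoise the data before training? wav.zip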

morphr5466 commented 4 years ago

align_0186_63240

These are the only hyperparameters I changed:

max_mel_frames=900,  # Only relevant when clip_mels_length = True
max_text_length=238,  # Only relevant when clip_mels_length = True

begeekmyfriend commented 4 years ago

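Stop using THCHS-30. It was built for ASR, and its transcripts don't even contain punctuation. Please download the open-source dataset from the Biaobei (标贝) website, which is made specifically for TTS. If you want more natural-sounding speech, you can apply word segmentation to the annotations, for example: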

我爱北京天安门。
wo3 ai4 bei3jing1 tian1an1men2 .
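The annotation format above can be sketched in a few lines of Python. This is only an illustration, not the project's preprocessing code: the `CHAR_PINYIN` table, the `PUNCTUATION` map, and the `annotate` helper are hypothetical, and the word segmentation is assumed to come from an external segmenter.

```python
# Hypothetical per-character pinyin table; a real pipeline would use a full
# lexicon or a grapheme-to-phoneme tool instead.
CHAR_PINYIN = {
    "我": "wo3", "爱": "ai4", "北": "bei3", "京": "jing1",
    "天": "tian1", "安": "an1", "门": "men2",
}
# Hypothetical mapping from Chinese punctuation to ASCII punctuation tokens.
PUNCTUATION = {"。": ".", "，": ",", "？": "?", "！": "!"}


def annotate(words):
    """Turn a pre-segmented sentence into a word-level pinyin annotation.

    Syllables inside one word are concatenated; words are separated by spaces.
    """
    tokens = []
    for word in words:
        if word in PUNCTUATION:
            tokens.append(PUNCTUATION[word])
        else:
            tokens.append("".join(CHAR_PINYIN[ch] for ch in word))
    return " ".join(tokens)


# The segmentation itself is assumed to be given.
segmented = ["我", "爱", "北京", "天安门", "。"]
print(annotate(segmented))  # wo3 ai4 bei3jing1 tian1an1men2 .
```

The point of the format is that word boundaries and punctuation survive in the text the model sees.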

As for the noise, a loss of around 0.1 is generally good enough. If you want cleaner audio, combine it with WaveRNN.

morphr5466 commented 4 years ago

Hello, and thank you very much for your reply. I will try training on the Biaobei dataset and then add your WaveRNN. My model's loss is already decreasing very slowly, so I am preparing training data for WaveRNN. Would you recommend this, or do you suggest waiting until the Tacotron2 loss drops below 0.1 before training WaveRNN? Regarding the word segmentation you mentioned, my understanding is that it should be applied to the annotations of the training data, rather than training a model without segmentation and only removing the spaces between words at synthesis time. Is my understanding correct?


begeekmyfriend commented 4 years ago

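Applying word segmentation in the pinyin annotations gives better results. You can verify the effect with script/griffin_lim_synth.sh.

As for the loss, there is no need to hit a specific value; convergence is enough. To prepare WaveRNN data, copy the files in the audio directory produced by preprocess.py into the quant directory, then copy the mels generated by script/gta_synth.sh into the gta directory. See dataset.py for details.

Also, both tacotron2 and wavernn were updated yesterday.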
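The copy step described above could be scripted roughly as follows. This is a sketch under assumptions: the `stage_wavernn_data` helper and all paths are hypothetical, and the exact file names and layout WaveRNN expects are defined by its dataset.py, which remains the authoritative reference.

```python
import shutil
from pathlib import Path


def stage_wavernn_data(audio_dir, gta_mel_dir, wavernn_data_dir):
    """Copy preprocessed audio into quant/ and GTA mels into gta/.

    Returns the number of files copied into each target directory.
    """
    root = Path(wavernn_data_dir)
    counts = {}
    for name, src in (("quant", audio_dir), ("gta", gta_mel_dir)):
        dst = root / name
        dst.mkdir(parents=True, exist_ok=True)
        copied = 0
        for f in sorted(Path(src).iterdir()):
            if f.is_file():
                shutil.copy2(f, dst / f.name)
                copied += 1
        counts[name] = copied
    return counts


# Example call with placeholder paths (adjust to your own checkout):
# stage_wavernn_data("training_data/audio", "gta_output", "WaveRNN/data")
```

Comparing the two returned counts is a cheap sanity check: if they differ, some utterances are missing either their audio or their GTA mel.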

morphr5466 commented 4 years ago

Hello, and thank you for your patient explanation. I am training WaveRNN with your code, but I have run into some problems. They seem similar to a problem mentioned in one of fatchord's issues, but he did not say how he fixed it. If I set the WaveRNN training mode to "RAW", a cuDNN error occurs.

I tried raising bits step by step from 9 to 16 until memory ran out, and this still did not resolve the overrun.

I am currently training in MOL mode, which does not hit this problem. My environment is: Ubuntu 16.04, Python 3.6, torch 1.0.0, CUDA 9.0.176, cuDNN 7.5.0.


begeekmyfriend commented 4 years ago

9 bits is enough for WaveRNN. Please install apex with pip:

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --global-option="--pyprof" --global-option="--cpp_ext" --global-option="--cuda_ext" ./
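For context on the bits setting discussed above: in a raw (quantized) mode, each audio sample is typically mapped to one of 2**bits classes. The following is a generic uniform-quantization sketch, not the repository's code; `float_to_label` and `label_to_float` are hypothetical names. It shows that 9 bits gives 512 classes and that every label must stay inside [0, 2**bits - 1].

```python
def float_to_label(sample, bits=9):
    """Map a float sample in [-1.0, 1.0] to an integer class in [0, 2**bits - 1].

    Out-of-range samples are clipped first so the label can never leave the
    valid class range.
    """
    levels = 2 ** bits
    clipped = max(-1.0, min(1.0, sample))
    return int(round((clipped + 1.0) * (levels - 1) / 2.0))


def label_to_float(label, bits=9):
    """Inverse mapping: integer class back to a float in [-1.0, 1.0]."""
    levels = 2 ** bits
    return 2.0 * label / (levels - 1) - 1.0


labels = [float_to_label(s) for s in (-1.0, 0.3, 1.0, 2.0)]
print(min(labels), max(labels))  # 0 511
```

If the data on disk was quantized with a different bit depth or encoding than the training mode expects, labels can fall outside the model's class range, so it is worth checking that preprocessing and training agree on the mode.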

begeekmyfriend commented 4 years ago

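By the way, judging from the plot in your second comment, you are only at 63K iterations. I recommend at least 100K to 150K iterations; see Tacotron-2.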

begeekmyfriend commented 4 years ago

https://github.com/begeekmyfriend/tacotron2/issues/11#issuecomment-590772704

morphr5466 commented 4 years ago

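Sir, here are my results from training on the Biaobei dataset; for the audio, see eval5.

I am now trying to train WaveRNN, but the results look off: the waveforms of all the audio, including target and gen_batched_target, look like this, which does not seem normal.

The quant training data is the audio folder produced by tacotron2 preprocessing, gta was generated by gta_synth.sh, the training mode is MOL, and all other parameters are unchanged.

Also, I believe raw mode failed earlier because of the preprocessed training data: after re-preprocessing LJspeech-1 (with the hyperparameter set to raw), training in raw mode works.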

begeekmyfriend commented 4 years ago

I train in raw mode; the MOL code simply has not been removed.

morphr5466 commented 4 years ago

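Thank you for the quick reply. How did you preprocess the data? Would it work to regenerate the data with fatchord's preprocessing program and then use it to train WaveRNN here?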

begeekmyfriend commented 4 years ago

Haven't you noticed that the preprocessing code in Tacotron-2, tacotron2, and WaveRNN is the same?

begeekmyfriend commented 4 years ago

Four multi-speaker evaluation examples have been provided. Please feel free to reopen this issue. t2_wavernn_eval.zip

morphr5466 commented 4 years ago

Hello, have you tried transfer learning? Which parameters need to be enabled during training?

begeekmyfriend commented 4 years ago

Transfer learning is outside the scope of this project. For training and related steps, see the README.

bash scripts/train_tacotron2.sh

morphr5466 commented 4 years ago

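Sir, following your method, I am training WaveRNN on the Biaobei dataset. During training, gradient explosions keep occurring and steps are skipped. I don't know what causes this. Will it affect the model? The tacotron2 results seem quite good.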

begeekmyfriend commented 4 years ago

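The WaveRNN gradient explosion may be an issue with the mixed-precision training implementation. After a few warnings, the gradient scale is adjusted automatically. If the loss eventually decays normally, there is no need to worry about it.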
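The skipped steps mentioned above are consistent with how dynamic loss scaling generally works in mixed-precision training: when scaled gradients overflow, the optimizer step is skipped and the loss scale is reduced. The toy class below is a simplified sketch of that idea, not apex's implementation; `DynamicLossScaler` and its default values are illustrative assumptions.

```python
import math


class DynamicLossScaler:
    """Toy model of dynamic loss scaling (illustrative, not apex's code)."""

    def __init__(self, init_scale=2.0 ** 16, growth_interval=2000):
        self.scale = init_scale
        self.growth_interval = growth_interval
        self.good_steps = 0

    def step(self, scaled_grads):
        """Return unscaled gradients, or None if the step must be skipped."""
        if any(math.isinf(g) or math.isnan(g) for g in scaled_grads):
            # Overflow: halve the scale and skip this optimizer step.
            self.scale /= 2.0
            self.good_steps = 0
            return None
        self.good_steps += 1
        grads = [g / self.scale for g in scaled_grads]
        if self.good_steps % self.growth_interval == 0:
            # After a run of clean steps, try a larger scale again.
            self.scale *= 2.0
        return grads


scaler = DynamicLossScaler(init_scale=1024.0, growth_interval=2)
print(scaler.step([float("inf")]), scaler.scale)  # None 512.0
print(scaler.step([512.0]), scaler.scale)         # [1.0] 512.0
```

Under this scheme, occasional skipped steps are expected. They become a concern only if the scale keeps shrinking and the loss stops decreasing.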

begeekmyfriend commented 4 years ago

Also, make sure the mel padding value for WaveRNN is -12, and keep an eye on which version you are using.

morphr5466 commented 4 years ago

Do you mean hp.voc_pad_val=-12? I don't quite understand what it means.

begeekmyfriend commented 4 years ago

https://github.com/begeekmyfriend/tacotron2/blob/master/tacotron2/loader.py#L43
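One plausible reading of the padding value, offered as an assumption rather than the author's explanation: mels of different lengths are padded to a common length with a constant when batched, and the vocoder should use the same constant as the acoustic model that produced its inputs. The linked loader.py line is the authoritative reference. The `pad_mels` helper below is hypothetical and uses plain lists for clarity.

```python
def pad_mels(mels, pad_value=-12.0):
    """Pad a list of mels to the length of the longest one.

    Each mel is a list of frames, and each frame is a list of n_mels floats.
    Missing frames are filled with pad_value (assumed here to be -12.0).
    """
    max_len = max(len(m) for m in mels)
    n_mels = len(mels[0][0])
    padded = []
    for m in mels:
        pad_frames = [[pad_value] * n_mels for _ in range(max_len - len(m))]
        padded.append([list(frame) for frame in m] + pad_frames)
    return padded


short = [[0.0, 1.0]]                            # 1 frame, 2 mel bins
long = [[0.5, 0.5], [0.2, 0.1], [0.3, 0.3]]     # 3 frames, 2 mel bins
batch = pad_mels([short, long])
print(batch[0])  # [[0.0, 1.0], [-12.0, -12.0], [-12.0, -12.0]]
```

If the two models disagree on the padding constant, the vocoder sees values in padded regions that it never encountered during training.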

begeekmyfriend / tacotron2

Incorrect phrase breaks and noise #12