JasonWei512 / Tacotron-2-Chinese

(Deprecated) Chinese speech synthesis, adapted from https://github.com/Rayhane-mamah/Tacotron-2 and https://github.com/begeekmyfriend/Tacotron-2
MIT License
299 stars 70 forks

WaveNet progress #4

Open JasonWei512 opened 4 years ago

JasonWei512 commented 4 years ago

wave_pretrained.z01.pptx wave_pretrained.z02.pptx wave_pretrained.z03.pptx wave_pretrained.z04.pptx wave_pretrained.zip.pptx Remove the .pptx extension from each file, then extract.

Griffin-Lim vs WaveNet.zip

JasonWei512 commented 4 years ago

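The quality is not good enough. Trying Ryuichi Yamamoto's implementation instead: https://github.com/JasonWei512/wavenet_vocoder First step: train it directly on the GTA (ground-truth-aligned) output of Rayhane-mamah's Tacotron-2.

Precedent: https://github.com/Rayhane-mamah/Tacotron-2/issues/215 Fine-tuning an LJSpeech model on Lithuanian ground truth gave very good results.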

JasonWei512 commented 4 years ago

WaveNet.zip 300K steps. On Windows the hyperparameter JSON cannot be read, and evaluate only generates a short segment at the very beginning.

JasonWei512 commented 4 years ago

310K result.zip

JasonWei512 commented 4 years ago

490K steps. A complaint: inference on a 2070 Super runs at only 30~40 it/s, roughly one-thousandth of real time.

ly1984 commented 4 years ago

Was the WaveNet trained on the LJSpeech dataset?

JasonWei512 commented 4 years ago

Was the WaveNet trained on the LJSpeech dataset?

No, it was trained on the Biaobei (Databaker) dataset.

wqt2019 commented 4 years ago

Hi, could you re-upload wave_pretrained? I downloaded it, but extraction fails.

JasonWei512 commented 4 years ago

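Hi, could you re-upload wave_pretrained? I downloaded it, but extraction fails.

Download all five archives into the same directory, remove the .pptx extension from each one, then extract.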
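The renaming step above can be scripted. This is a minimal sketch, assuming all five downloaded files sit in one directory; the function name `strip_pptx_suffix` and its `prefix` parameter are made up for illustration and are not part of the repository.

```python
from pathlib import Path
from typing import List


def strip_pptx_suffix(directory: str, prefix: str = "wave_pretrained") -> List[str]:
    """Rename e.g. wave_pretrained.z01.pptx -> wave_pretrained.z01.

    Returns the new file names, sorted. Files that do not match
    "<prefix>.*.pptx" are left untouched.
    """
    renamed = []
    # sorted() consumes the glob before any file is renamed.
    for path in sorted(Path(directory).glob(prefix + ".*.pptx")):
        target = path.with_suffix("")  # drops only the trailing ".pptx"
        if target.exists():
            raise FileExistsError(f"refusing to overwrite {target}")
        path.rename(target)
        renamed.append(target.name)
    return renamed
```

After renaming, open wave_pretrained.zip with an archive tool that understands split ZIP archives (7-Zip is one possibility, named here as an assumption). Python's own `zipfile` module does not handle multi-disk ZIP files, so the extraction itself is not scripted here.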

nmfisher commented 4 years ago

Is the hparams.py in the archive correct? Running synthesize.py fails with this error:

Traceback (most recent call last):
  File "synthesize.py", line 100, in <module>
    main()
  File "synthesize.py", line 92, in main
    wavenet_synthesize(args, hparams, wave_checkpoint)
  File "/mnt/e/projects/Tacotron-2-Chinese/wavenet_vocoder/synthesize.py", line 78, in wavenet_synthesize
    run_synthesis(args, checkpoint_path, output_dir, hparams)
  File "/mnt/e/projects/Tacotron-2-Chinese/wavenet_vocoder/synthesize.py", line 19, in run_synthesis
    synth.load(checkpoint_path, hparams)
  File "/mnt/e/projects/Tacotron-2-Chinese/wavenet_vocoder/synthesizer.py", line 28, in load
    self.model = create_model(model_name, hparams)
  File "/mnt/e/projects/Tacotron-2-Chinese/wavenet_vocoder/models/__init__.py", line 12, in create_model
    return WaveNet(hparams, init)
  File "/mnt/e/projects/Tacotron-2-Chinese/wavenet_vocoder/models/wavenet.py", line 192, in __init__
    up_layers=len(hparams.upsample_scales), name='SubPixelConvolutionlayer{}'.format(i))
  File "/mnt/e/projects/Tacotron-2-Chinese/wavenet_vocoder/models/modules.py", line 553, in __init__
    init_kernel = tf.constant_initializer(self._init_kernel(kernel_size, strides, conv_filters), dtype=tf.float32) if NN_init else None
  File "/mnt/e/projects/Tacotron-2-Chinese/wavenet_vocoder/models/modules.py", line 653, in _init_kernel
    init_kernel = np.tile(np.expand_dims(init_kernel, 3), [1, 1, 1, filters])
  File "<__array_function__ internals>", line 6, in expand_dims
  File "/home/hydroxide/.local/lib/python3.6/site-packages/numpy/lib/shape_base.py", line 597, in expand_dims
    axis = normalize_axis_tuple(axis, out_ndim)
  File "/home/hydroxide/.local/lib/python3.6/site-packages/numpy/core/numeric.py", line 1327, in normalize_axis_tuple
    axis = tuple([normalize_axis_index(ax, ndim, argname) for ax in axis])
  File "/home/hydroxide/.local/lib/python3.6/site-packages/numpy/core/numeric.py", line 1327, in <listcomp>
    axis = tuple([normalize_axis_index(ax, ndim, argname) for ax in axis])
numpy.AxisError: axis 3 is out of bounds for array of dimension 3

JasonWei512 commented 4 years ago

@nmfisher I took a look, and the parameters should be correct.

ly1984 commented 4 years ago

Generating a wav with Tacotron is instant, but with WaveNet it takes half an hour. Why is that? My GPU is a 1080 Ti.

JasonWei512 commented 4 years ago

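Generating a wav with Tacotron is instant, but with WaveNet it takes half an hour. Why is that? My GPU is a 1080 Ti.

The original autoregressive WaveNet really is this slow. Each predicted sample in the generated waveform depends on the 505 samples before it, so one second of speech takes 36000 predictions that must run in order and cannot be parallelized.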
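The two numbers in this answer can be reproduced with a little arithmetic. The sketch below assumes the usual WaveNet layout, where the dilation doubles at every layer and resets at the start of each stack. The values layers=24, stacks=4 and kernel_size=3 are borrowed from the configuration posted further down in this thread and are not confirmed as the settings of the model discussed here, although they do yield 505. Both function names are illustrative.

```python
def receptive_field(layers: int, stacks: int, kernel_size: int) -> int:
    """Receptive field, in samples, of stacked dilated causal convolutions.

    Assumes dilations 1, 2, 4, ... that restart at the beginning of
    each stack (an assumption about the architecture, see lead-in).
    """
    layers_per_stack = layers // stacks
    dilations = [2 ** (i % layers_per_stack) for i in range(layers)]
    return (kernel_size - 1) * sum(dilations) + 1


def synthesis_seconds(audio_seconds: float, sample_rate: int,
                      samples_per_second: float) -> float:
    """Wall-clock time when every output sample is predicted sequentially."""
    return audio_seconds * sample_rate / samples_per_second


# Assumed hyperparameters (taken from the config posted later in the thread).
print(receptive_field(layers=24, stacks=4, kernel_size=3))  # 505

# 36000 predictions per second of audio, at about 36 it/s (thread: 30~40 it/s).
print(synthesis_seconds(1.0, sample_rate=36000, samples_per_second=36))  # 1000.0
```

At roughly 36 iterations per second, one second of audio takes about 1000 seconds, which agrees with the "one-thousandth of real time" figure reported earlier in the thread.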

JasonWei512 commented 4 years ago

I implemented mixture of logistic distributions loss as well as exponential model averaging in #5. According to the Parallel WaveNet paper, exponential model averaging is important for quality.

One difference would be training time. I fine-tuned the model many times, i.e., train 200k steps -> (change some hyperparameters and see how it works) -> train 200k steps (lr starts from its initial value) -> ... repeated. This might lead to faster convergence.

If I remember correctly, I trained the model for over 1000k steps in total.

https://github.com/r9y9/wavenet_vocoder/issues/1#issuecomment-361130247
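For readers unfamiliar with exponential model averaging, here is a minimal sketch of the idea. It is not code from r9y9's repository: parameters are plain floats in a dict so the example stays self-contained, whereas a real implementation would apply the same update to every weight tensor. The class name is hypothetical.

```python
from typing import Dict


class ExponentialMovingAverage:
    """Keeps a smoothed "shadow" copy of the model parameters."""

    def __init__(self, decay: float) -> None:
        if not 0.0 <= decay < 1.0:
            raise ValueError("decay must be in [0, 1)")
        self.decay = decay
        self.shadow: Dict[str, float] = {}

    def update(self, params: Dict[str, float]) -> None:
        """Call once after each optimizer step with the current parameters."""
        for name, value in params.items():
            if name not in self.shadow:
                # First sighting: start the average at the current value.
                self.shadow[name] = value
            else:
                self.shadow[name] = (
                    self.decay * self.shadow[name] + (1.0 - self.decay) * value
                )


ema = ExponentialMovingAverage(decay=0.5)  # small decay just for the demo
ema.update({"w": 1.0})
ema.update({"w": 3.0})
print(ema.shadow["w"])  # 2.0
```

At evaluation time the shadow values are used in place of the raw weights. The configuration posted later in this thread sets "ema_decay": 0.9999, which averages over a much longer window than the 0.5 used in this demo.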

Hunkshang commented 4 years ago

@nmfisher Did you manage to solve this? I am running into the same problem.

WYan123 commented 4 years ago

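Hello, author. Could you share your trained WaveNet model? Time is tight, training takes long, and my graduation is coming up. Haha, I am not going to modify the WaveNet model anyway.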

JasonWei512 commented 4 years ago

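Hello, author. Could you share your trained WaveNet model? Time is tight, training takes long, and my graduation is coming up. Haha, I am not going to modify the WaveNet model anyway.

Extract the five wave_pretrained archives from the first post.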

WYan123 commented 4 years ago

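No, author. What I mean is: do you have the trained WaveNet model itself, rather than the generated audio samples? I have trained Tacotron 2, but I have not trained WaveNet yet.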

JasonWei512 commented 4 years ago

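No, author. What I mean is: do you have the trained WaveNet model itself, rather than the generated audio samples? I have trained Tacotron 2, but I have not trained WaveNet yet.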

image

CarolinGao commented 4 years ago

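No, author. What I mean is: do you have the trained WaveNet model itself, rather than the generated audio samples? I have trained Tacotron 2, but I have not trained WaveNet yet.

Did you try it? How was the quality?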

gdineshk6174 commented 4 years ago

@nmfisher @Hunkshang Hello, did you find a solution to the "numpy.AxisError: axis 3 is out of bounds for array of dimension 3" problem? If so, would you kindly share it? Thank you.

gaoyu1983 commented 4 years ago

@nmfisher @Hunkshang Hello, did you find a solution to the "numpy.AxisError: axis 3 is out of bounds for array of dimension 3" problem? If so, would you kindly share it? Thank you.

I got the same error message when training WaveNet. Have you solved it yet?

gaoyu1983 commented 4 years ago

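Mr. Wei, quite a few people in this thread are hitting the same error, "numpy.AxisError: axis 3 is out of bounds for array of dimension 3". I suspect it may be caused by the version of some dependency. Could you share the output of conda list so we can compare package versions? Thank you.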

gdineshk6174 commented 4 years ago

@gaoyu1983 Yes, it is a NumPy version problem. I updated NumPy from the recommended version to the next one, i.e., from numpy==1.14 to numpy==1.15.
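A quick way to check whether an environment matches the version reported to work here is to compare the major and minor numbers of the installed NumPy. This is only a sketch: the helper names are made up, and the default (1, 15) simply encodes the version mentioned in this comment rather than a tested compatibility range.

```python
import re
from typing import Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Parse '1.15.4' -> (1, 15); anything after the minor number is ignored."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def matches_reported_fix(version: str,
                         reported: Tuple[int, int] = (1, 15)) -> bool:
    """True if `version` has the major.minor reported to work in this thread."""
    return major_minor(version) == reported


print(matches_reported_fix("1.15.4"))  # True
print(matches_reported_fix("1.14.5"))  # False
```

In a real environment the string to pass in would be `numpy.__version__`.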

gaoyu1983 commented 4 years ago

Thank you, it really works.

@gaoyu1983 Yes, it is a NumPy version problem. I updated NumPy from the recommended version to the next one, i.e., from numpy==1.14 to numpy==1.15.

xiaoyangnihao commented 3 years ago

TTS discussion group on WeChat (ID: WorldSeal). Feel free to join and discuss related questions.

ben-8878 commented 3 years ago

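I trained WaveNet for 1000k steps and got no usable result at all. Could anyone give me some pointers? I did not use a pretrained model, and the training data is about 80 hours. Configuration: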

{
  "name": "wavenet_vocoder",
  "input_type": "raw",
  "quantize_channels": 65536,
  "preprocess": "preemphasis",
  "postprocess": "inv_preemphasis",
  "global_gain_scale": 0.55,
  "sample_rate": 22050,
  "silence_threshold": 2,
  "num_mels": 80,
  "fmin": 125,
  "fmax": 7600,
  "fft_size": 1024,
  "hop_size": 256,
  "frame_shift_ms": null,
  "win_length": 1024,
  "win_length_ms": -1.0,
  "window": "hann",
  "highpass_cutoff": 70.0,
  "output_distribution": "Normal",
  "log_scale_min": -16.0,
  "out_channels": 2,
  "layers": 24,
  "stacks": 4,
  "residual_channels": 128,
  "gate_channels": 256,
  "skip_out_channels": 128,
  "dropout": 0.0,
  "kernel_size": 3,
  "cin_channels": 80,
  "cin_pad": 2,
  "upsample_conditional_features": true,
  "upsample_net": "ConvInUpsampleNetwork",
  "upsample_params": {
    "upsample_scales": [
      4,
      4,
      4,
      4
    ]
  },
  "gin_channels": -1,
  "n_speakers": 7,
  "pin_memory": true,
  "num_workers": 2,
  "batch_size": 8,
  "optimizer": "Adam",
  "optimizer_params": {
    "lr": 0.001,
    "eps": 1e-08,
    "weight_decay": 0.0
  },
  "lr_schedule": "step_learning_rate_decay",
  "lr_schedule_kwargs": {
    "anneal_rate": 0.5,
    "anneal_interval": 200000
  },
  "max_train_steps": 1000000,
  "nepochs": 2000,
  "clip_thresh": -1,
  "max_time_sec": null,
  "max_time_steps": 10240,
  "exponential_moving_average": true,
  "ema_decay": 0.9999,
  "checkpoint_interval": 100000,
  "train_eval_interval": 100000,
  "test_eval_epoch_interval": 50,
  "save_optimizer_state": true
}
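Two properties of the configuration above can be checked mechanically. The sketch below makes two assumptions that the thread does not confirm: that the conditioning features must be upsampled by exactly hop_size (one mel frame per hop_size audio samples), and that "step_learning_rate_decay" means lr * anneal_rate ** (step // anneal_interval). The function names are illustrative, and neither check is claimed to explain the failed training run.

```python
# Excerpt of the configuration posted above; only the keys used here.
CONFIG = {
    "hop_size": 256,
    "upsample_params": {"upsample_scales": [4, 4, 4, 4]},
    "optimizer_params": {"lr": 0.001},
    "lr_schedule_kwargs": {"anneal_rate": 0.5, "anneal_interval": 200000},
}


def upsampling_matches_hop(config: dict) -> bool:
    """True if the product of upsample_scales equals hop_size."""
    total = 1
    for scale in config["upsample_params"]["upsample_scales"]:
        total *= scale
    return total == config["hop_size"]


def step_decayed_lr(config: dict, step: int) -> float:
    """Learning rate at `step` under the assumed step-decay formula."""
    lr = config["optimizer_params"]["lr"]
    kwargs = config["lr_schedule_kwargs"]
    return lr * kwargs["anneal_rate"] ** (step // kwargs["anneal_interval"])


print(upsampling_matches_hop(CONFIG))  # True, since 4 * 4 * 4 * 4 == 256
for step in (0, 200000, 1000000):
    print(step, step_decayed_lr(CONFIG, step))
```

Under these assumptions the upsampling factors are consistent with hop_size, and the learning rate has been halved five times by step 1000000.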