MorganCZY opened this issue 4 years ago
Please specify the reference audio's path in 'tacotron_style_reference_audio' in hparams.py, then synthesize. Feel free to raise more questions.
Yes, I have specified the reference audio path in hparams.py.
In hparams.py:
tacotron_style_alignment=None,
you can manually specify style token alignment weights instead of getting them from reference audio.
Do you mean this?
Here are my hparams settings. I specify a reference audio path, which is fed to the GST module (namely the reference encoder). For a trained model, the weights of the encoder, decoder, attention, and GST are all fixed. So, basically, I can't understand why I get different wavs with the same text and the same reference audio as input, given that there seem to be no random operations in the code.
@MorganCZY In the original Tacotron-2, dropout is kept on during inference, and this repo does the same. So every time you generate a wav, the audio will be different.
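To illustrate the point (a minimal NumPy sketch, not code from this repo): with dropout left enabled at inference, two forward passes over the same input generally produce different activations, which is exactly where the run-to-run variation comes from.

```python
import numpy as np

def dropout(x, rate=0.5, training=True, rng=None):
    """Inverted dropout, the same scheme tf.layers.dropout uses."""
    if not training:
        return x  # identity at inference when dropout is disabled
    rng = rng if rng is not None else np.random.default_rng()
    mask = rng.random(x.shape) >= rate   # randomly keep ~(1 - rate) of units
    return x * mask / (1.0 - rate)       # rescale so the expectation is unchanged

x = np.ones((1, 8))
# training=True: two passes over the same input generally differ
a = dropout(x, training=True)
b = dropout(x, training=True)
# training=False: the layer is a no-op, so outputs are always identical
assert np.array_equal(dropout(x, training=False), dropout(x, training=False))
```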
I'd like to ask about this too. If I want the synthesized audio to come out identical every time for the same tacotron_style_reference_audio, what should I do?
@CathyW77 Disabling the dropout in the prenet at generation time should be enough. In the Prenet class in tacotron/models/modules.py there is:
x = tf.layers.dropout(dense, rate=self.drop_rate, training=True, name='dropout_{}'.format(i + 1) + self.scope)
Set the training argument of tf.layers.dropout() to False at generation time.
It is indeed the only random operation in the synthesis path I could find after searching the whole repo. But with either "training=False" or "training=self.is_training" in the prenet, the model no longer generates correct wavs.
@MorganCZY What does "correct wav" mean? Can it not generate audio at all?
samples.zip: true.wav -> "training=True"; self_is_training.wav -> "training=self.is_training"; false.wav -> "training=False"
@MorganCZY This completely failed. Can you show the sample of your training corpus and the alignment during training?
I trained this model on THCHS-30. alignment.zip Here are the latest three alignment graphs, corresponding to 60k, 65k, and 70k steps.
@cnlinxi After I set it to False, all the generated audio is broken, not a single word comes out; setting it back to True generates normally again.
@CathyW77 Huh, that's odd, I'm not sure why. Admittedly I have never tried turning this dropout off myself.
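If turning the dropout off degrades the audio (plausible, since the decoder was trained to expect that noise), a workaround is to keep dropout on but make its randomness repeatable by re-seeding the random generator before every synthesis run. Below is only a NumPy sketch of the idea, not code from this repo; in the actual TF1 graph the equivalent would be fixing the graph-level and op-level random seeds.

```python
import numpy as np

def prenet_pass(x, rng, rate=0.5):
    # Stand-in for one prenet forward pass with dropout still enabled.
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

def synthesize(x, seed=1234):
    # Re-seeding before each run makes every dropout mask, and hence
    # the synthesized output, identical across runs.
    rng = np.random.default_rng(seed)
    return prenet_pass(x, rng)

x = np.ones((1, 8))
assert np.array_equal(synthesize(x), synthesize(x))  # deterministic across runs
```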
@MorganCZY
This is a bit strange. I'm sorry, I don't know what happened. The alignment looks good, so you may want to check your synthesis setup.
When doing tests, I found that each time I ran synthesize.py (with the same text and reference audio), I got different results (namely, different synthesized wavs). After looking through the code, I didn't find any random operations at synthesis time. Could you give me some explanation?