Clouxie opened this issue 5 years ago
Generally, just stopping at some point, swapping out the data, and continuing training works quite well in my experience.
Yep, but as a result I get an error at the eval/save step (division-by-zero exception).
Interesting, I didn't try with the latest version (and also not for WaveNet), so perhaps something there.
Could you please tell me which version you used? Also, I've set the eval and checkpoint steps to 100, and in hparams I set Start_decay to the step count my model has already been trained for. Is that okay?
I'm not doing something special. It worked for me in this repo (https://github.com/m-toman/tacorn/tree/fatchord_model - please note the branch, I'm working on the master branch). So it's more or less the default hparams: https://github.com/m-toman/Tacotron-2/blob/master/hparams.py
But I only adapted the Tacotron part, not WaveNet (although I adapted a speaker using r9y9's WaveNet and that worked as well).
The division-by-zero bug comes from having 0 batches of eval data. I assume you have very few fine-tuning samples, so the 5% test split gets rounded down to 0 batches. Supposing you use batch_size=32, you have around 600 fine-tuning samples overall?
To overcome that, set "test_size" to None and, for example, "test_batches=10" (or however many batches you want to use for validation). That should do it.
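A minimal sketch of the arithmetic behind the error, assuming the 5% test split and batch_size=32 discussed above (variable names are illustrative, not the repo's):

```python
# Why a small fine-tuning set yields zero eval batches (illustrative numbers)
n_samples = 600      # total fine-tuning utterances
test_split = 0.05    # 5% held out for eval
batch_size = 32

test_samples = int(n_samples * test_split)  # 30 utterances
test_batches = test_samples // batch_size   # 30 // 32 == 0
print(test_batches)  # 0 -> any per-batch eval average then divides by zero
```

Pinning "test_batches" directly, as suggested above, sidesteps this rounding entirely.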
Let me know if the issue persists :)
On Thu, 29 Nov 2018, 09:36 Clouxie wrote:
I'm still getting the divide-by-zero exception. This is the link to my taco files and fine-tuning training corpus; it has some silence at the beginning and end. Could you please check whether you can train and eval it without any exception? https://www.dropbox.com/sh/rpazj5ll8ahasr7/AAD2M25jsPTsbeViZdF_UBaba?dl=0
I have around 3 hours of new data...
Okay, now it's saving and doing eval fine. We will see if the tuning goes well.
@m-toman @Rayhane-mamah Fine-tuning by swapping out the data seems to produce a voice that still differs somewhat from the fine-tuning data. How can I solve this problem if I don't have enough data?
In my opinion you need at least 15-20 minutes of data; more isn't even needed. I'm doing some experiments on my setup and I'll let you know what works best. I think too much data or too long a fine-tuning run can destroy your language information, so I'm training for at most 1-2k steps.
From my experiments, too many steps will result in overfitting (knowledge loss), but too few steps will not get a similar sound. Hope you can find the best way of fine-tuning. BTW, maybe https://google.github.io/tacotron/publications/speaker_adaptation/ is a better solution.
Hey @Rayhane-mamah and @begeekmyfriend! Have you tried to fine-tune a pretrained model on a different voice? How much data did you use for it? For how many steps did you train the pretrained model? And what learning rate decay did you use for it?
I pretrained Tacotron on 25 hours of data for 120k steps and then tried to fine-tune on 2.5 hours with a constant lr = 1e-5. After ~30k steps my model starts overfitting and I stop training. But the quality of the new voice is not good enough: some words and endings are skipped, and there are large pauses between words, though there are no problems with alignment.
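For reference, I believe the repo's default Tacotron schedule holds the initial learning rate until a "start decay" step and then decays exponentially down to a floor; here is a plain-Python sketch of that shape (the constants are illustrative assumptions, not the repo's exact values):

```python
def learning_rate(step, init_lr=1e-3, final_lr=1e-5,
                  start_decay=50_000, decay_steps=50_000, decay_rate=0.5):
    """Exponential decay with an initial hold and a floor (illustrative constants)."""
    if step < start_decay:
        return init_lr                       # hold phase before decay begins
    lr = init_lr * decay_rate ** ((step - start_decay) / decay_steps)
    return max(lr, final_lr)                 # clamp at the final learning rate
```

Fine-tuning with a constant 1e-5, as described above, is effectively starting this schedule at its floor.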
You may add guided attention loss into this model without any change to your attention model.
```python
# In initialize():
    # Grab alignments from the final decoder state and reorder to
    # [batch, encoder_steps, decoder_steps]
    self.alignments = tf.transpose(
        final_decoder_state.alignment_history.stack(), [1, 2, 0])
    ...

# In add_loss():
    N = 400   # max encoder (input) length
    T = 1000  # max decoder (output) length
    # Pad with -1 so padded cells can be masked out, then crop to [N, T]
    A = tf.pad(self.alignments, [(0, 0), (0, N), (0, T)],
               mode="CONSTANT", constant_values=-1.)[:, :N, :T]
    attention_masks = tf.to_float(tf.not_equal(A, -1))
    gts = tf.convert_to_tensor(guided_attention(N, T))
    attention_loss = tf.reduce_sum(tf.abs(A * gts) * attention_masks)
    mask_sum = tf.reduce_sum(attention_masks)
    attention_loss /= mask_sum
```
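The `guided_attention(N, T)` helper isn't shown in the snippet; a common definition follows the guided-attention weight matrix from Tachibana et al.'s DC-TTS work. This exact implementation is my assumption, not necessarily what was used here:

```python
import numpy as np

def guided_attention(N, T, g=0.2):
    """Penalty matrix W[n, t] that is ~0 near the diagonal (n/N == t/T)
    and approaches 1 far from it, encouraging monotonic alignments."""
    W = np.zeros((N, T), dtype=np.float32)
    for n in range(N):
        for t in range(T):
            W[n, t] = 1.0 - np.exp(
                -((t / float(T) - n / float(N)) ** 2) / (2.0 * g * g))
    return W
```

Multiplying the alignment matrix by these weights penalizes attention mass far from the diagonal, which is what pulls the model toward monotonic alignment early in training.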
I will try. Thank you!
Hi, did you freeze the encoder when you fine-tuned on the new voice dataset? And do you have good quality on the new voice now?
Hi there, I'm looking for some answers about how to do some kind of fine-tuning like this: https://github.com/Kyubyong/speaker_adapted_tts in Rayhane's Tacotron solution. Does anyone know the answer? Is it possible?