Kyubyong / tacotron

A TensorFlow Implementation of Tacotron: A Fully End-to-End Text-To-Speech Synthesis Model
Apache License 2.0

Have you tried training with the full dataset? #63

Open kyoguan opened 7 years ago

kyoguan commented 7 years ago
[image: loss curve, 1 GPU]

This result is from training on a single GPU.

[image: loss curve, 8 GPUs]

This result is from training on 8 GPUs. (BTW: you need to set BN = None, or you will get a strange result, because of the batch normalization problem on multiple GPUs.)

It seems to be an Adam optimizer problem.
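On the BN point, here is a minimal sketch of the kind of hyperparameter change being suggested. The flag name `norm_type` is an assumption about this repo's hyperparams.py; check the actual file before copying.

```python
# hyperparams.py (sketch; the actual flag name and values may differ)
# Disabling batch norm avoids each GPU tower keeping its own, inconsistent
# batch statistics, which is what produces the strange multi-GPU results
# mentioned above.
norm_type = None
```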

AzamRabiee commented 7 years ago

[image: mean_loss curve]

AzamRabiee commented 7 years ago

My result on the full dataset. At epoch 40 (~40k global steps) I get a better wav signal as well.

candlewill commented 7 years ago

@AzamRabiee What do your synthesized waves sound like at epoch 40? Could you share some of them?

AzamRabiee commented 7 years ago

model_epoch_40_gs_40000_1.wav.zip The text is "abraham said i will swear"; this is the best synthesized wav.

kyoguan commented 7 years ago

I think you are still using sanity_check = True, which uses only a very small dataset; I got the same result. Can you try sanity_check = False?

zuoxiang95 commented 7 years ago

In the code, when sanity_check = True, the data used is only a single mini-batch, and that batch is repeated 1000 times: `texts, sound_files = texts[:hp.batch_size] * 1000, sound_files[:hp.batch_size] * 1000`
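For context, a rough sketch of that sanity-check shortcut (paraphrased; `hp` is the hyperparams module, and the surrounding function may look different in the actual repo):

```python
def load_data(texts, sound_files, hp):
    """Sketch of the data-loading shortcut described above."""
    if hp.sanity_check:
        # Keep a single mini-batch and tile it 1000 times: the model only
        # ever sees these few examples, so it overfits them quickly.
        # Good for checking the pipeline, not for judging real training.
        texts = texts[:hp.batch_size] * 1000
        sound_files = sound_files[:hp.batch_size] * 1000
    return texts, sound_files
```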

sniperwrb commented 7 years ago

> I think you are still using sanity_check = True, which uses only a very small dataset; I got the same result. Can you try sanity_check = False?

I don't think it works. I tried sanity_check=False and used 40 epochs as well, because running 10000 would cost me the whole summer :( The result is a total mess; it just repeats certain phonemes that I cannot understand.

tmulc18 commented 7 years ago

I did a non-sanity-check training run (asynchronously on 384 cores) with lr=.00015, norm=ins, loss=l1, min_len=10, max_len=100, and r=5. Additionally, I used Luong attention, gradient clipping by a norm of 5, and binning. Results are bad, but I ran out of cloud credits before I thought training was done.
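In case it helps anyone reproduce the clipping part, here is a minimal TF1-style sketch of gradient clipping by global norm with Adam, using the learning rate and clip value quoted above (the toy loss is just a stand-in, not the repo's actual loss):

```python
import tensorflow as tf

# Stand-in parameter and loss; in the real model this is the mel/linear L1 loss.
w = tf.Variable([1.0, 2.0, 3.0])
loss = tf.reduce_sum(tf.abs(w))

optimizer = tf.train.AdamOptimizer(learning_rate=0.00015)
grads_and_vars = optimizer.compute_gradients(loss)
grads, variables = zip(*grads_and_vars)
# Rescale the whole gradient vector if its global norm exceeds 5.
clipped_grads, _ = tf.clip_by_global_norm(grads, clip_norm=5.0)
train_op = optimizer.apply_gradients(list(zip(clipped_grads, variables)))
```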


After reading some of the comments here, I regret using any sort of normalization. I will see if I can try a similar experiment again on a single GPU later this week, but it will surely take more time.

samples/model.ckpt-231691_25.wav.zip

sniperwrb commented 7 years ago

@tmulc18 I think normalization is somewhat important. When I run eval.py with norm=None, the result is a mess (even for the pretrained model from the Tacotron author). When I run eval.py with some normalization but the training did not use any, it raises an error...
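That error is expected: the normalization layers create their own variables, so an eval graph built with a different norm setting than the training run asks the checkpoint for variables it never saved. A small TF1-style demonstration of the failure mode (paths and layer names are made up, not taken from this repo):

```python
import tensorflow as tf

# 1) Save a graph built WITHOUT normalization: no gamma/beta/moving_* variables.
g1 = tf.Graph()
with g1.as_default():
    x = tf.placeholder(tf.float32, [None, 8])
    y = tf.layers.dense(x, 8, name="prenet")
    saver = tf.train.Saver()
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        saver.save(sess, "/tmp/model_no_norm.ckpt")

# 2) Build an eval graph WITH batch norm and restore the same checkpoint.
g2 = tf.Graph()
with g2.as_default():
    x = tf.placeholder(tf.float32, [None, 8])
    h = tf.layers.dense(x, 8, name="prenet")
    h = tf.layers.batch_normalization(h, training=False, name="prenet_bn")
    saver = tf.train.Saver()
    with tf.Session() as sess:
        # Raises NotFoundError: the checkpoint has no prenet_bn/gamma, beta,
        # moving_mean or moving_variance to restore.
        saver.restore(sess, "/tmp/model_no_norm.ckpt")
```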

jpdz commented 7 years ago

Have you gotten good results? I trained the model on the whole dataset, but the loss is still above 1 and the results are not so good. Thanks a lot.