Kyubyong / expressive_tacotron

Tensorflow Implementation of Expressive Tacotron

How to find out when training went wrong? #6


marymirzaei commented 6 years ago

Thank you very much for your contribution. I have trained the model on LJ Speech for 835k. However, the results are not as good as the samples you provided for 420k. Maybe some problem with my training? Below you can find the attention plot and the sample audio at 835k. What kind of attention plot signals a good checkpoint for the synthesizer? alignment_835k And the progress was like this: problem

The samples synthesized from this checkpoint can be found here: https://www.dropbox.com/sh/n5ld72rn9otxl7a/AAACyplZMtxiYtuUgvWN8OGaa?dl=0

Also, the trained model (checkpoint) is uploaded here: https://www.dropbox.com/sh/ks91bdputl5ujo7/AABRIqpviRDBgWuFIJn1yuhba?dl=0

Also, I was wondering if you have any plans to release your trained model.

Another thing: tf.train.Saver keeps only the last 5 checkpoints by default, and the wrapper used here (i.e. tf.train.Supervisor) does not easily allow changing the max_to_keep property of the saver.
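A possible workaround might be to pass an explicit Saver when constructing the Supervisor, roughly like this (an untested sketch; train.py in this repo may set up the Supervisor differently):

    import tensorflow as tf

    # ... build the graph as in train.py ...

    # Keep more than the default 5 checkpoints.
    saver = tf.train.Saver(max_to_keep=20)

    # tf.train.Supervisor accepts an explicit saver, so the custom max_to_keep takes effect.
    sup = tf.train.Supervisor(logdir="logdir", saver=saver)
    with sup.managed_session() as sess:
        ...  # training loop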

P.S. The hyperparameters are kept at their defaults:

    # signal processing
    sr = 22050 # Sample rate.
    n_fft = 2048 # fft points (samples)
    frame_shift = 0.0125 # seconds
    frame_length = 0.05 # seconds
    hop_length = int(sr*frame_shift) # samples.
    win_length = int(sr*frame_length) # samples.
    n_mels = 80 # Number of Mel banks to generate
    power = 1.2 # Exponent for amplifying the predicted magnitude
    n_iter = 50 # Number of inversion iterations
    preemphasis = .97 # or None
    max_db = 100 # Maximum decibel value used when normalizing spectrograms
    ref_db = 20 # Reference decibel level

    # model
    embed_size = 256 # alias = E
    encoder_num_banks = 16
    decoder_num_banks = 8
    num_highwaynet_blocks = 4
    r = 5 # Reduction factor.
    dropout_rate = .5

    # training scheme
    lr = 0.001 # Initial learning rate.
    logdir = "logdir"
    sampledir = 'samples'
    batch_size = 32
    num_iterations = 1000000