Kyubyong / dc_tts

A TensorFlow Implementation of DC-TTS: yet another text-to-speech model
Apache License 2.0

Keras rewrite: Attention loss goes up after 5 epochs? #67

Open dimasikson opened 4 years ago

dimasikson commented 4 years ago

Hi, I attempted to rewrite this repo in Keras to migrate it to TF 2.0. In short, I need some help with the training process. My attention loss goes up over time, which shows in the quality of the mel output: the attention line in the output is scattered.

Here is the repo: https://github.com/dimasikson/dc_tts_keras
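For anyone comparing: the attention loss in question is the guided attention loss from the DC-TTS paper, which penalizes attention mass far from the diagonal. A minimal TF 2.x sketch of how I understand it (tensor names and shapes are illustrative, `g=0.2` as in the paper):

```python
import tensorflow as tf

def guided_attention_loss(A, g=0.2):
    """Guided attention loss from the DC-TTS paper.

    A: attention matrix of shape (batch, N, T), where N is the number of
    text positions and T the number of mel timesteps. Penalizes attention
    mass far from the diagonal n/N == t/T.
    """
    N = tf.shape(A)[1]
    T = tf.shape(A)[2]
    n = tf.cast(tf.range(N), tf.float32) / tf.cast(N, tf.float32)  # (N,)
    t = tf.cast(tf.range(T), tf.float32) / tf.cast(T, tf.float32)  # (T,)
    # W[n, t] = 1 - exp(-(n/N - t/T)^2 / (2 g^2))
    W = 1.0 - tf.exp(-tf.square(n[:, None] - t[None, :]) / (2.0 * g * g))
    return tf.reduce_mean(A * W[None, :, :])
```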

In the Text2Mel model, my attention loss goes up after 4-7 epochs, depending on the hyperparams.

The batch size in my model is 8 because my GPU can't fit 32 in one go, but I did try the original model at B=4 and it was totally fine after 20 epochs, so I doubt this has to do with batch size.

Here is the attention loss (moving average) with 'vanilla' hyperparams, i.e. exactly as found in the original repo (except for the batch size, as mentioned above). 1638 steps per epoch, 15 epochs, charted with a 2500-step two-sided moving average.

[image: attention loss, 'vanilla' hyperparams]
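(For reproducibility of the charts: the smoothing is just a centered moving average over the logged per-step losses; a minimal numpy sketch, with the window assumed to be the 2500 steps mentioned above:)

```python
import numpy as np

def centered_moving_average(losses, window=2500):
    # Two-sided (centered) moving average over per-step loss values;
    # mode="same" keeps the smoothed curve aligned with the step axis.
    kernel = np.ones(window) / window
    return np.convolve(losses, kernel, mode="same")
```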

Here is the run after I randomized the batch order between epochs AND increased the LR decay in the 'utils' file. 8 epochs, same moving average.

[image: attention loss, shuffled batches + increased LR decay]

The increased decay makes the effect appear later in training, but the loss still goes up fairly deterministically.
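For reference, the decay in the original repo's `utils.py` is a Noam-style schedule (linear warmup, then roughly 1/sqrt(step) decay). A TF 2.x Keras equivalent might look like the sketch below; the defaults match the original repo, and "increasing the decay" in my experiment amounts to changing these constants (my paraphrase, not the repo's API):

```python
import tensorflow as tf

class NoamDecay(tf.keras.optimizers.schedules.LearningRateSchedule):
    """Noam scheme: linear warmup for warmup_steps, then ~1/sqrt(step) decay."""
    def __init__(self, init_lr=0.001, warmup_steps=4000.0):
        super().__init__()
        self.init_lr = init_lr
        self.warmup_steps = warmup_steps

    def __call__(self, step):
        step = tf.cast(step + 1, tf.float32)
        return self.init_lr * self.warmup_steps ** 0.5 * tf.minimum(
            step * self.warmup_steps ** -1.5, step ** -0.5)

optimizer = tf.keras.optimizers.Adam(NoamDecay())

# Per-epoch shuffling (what I mean by "randomized the batch order"), e.g.:
# dataset = dataset.shuffle(buffer_size, reshuffle_each_iteration=True).batch(8)
```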

In the grand scheme of things, the overall loss goes down just fine, but this attention loss kind of screws up the output. Here is the total loss after 8 epochs (2nd model):

[image: total loss, 2nd model, 8 epochs]
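For context, the total loss plotted above is the sum of the three Text2Mel terms from the paper: an L1 loss on the mels, a binary divergence on the mel logits, and the guided attention loss. Roughly (argument names are illustrative; `guided_attention_loss` as sketched earlier):

```python
import tensorflow as tf

def text2mel_loss(mel_true, mel_pred, mel_logits, A):
    # L1 reconstruction loss on the predicted mels
    loss_l1 = tf.reduce_mean(tf.abs(mel_true - mel_pred))
    # Binary divergence on the pre-sigmoid logits (mels are normalized to [0, 1])
    loss_bd = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(labels=mel_true, logits=mel_logits))
    # Guided attention loss, as in the earlier sketch
    loss_att = guided_attention_loss(A)
    return loss_l1 + loss_bd + loss_att
```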

What results is an output like this (2nd model, 8 epochs). Below is the attention plot from the synthesis stage; I purposely turned off mono attention for the sake of the visual.

[image: synthesis-stage attention plot, mono attention off]
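(For readers: "mono attention" refers to the forced monotonic attention trick the original repo applies at synthesis time, where the attended text position may only advance within a small window of the previous step's argmax. A rough single-timestep sketch of the idea, not the repo's exact code; the window size of 3 matches the original hyperparams:)

```python
import numpy as np

def force_monotonic(attn_logits, prev_max, win=3):
    """Constrain attention at synthesis: only text positions in
    [prev_max, prev_max + win) are allowed; the rest are masked out.

    attn_logits: (N,) attention scores over text positions for one timestep.
    prev_max: argmax of the previous timestep's attention.
    """
    masked = np.full_like(attn_logits, -np.inf)
    lo, hi = prev_max, min(prev_max + win, len(attn_logits))
    masked[lo:hi] = attn_logits[lo:hi]
    return masked  # softmax over this keeps the attention near the diagonal
```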

And below are epochs 3, 4, and 5 from the 1st model, which is roughly where it screws up.

Epoch 3: [image: attention plot]

Epoch 4: [image: attention plot]

Epoch 5: [image: attention plot]

What I would like to understand: why does the attention loss start increasing after a few epochs, and what in my Keras port could be causing it?

Thanks in advance!