NaN training loss for CRNN

ViSonic-NN / muscribe

MIT License

NaN training loss for CRNN #6

Closed cpyang123 closed 1 year ago

cpyang123 commented 1 year ago

During training of the CRNN for beat prediction, the training loss gradually decreased from a very large value (around 300) and then became NaN.

The output predictions are also dubious.

This might be due to a normalization problem or to something in the loss function. We'll need to experiment with both to find the cause.

cpyang123 commented 1 year ago

In light of this problem, we'll try some of the following options:

Gradient Clipping
Learning Rate Warm-up
Increasing Batch-Size
Investigating other adjustments to the loss functions
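As a rough illustration of the learning rate warm-up option listed above, here is a minimal, framework-agnostic sketch. The function name `warmup_lr` and the default values for `base_lr` and `warmup_steps` are hypothetical and are not taken from the muscribe code; the actual training loop may schedule the rate differently.

```python
def warmup_lr(step: int, base_lr: float = 1e-4, warmup_steps: int = 500) -> float:
    """Linearly ramp the learning rate up to base_lr, then hold it constant.

    All names and defaults here are illustrative assumptions.
    """
    if warmup_steps <= 0:
        # No warm-up requested: use the base rate from the first step.
        return base_lr
    # step is 0-indexed, so the first step already gets a small non-zero rate.
    scale = min(1.0, (step + 1) / warmup_steps)
    return base_lr * scale


if __name__ == "__main__":
    # Print the rate at a few steps to see the ramp.
    for s in (0, 99, 499, 1000):
        print(s, warmup_lr(s))
```

The idea is that early updates, when the loss is still very large, are taken with a small step size, which may reduce the chance of the weights jumping into a region where the loss overflows.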

cpyang123 commented 1 year ago

After a tedious investigation, we found that the NaNs were produced by zero weights and biases in the embedding layer. After adding gradient clipping and decreasing the optimizer's weight decay to 0.00001, the problem was mitigated, and the model was able to run for 100 epochs and converge.
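For readers unfamiliar with gradient clipping, the following is a minimal sketch of clipping by global norm on a flat list of gradient values. It is written in plain Python for clarity and is not the code used in this repository; the function name `clip_by_global_norm` and the `max_norm` and `eps` values are assumptions for illustration.

```python
import math


def clip_by_global_norm(grads, max_norm=1.0, eps=1e-6):
    """Rescale grads so their L2 norm does not exceed max_norm.

    Returns (clipped_grads, original_norm). Hypothetical helper for illustration.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    if not math.isfinite(total_norm):
        # Clipping cannot repair NaN/inf gradients; surface the problem instead.
        raise ValueError(f"non-finite gradient norm: {total_norm}")
    # scale is 1.0 when the norm is already within the limit.
    scale = min(1.0, max_norm / (total_norm + eps))
    return [g * scale for g in grads], total_norm


if __name__ == "__main__":
    clipped, norm = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
    print(norm, clipped)
```

If the project uses PyTorch (an assumption, since the thread does not say), the built-in `torch.nn.utils.clip_grad_norm_` performs the same kind of rescaling on a model's parameters. Note that clipping only bounds the size of finite gradients, so it mitigates blow-ups rather than removing their cause.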

cpyang123 commented 1 year ago

For the record, multiple other changes were made to the models:

increased the model's RNN layer count from 3 to 4
increased seq_len_sec to 150 and decreased lr to 1e-5
changed d_model and d_hid back to 256 and 128, respectively
added numerous assertions
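The assertions mentioned above are not shown in this thread. As a hedged sketch of the kind of check that helps locate where NaNs first appear, the hypothetical helper below (`assert_all_finite` is an illustrative name, not from the repository) fails as soon as a value is NaN or infinite:

```python
import math


def assert_all_finite(name, values):
    """Raise AssertionError naming the first non-finite entry in values."""
    for i, v in enumerate(values):
        assert math.isfinite(v), f"{name}[{i}] is not finite: {v!r}"


if __name__ == "__main__":
    assert_all_finite("loss", [300.0, 12.5, 0.8])  # passes silently
    try:
        assert_all_finite("loss", [300.0, float("nan")])
    except AssertionError as err:
        print(err)
```

Placing checks like this after the embedding layer, after the loss computation, and after the backward pass narrows down which stage introduces the first non-finite value.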