ViSonic-NN / muscribe

MIT License

Nan training loss for CRNN #6

Closed cpyang123 closed 1 year ago

cpyang123 commented 1 year ago

During training of the CRNN for beat prediction, the training loss gradually decreased from a very large number, such as 300, and then became NaN.

The output predictions are also dubious: image

This might be due to some normalization problem, or something in the loss function. We'll need to experiment with both to find what the issue might be.
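One way to narrow this down is to record the loss at every step and find the first step where it stops being finite, then inspect the inputs and activations at that step. The snippet below is only a minimal sketch: the helper name `first_non_finite` and the loss values are made up for illustration and are not taken from the muscribe code.

```python
import math


def first_non_finite(values):
    """Return the index of the first NaN/inf entry, or None if all are finite."""
    for i, v in enumerate(values):
        if not math.isfinite(v):
            return i
    return None


# Hypothetical per-step loss history: large at first, then NaN.
losses = [300.0, 120.5, 48.2, float("nan"), float("nan")]

bad_step = first_non_finite(losses)
if bad_step is not None:
    print(f"loss became non-finite at step {bad_step}")
```

The same check can be applied to a flattened list of weights or gradients to see whether the parameters or the loss go bad first.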

cpyang123 commented 1 year ago

In light of this problem, we'll try some of the following options:

cpyang123 commented 1 year ago

After a tedious investigation, the NaNs were traced to zero weights and biases in the embedding layer. After adding gradient clipping and decreasing the optimizer's weight decay to 0.00001, the problem was mitigated, and the model was able to run for 100 epochs and converge: image
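For reference, the two mitigations can be sketched in a framework-agnostic way. This is an illustration, not the actual training code: it assumes clipping by global L2 norm and a plain SGD update with L2 weight decay, and `max_norm=1.0`, `lr=0.1`, and the function names are hypothetical. Only the weight decay value 0.00001 comes from this thread.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Scale grads so their global L2 norm does not exceed max_norm.

    Returns the (possibly scaled) gradients and the norm before clipping.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads, total_norm


def sgd_step(params, grads, lr, weight_decay=1e-5, max_norm=1.0):
    """One SGD update with gradient clipping and L2 weight decay.

    max_norm and lr are assumed values for illustration only.
    """
    clipped, _ = clip_by_global_norm(grads, max_norm)
    return [p - lr * (g + weight_decay * p) for p, g in zip(params, clipped)]


# Toy example: a gradient with norm 5.0 is rescaled to norm 1.0 before the update.
params = [1.0, -2.0]
grads = [3.0, 4.0]
new_params = sgd_step(params, grads, lr=0.1)
print(new_params)
```

Clipping bounds the size of any single update, so one exploding gradient cannot push the weights to inf or NaN, while a small weight decay pulls the weights toward zero less aggressively.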

cpyang123 commented 1 year ago

For the record, multiple other changes were made to the models: