deepgram / kur

Descriptive Deep Learning

Weights for speech recognition are not restored when restarting training: loss climbs back to the first-epoch value (316) instead of resuming from the reduced loss #89

Open nilesh02 opened 6 years ago

nilesh02 commented 6 years ago

I am training a speech-recognition model (speech.yml). The training was interrupted for some reason, so I restarted it. Training continues from the next epoch, but the loss comes out the same as the first-epoch loss, i.e. 316, even though I had trained the model down to a loss of 37. Why is the loss 316 again instead of continuing from 37?

I have checked the weights folder, and it shows a file size of 0 KB for each file, but the size on disk is nearly 75 MB. (Screenshots 73 and 74 attached.)

Please suggest what I should do to resume training from the same loss, or how to restore the weight files.
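
For reference, one way to cross-check what Windows Explorer reports is to print the logical size of each weight file directly (a minimal sketch; the `weights` directory name is an assumption, substitute whatever path your kurfile points at):

```python
# Minimal sketch: print the logical size of every file in the kur
# weights directory, to cross-check what Explorer reports.
# 'weights' is an assumed path; use the directory from your kurfile.
import os

weights_dir = 'weights'
for name in sorted(os.listdir(weights_dir)):
    path = os.path.join(weights_dir, name)
    if os.path.isfile(path):
        print(f'{name}: {os.path.getsize(path)} bytes')
```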

scottstephenson commented 6 years ago

Can you upload your kurfile?

nilesh02 commented 6 years ago

Text form of the file speech.yml is attached as speech.txt. The code is the same as in the kur GitHub repository (https://github.com/deepgram/kur/blob/master/examples/speech.yml).

scottstephenson commented 6 years ago

Without seeing your loss plot it's hard to tell (you can generate one from your log directory; check the tutorial on kur.deepgram.com for that). I am betting you are running into confusion caused by sortagrad. Sortagrad is a curriculum-learning method, enabled in this kurfile, that starts training on the shortest audio files and ramps up over the epoch to the longest files at the end (sorted in order). Loss is a function of how many errors you make, and with longer audio files you tend to make more errors, so loss tends to go up with longer audio files.

This means your first epoch will start out with low loss and ramp up over time. It may continue increasing until the very end of your first epoch, or (if you have enough data) it might roll over and start declining before the epoch ends. Your second epoch will then train on randomly shuffled audio files (as is typical in normal training).
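
To make the ordering concrete, here is a minimal sketch of sortagrad-style scheduling. This illustrates the idea, not kur's actual implementation, and the `duration` field is an assumption:

```python
import random

def epoch_order(samples, epoch, sortagrad=True):
    """Return the order in which samples are visited during one epoch.

    Sortagrad-style curriculum: the first epoch goes from the shortest
    utterance to the longest; every later epoch is shuffled at random.
    """
    if sortagrad and epoch == 0:
        # Short, "easy" utterances first, so early-epoch loss looks low
        # and climbs as the utterances get longer.
        return sorted(samples, key=lambda s: s['duration'])
    order = list(samples)
    random.shuffle(order)
    return order
```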

But if you stop and restart, sortagrad will run for the first epoch after coming back up, no matter what, even if you already completed a full epoch (or more) beforehand. To stop sortagrad from kicking in again, just comment out the line in the kurfile with sortagrad in it.

I'm still not 100% sure that's where your problem lies but let me know if this helps (and even better, upload a loss plot!).
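
For the loss plot, something along these lines should work. This is a sketch based on the tutorial at kur.deepgram.com, assuming your kurfile logs to a directory named `log` and that the column name matches the tutorial's:

```python
# Sketch: plot training loss from a kur binary log directory.
# Assumes the kurfile's train section contains "log: log" and that
# the column name matches the tutorial at kur.deepgram.com.
import matplotlib.pyplot as plt
from kur.loggers import BinaryLogger

training_loss = BinaryLogger.load_column('log', 'training_loss_total')

plt.plot(training_loss)
plt.xlabel('batch')
plt.ylabel('training loss')
plt.savefig('loss.png')
```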

nilesh02 commented 6 years ago

(Attached: final_graph, the loss plot.)

Training loss reached 20, as you can see in the graph, but the weights are not being restored after restarting, since the loss value is 316 again.

The last two epochs did not have sortagrad. Thank you for replying.