I have two neural networks: one is in the tutorials directory, and the second one I made by removing some layers from the one in recipes/librispeech/configs/conv_glu, which looks like this:
V -1 1 NFEAT 0
WN 3 C NFEAT 400 13 1 170
GLU 2
DO 0.2
WN 3 C 200 440 14 1 0
GLU 2
DO 0.214
WN 3 C 220 484 15 1 0
GLU 2
DO 0.22898
WN 3 C 242 532 16 1 0
GLU 2
DO 0.2450086
WN 3 C 266 584 17 1 0
GLU 2
DO 0.262159202
WN 3 C 292 642 18 1 0
GLU 2
DO 0.28051034614
WN 3 C 321 706 19 1 0
GLU 2
DO 0.30014607037
WN 3 C 353 776 20 1 0
GLU 2
DO 0.321156295296
WN 3 C 388 852 21 1 0
GLU 2
DO 0.343637235966
RO 2 0 3 1
WN 0 L 426 852
GLU 0
DO 0.343637235966
WN 0 L 426 NLABEL
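For reference, here is roughly how I read the first conv block of that arch file, sketched with PyTorch layers instead of the actual flashlight/wav2letter modules. The NFEAT value and the column meanings (input channels, output channels, kernel width, stride, padding) are my own assumptions, so this is only an illustrative sketch:

```python
# Illustrative sketch only: my assumed reading of the first WN/C + GLU + DO block.
import torch
import torch.nn as nn

NFEAT = 40  # placeholder; the real value depends on the feature flags

first_block = nn.Sequential(
    # WN 3 C NFEAT 400 13 1 170 -> weight-normalized 1-D conv:
    # NFEAT -> 400 channels, kernel width 13, stride 1, padding 170
    nn.utils.weight_norm(nn.Conv1d(NFEAT, 400, kernel_size=13, stride=1, padding=170)),
    # GLU 2 -> gated linear unit over the feature axis, halving 400 -> 200 channels
    nn.GLU(dim=1),
    # DO 0.2 -> dropout with p = 0.2
    nn.Dropout(0.2),
)

x = torch.randn(4, NFEAT, 1000)  # (batch, features, time)
print(first_block(x).shape)      # -> (4, 200, padded time)
```

The later blocks repeat the same conv/GLU/dropout pattern with growing channel counts, and (as far as I understand) the RO / L lines at the end reorder the tensor and apply the final weight-normalized linear layers.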
I have set the same flagsfile for both of them, with the following config:
I had to comment out the momentum flag, since setting it to any non-zero value caused an out-of-memory error.
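For concreteness, the optimizer-related part of my flagsfile ended up looking roughly like this (the momentum value 0.8 is just a placeholder for what I had originally tried; lr and lrcrit are the values before the 10x reduction I describe below):

```
--lr=1.0
--lrcrit=0.01
# --momentum=0.8   (commented out: any non-zero value caused the OOM error)
```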
To compare these models, I started training on a single audio file.
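To be precise, by a single audio file I mean a train list with just one entry, roughly like this (the path, duration and transcription are made-up placeholders, and the column order of id / audio path / duration / transcription is just my understanding of the list format):

```
train_000 /path/to/sample.flac 15000 the quick brown fox jumps over the lazy dog
```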
For the neural network from recipes, I got the following output after epoch 1176:
Even in the subsequent epochs, the loss did not decrease any further. For the network from the tutorial, all the losses and error rates dropped to zero by the 730th epoch:
Clearly, the difference in throughput and error rates between the two is quite high.
Since I'm training on a single GPU, I later reduced my lr and lrcrit by a factor of 10.
With lr=0.1 and lrcrit=0.001, the bigger network reached a better loss (23) than the smaller one (31) after 500 epochs.
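In other words, the only change between this run and the earlier ones was these two lines in the flagsfile:

```
--lr=0.1
--lrcrit=0.001
```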
But aren't 500 epochs too many to converge on a single audio file?
Are there any other suggestions that might help the model converge to zero loss in less time?
I really want to use the bigger network.