NightFury13 closed this issue 7 years ago
Hey @NightFury13, thanks for this. That's definitely a mistake on my side; it should say hiddenDim! It just reshapes the input so that the activations from the two directions of the bi-directional RNN are summed rather than kept separate :)
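For future readers, here is a minimal sketch of that reshape-and-sum idea in plain Python (not the actual Lua from DeepSpeech.lua). It assumes the bi-directional layer emits one vector of length 2 * hidden_dim per time step, with the forward half first and the backward half second; that layout, and the names sum_directions and hidden_dim, are assumptions made for illustration only.

```python
def sum_directions(features, hidden_dim):
    """Fold a 2*hidden_dim vector into hidden_dim by adding its two halves."""
    if len(features) != 2 * hidden_dim:
        raise ValueError("expected 2 * hidden_dim features")
    # "Reshape" to (2, hidden_dim): row 0 = forward pass, row 1 = backward pass.
    # (Assumed layout: forward activations first, backward activations second.)
    forward = features[:hidden_dim]
    backward = features[hidden_dim:]
    # Sum over the direction axis so the next layer sees hidden_dim features.
    return [f + b for f, b in zip(forward, backward)]


# One time step of a bi-directional layer with hidden_dim = 3.
step = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]
print(sum_directions(step, 3))  # [11.0, 22.0, 33.0]
```

Summing instead of concatenating keeps the layer's output width at hidden_dim, so stacked layers all see the same feature size.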
Just opened a branch here. Using this branch:
th Train.lua -LSTM -hiddenSize 600  # Just be mindful of the number of parameters
@SeanNaren Thanks a lot for this! I am facing network issues on my end. Will update with my findings as soon as I am back online!
NOTE: This is a continuation thread for any future readers who stumble upon similar issues. Before you start off here, do give the conversation on this issue a read.
I am trying to use the deepspeech model to train for scene-text tasks on images. So far, I have been able to convert my data to the LMDB format expected by the code and run the training scripts, but the error behaves erratically and keeps jumping between inf, nan, positive and negative values. My first attempt was to limit the MaxNorm of the gradients to stop them from exploding, but that didn't help. The next attempt was to replace the original vanilla RNNs of DeepSpeech2 with LSTM layers, in the hope of limiting the gradient explosion. To do so, one needs to change the RNNModule class in DeepSpeech.lua, as pointed out by @SeanNaren below.
Change:
to something like:
@SeanNaren: can you help me understand what outputDim signifies in the changed code? Are the output dims different from the hidden dims?
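For anyone unfamiliar with the MaxNorm attempt mentioned above, here is a rough sketch of gradient-norm clipping in plain Python. It is not the training code from this repository; clip_by_norm and its arguments are hypothetical names. It also shows one possible reason clipping alone may not help: once a gradient already contains inf or nan, rescaling cannot make it finite again.

```python
import math


def clip_by_norm(grads, max_norm):
    """Rescale grads so their L2 norm is at most max_norm.

    Returns None when the norm is inf or nan, since scaling cannot repair
    such a gradient and the caller would have to skip that update.
    """
    norm = math.sqrt(sum(g * g for g in grads))
    if not math.isfinite(norm):
        return None
    if norm <= max_norm:
        return list(grads)  # already within the limit, leave unchanged
    scale = max_norm / norm
    return [g * scale for g in grads]


print(clip_by_norm([3.0, 4.0], 1.0))           # norm 5.0 is scaled down to 1.0
print(clip_by_norm([float("inf"), 1.0], 1.0))  # None
```

If the error is already inf or nan before the backward pass, the problem sits upstream of the gradients, so it may be worth checking the data and the loss inputs as well.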