SeanNaren / deepspeech.torch

Speech Recognition using DeepSpeech2 network and the CTC activation function.
MIT License

Using a custom dataset with deepspeech codes #58

Closed: NightFury13 closed this issue 7 years ago

NightFury13 commented 8 years ago

NOTE: This is a continuation thread for any future readers who stumble upon similar issues. Before you start here, do give the conversation in the original issue a read.

I am trying to use the DeepSpeech model for scene-text recognition on images. So far I have been able to convert my data to the LMDB format expected by the code and run the training scripts, but the loss is erratic and keeps jumping between inf, NaN, positive, and negative values. My first attempt was to cap the max norm of the gradients to stop them exploding (a rough sketch of this is below), but that didn't help. The next attempt was to replace the original vanilla RNNs of DeepSpeech2 with LSTM layers in the hope of taming the gradient explosion. To do so, one needs to change the RNNModule class in DeepSpeech.lua as pointed out by @SeanNaren below.
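
For reference, a global gradient-norm clip in Torch only takes a few lines. This is a minimal sketch, assuming the usual model:getParameters() idiom; model and the threshold value are illustrative, not taken from the actual training script:

local params, gradParams = model:getParameters()
local maxNorm = 400 -- illustrative threshold

-- call this after the backward pass, before the optimizer update
local function clipGradients()
    local norm = gradParams:norm()
    if norm > maxNorm then
        gradParams:mul(maxNorm / norm) -- rescale so the global norm equals maxNorm
    end
end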

Change:

local function RNNModule(inputDim, hiddenDim, opt)
    if opt.nGPU > 0 then
        require 'BatchBRNNReLU'
        return cudnn.BatchBRNNReLU(inputDim, hiddenDim) -- fused cuDNN bidirectional ReLU RNN
    else
        require 'rnn'
        return nn.SeqBRNN(inputDim, hiddenDim) -- CPU fallback from the rnn package
    end
end

to something like:

local function RNNModule(inputDim, hiddenDim, opt)
    require 'cudnn'
    local rnn = nn.Sequential()
    rnn:add(cudnn.BLSTM(inputDim, hiddenDim, 1)) -- single bidirectional LSTM layer
    rnn:add(nn.View(-1, 2, outputDim):setNumInputDims(2)) -- have to sum activations
    rnn:add(nn.Sum(3))
    return rnn
end

@SeanNaren : can you help me understand what outputDim signifies in the changed code? Why would the output dims be different from the hidden dims?

SeanNaren commented 8 years ago

Hey @NightFury13, thanks for this; that's definitely a mistake on my side, it should say hiddenDim! The View just reshapes the BLSTM output so that the activations of the two directions can be summed rather than kept separate from the bi-directional RNN :)
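
To make the shapes concrete, here is a quick sanity check of the reshape-and-sum (a sketch with illustrative dimensions): the bidirectional layer emits 2 * hiddenDim features per timestep, the View splits them into forward and backward halves, and the Sum merges them back down to hiddenDim.

require 'nn'

local hiddenDim = 4
local seqLen, batch = 5, 2
local x = torch.randn(seqLen, batch, 2 * hiddenDim) -- stand-in for BLSTM output

local merge = nn.Sequential()
merge:add(nn.View(-1, 2, hiddenDim):setNumInputDims(2)) -- seqLen x batch x 2 x hiddenDim
merge:add(nn.Sum(3))                                    -- sum the two directions

print(merge:forward(x):size()) -- 5 x 2 x 4, i.e. seqLen x batch x hiddenDim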

SeanNaren commented 8 years ago

Just opened a branch here. Using this branch, you can train with LSTM layers:

th Train.lua -LSTM -hiddenSize 600 #Just be mindful of the number of parameters
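
As a rule of thumb, an LSTM has four gate weight matrices where a vanilla RNN has one, so at the same hiddenSize the recurrent layers carry roughly four times the weights. One quick way to check the total before committing to a run (a sketch; model here stands for whatever network Train.lua builds):

local params = model:getParameters()
print(('Number of parameters: %d'):format(params:nElement()))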

NightFury13 commented 8 years ago

@SeanNaren Thanks a lot for this! I am facing network issues on my end. Will update with my findings as soon as I am back online!

SeanNaren commented 7 years ago

Going to close this since new information has been added about custom datasets here.