Element-Research / rnn

Recurrent Neural Network library for Torch7's nn
BSD 3-Clause "New" or "Revised" License

Building a 3D Recurrent Convolutional Network #299

Closed: prateepcs4 closed this issue 8 years ago

prateepcs4 commented 8 years ago

Hi @nicholas-leonard, I am trying to build a model for gesture recognition which has two parts: the first is a 3D ConvNet and the second is a recurrent layer of LSTM cells. The model I am using is as follows -

net = nn.Sequential()
net:add(nn.SplitTable(1))
net:add(nn.Sequencer(conv3d)) -- conv3d stands in for the 8-layer 3D CNN; its last layer outputs 512x1x4x5
lstm = nn.LSTM(512*1*4*5, 200)
net:add(nn.Sequencer(lstm))
net:add(nn.SelectTable(-1))
net:add(nn.Linear(200,50)):add(nn.LogSoftMax()) -- classification into any one of the 50 gesture classes

criterion = nn.ClassNLLCriterion()
seqCriterion = nn.SequencerCriterion(criterion)

I am giving the model input of size N x 1 x 3 x 16 x 128 x 171, where 'N' is the sequence length and 3 x 16 x 128 x 171 means 3 channels, 16 frames, a height of 128, and a width of 171 for a video clip.

Now when I give the target as labels (a Tensor of size N, since the sequence length is N), I get the error "expecting target Tensor since input is a Tensor". I am new to LSTMs in Torch. Can you specify how I should give the input and target in this particular case?

nicholas-leonard commented 8 years ago

So you are training a model to learn a sequence of videos? I am confused, because normally your video would itself be a sequence of frames (i.e. N = 16 frames) :)

If your input is a tensor, then so should your target. Is your target a table?

prateepcs4 commented 8 years ago

@nicholas-leonard Actually, I have a dataset of many videos, and each video consists of parts where different kinds of gestures are performed. I have split the videos into fixed-length segments of 16 frames and call them 'clips'. Now I want to train a model very similar to the following (attached screenshot: selection_001).

As the ConvNet here operates volumetrically, I pass a sequence of clips into the model, and the model is a "many-to-one" network. What I want from the model is to predict the performed gesture in each clip. As far as I understand the nn.SequencerCriterion() module, it applies the underlying nn.ClassNLLCriterion() to each corresponding 'clip' and its output (which in this case is a gesture label). So apparently my target should be a sequence of gesture labels. I'm not sure if I am formulating the input and target properly. Thanks in advance.

nicholas-leonard commented 8 years ago

Nice! Ok now I see what it is you want to do. I think your problem is that the criterion expects both the model output and target to be a table. The input is already a table, but I am guessing your target is still a tensor. You can split it into a table along the seqlen dimension like this:

local split = nn.SplitTable(1)
target = split:forward(target)

Also, is your input of fixed length? (if so, there is a possible optimization)

prateepcs4 commented 8 years ago

I already tried the step you suggested. I added two print statements inside nn.SequencerCriterion() to check what kind of input and target it was getting before the assertion check runs. The output was something like this -

Input --> 20 [torch.LongStorage of size 1]

Output -->
{
  1 : DoubleTensor - size: 1
  2 : DoubleTensor - size: 1
  3 : DoubleTensor - size: 1
  4 : DoubleTensor - size: 1
  5 : DoubleTensor - size: 1
  6 : DoubleTensor - size: 1
  7 : DoubleTensor - size: 1
}
Output Type --> table

Here the table of size 7 means I am passing a sequence of 7 'clips'. And then the same error, "expecting target Tensor since input is a Tensor", was shown.

Also, the number of 'clips' in a sequence will usually not be constant, but I can try some workarounds to make the sequence length constant.

nicholas-leonard commented 8 years ago

@prateepcs4 Oh, I just noticed that you are using SelectTable(-1) to get the last element of the sequence. Doing this is consistent with your comment that you are building a many-to-one model, i.e. many clips map to one target. But I don't think this is what you want, as your targets seem to be a sequence as well, right? In which case, replace the output layer with:

-- remove this : net:add(nn.SelectTable(-1))
-- decorate with a Sequencer so that Linear and LogSoftMax are applied at each time-step
net:add(nn.Sequencer(nn.Sequential():add(nn.Linear(200,50)):add(nn.LogSoftMax())))

The above is many-to-many. If, however, you really want many-to-one, then keep the output layer as is and instead replace the criterion with:

criterion = nn.ClassNLLCriterion()
-- no need for this as there is only one target per sequence: seqCriterion = nn.SequencerCriterion(criterion)

and don't split the target.
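
Put together, the many-to-many variant looks something like this minimal sketch (here conv3d is a hypothetical stand-in for the 8-layer 3D CNN described above, assumed to flatten each clip to a 512*1*4*5 feature vector; the 50 classes come from the original Linear(200,50)):

require 'rnn'

net = nn.Sequential()
net:add(nn.SplitTable(1))                       -- seqlen-first tensor -> table of clips
net:add(nn.Sequencer(conv3d))                   -- hypothetical 3D CNN applied to each clip
net:add(nn.Sequencer(nn.LSTM(512*1*4*5, 200)))  -- LSTM over the per-clip features
net:add(nn.Sequencer(                           -- classify every time-step
   nn.Sequential():add(nn.Linear(200, 50)):add(nn.LogSoftMax())))

criterion = nn.SequencerCriterion(nn.ClassNLLCriterion())
-- the target must then be a table of seqlen label tensors (split it with nn.SplitTable as above)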

prateepcs4 commented 8 years ago

@nicholas-leonard Thanks for the suggestion. Yeah, it was a mistake on my part to build a many-to-one model here. I corrected it and am now sending my output labels as a table of 7 tensors, each of size 1 (basically they hold the label for each clip in the sequence). But even then, when I do a forward pass through the criterion I get the following error. The debug prints show:

Output table before calling criterion:forward()
{
  1 : DoubleTensor - size: 1
  2 : DoubleTensor - size: 1
  3 : DoubleTensor - size: 1
  4 : DoubleTensor - size: 1
  5 : DoubleTensor - size: 1
  6 : DoubleTensor - size: 1
  7 : DoubleTensor - size: 1
}
Input -->
{
  1 : DoubleTensor - size: 20
  2 : DoubleTensor - size: 20
  3 : DoubleTensor - size: 20
  4 : DoubleTensor - size: 20
  5 : DoubleTensor - size: 20
  6 : DoubleTensor - size: 20
  7 : DoubleTensor - size: 20
}
Target inside SequencerCriterion() --> nil
Target Type --> table

And the error is as follows -

torch/install/share/lua/5.1/nn/Linear.lua:84: invalid arguments: DoubleTensor number number DoubleTensor table
expected arguments: DoubleTensor~1D [DoubleTensor~1D] [double] DoubleTensor~2D DoubleTensor~1D | DoubleTensor~1D double [DoubleTensor~1D] double DoubleTensor~2D DoubleTensor~1D

I think my input format is correct but I am doing something wrong in my output.

nicholas-leonard commented 8 years ago

Apparently the Linear gradOutput is a table: https://github.com/torch/nn/blob/master/Linear.lua#L84. Can you post your new model and the complete stack trace?

prateepcs4 commented 8 years ago

Please check the whole stack trace. I am passing my input as a 5D tensor of size (7 x 3 x 16 x 128 x 171).

stacktrace.txt

nicholas-leonard commented 8 years ago

Hmm. Difficult to identify the cause. The forward pass went through fine; the issue is with the backward. Somehow the gradInput = LSTM:backward() is generating a table. To confirm this, add nn.PrintSize() between your 3D conv and the LSTM and call forward. Sorry I can't be of much more help, but it is much easier to debug these things from the command line than from GitHub :)

This has nothing to do with your issue, but instead of using nn.Sequencer(nn.LSTM), you should use nn.SeqLSTM() as it is much faster.

Another thing that might cause issues is not using mini-batches. I think you are using online-mode (one sample at a time), whereas it is more common to combine multiple samples into a batch. I think your input should be something like (7 x batchsize x 3 x 16 x 128 x 171). For a single input, batchsize = 1.
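
As a rough illustration of those shapes (batchsize = 4 is just an arbitrary example value, not from the thread):

-- seqlen x batchsize x channels x frames x height x width
input = torch.Tensor(7, 4, 3, 16, 128, 171)
-- online mode is the special case batchsize = 1
singleInput = torch.Tensor(7, 1, 3, 16, 128, 171)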

prateepcs4 commented 8 years ago

@nicholas-leonard Thanks for the valuable input. I have successfully debugged the code; it seems there was a problem with my Torch nn rock, which got fixed by reinstalling it (weird).

Now, coming to nn.SeqLSTM(): I do not need to decorate it with a Sequencer module, am I right on this? And are all these modules compatible with CuDNN (R5)? It seems impossible to get a speed-up using cunn alone.

Previously, while training the 3D conv net alone, I was using DataParallelTable for splitting the workload on multiple GPUs. Is the same thing compatible with SeqLSTM? Thanks for all the help again. :+1:

And one more thing - if I use a batch size > 1, should I format the target as a table of seqlen entries, each a tensor of size batchsize? This is giving me a /lua/5.1/cudnn/Pointwise.lua:13: Non-contiguous inputs not supported yet error. I think I have to make the views contiguous somehow.

nicholas-leonard commented 8 years ago

Now coming to using nn.SeqLSTM(), I do not need to decorate it with a sequencer module. Am I right on this?

Yes

And are all these modules compatible with CuDNN (R5)?

Yes.

Previously while training the 3D conv net alone I was using DataParallelTable for splitting the workload on multiple GPUs. Is the same thing compatible with SeqLSTM?

I haven't tried it.

And one more thing - if I use a batch size > 1, should I format the target as a table of seqlen entries, each a tensor of size batchsize?

Yes, seqlen x batchsize
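
For example, a sketch with assumed values (seqlen = 7 clips, batchsize = 4, 50 classes as in the original model): build the batched target as a seqlen x batchsize tensor of class indices, then split it into the table that SequencerCriterion expects:

seqlen, batchsize, nClasses = 7, 4, 50
target = torch.LongTensor(seqlen, batchsize):random(1, nClasses)
-- split along the seqlen dimension: a table of 7 tensors, each of size batchsize
targetTable = nn.SplitTable(1):forward(target)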

This is giving me /lua/5.1/cudnn/Pointwise.lua:13: Non-contiguous inputs not supported yet error. I think I have to make the views contiguous somehow.

You can use nn.Copy(nil, nil, true) to make a tensor contiguous.

prateepcs4 commented 8 years ago

I tried the nn.Copy(nil, nil, true) trick and also tried replacing it with nn.Contiguous() modules, but neither seems to work in batch mode with nn.FastLSTM(). When I tried with cunn only it did work, although it took a long time and a lot of memory. Is the input format the same for nn.SeqLSTM()? I am getting a table of length 7, each entry a tensor of size 1000, from the 3D CNN part. Although this works fine with nn.FastLSTM(), it does not seem to work with nn.SeqLSTM(), because nn.SeqLSTM() apparently wants its input as a tensor, not a table of tensors. I am not wrapping the nn.SeqLSTM() module with a Sequencer.

UPDATE: Fixed it by omitting the nn.SplitTable(1) at the very beginning of the net. It now works with nn.SeqLSTM() and CuDNN in batch mode. Previously the 3D CNN part was producing its output as a table of tensors, which was causing the main problem. Omitting the split gives the output as a tensor, so the contiguous-memory error no longer appears.

nicholas-leonard commented 8 years ago

@prateepcs4 nn.Copy only works on tensors. So you might need to do something like nn.Sequencer(nn.Copy(...)) to make it work with tables.
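
For example, something like this minimal sketch (where it goes in the net depends on where the non-contiguous tensor shows up):

-- tensor case: force a copy so the next layer sees a contiguous tensor
net:add(nn.Copy(nil, nil, true))
-- table case: apply the same copy to every element of a table of tensors
net:add(nn.Sequencer(nn.Copy(nil, nil, true)))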

For SeqLSTM the input is not a table of tensors, but a tensor of size seqlen x batchsize x inputsize. So you will need to remove the nn.SplitTable before your 3D sequencer. Don't worry, Sequencer works with either tables or tensors, so it will still work.

Ah damn, just saw that you answered this yourself :)
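
For reference, a rough sketch of the arrangement the thread converges on, under the same assumptions as before (conv3d is the hypothetical 3D CNN, assumed to flatten each clip to a 512*1*4*5 feature vector; the input is a seqlen x batchsize x 3 x 16 x 128 x 171 tensor):

net = nn.Sequential()
net:add(nn.Sequencer(conv3d))        -- no SplitTable: Sequencer iterates over the first (seqlen) dimension
net:add(nn.SeqLSTM(512*1*4*5, 200))  -- expects a seqlen x batchsize x inputsize tensor
net:add(nn.Sequencer(
   nn.Sequential():add(nn.Linear(200, 50)):add(nn.LogSoftMax())))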