Element-Research / rnn

Recurrent Neural Network library for Torch7's nn
BSD 3-Clause "New" or "Revised" License

Multiple object recognition with visual attention #134

Open pefi9 opened 8 years ago

pefi9 commented 8 years ago

Hi,

I am trying to use the recurrent attention model for multiple object recognition ( http://arxiv.org/pdf/1412.7755v2.pdf ). Would you have any suggestions on how to do it?

nicholas-leonard commented 8 years ago

@pefi9 Use the recurrent-visual-attention.lua script as a starting point. Build a dataset (without dp) where the input is an image and the output is a sequence of targets (with locations?). You should probably still be able to use most of the modules used in the original script, but you will need to assemble them differently and create a different one for the first time-step. If you need help, fork this repo and make a multiple-object-recognition.lua script. Create a branch and pull request it here with (work in progress) in the title. Or you could create your own repository. In any case, by making it open source, we can work on it together.
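
For instance, a minimal sketch of such a dataset (tensor names and sizes are illustrative assumptions, not taken from the script):

require 'torch'
-- hypothetical layout: each 1x28x56 image contains two MNIST digits side by side
local nSample, nDigits = 10000, 2
local inputs  = torch.Tensor(nSample, 1, 28, 56)    -- images
local targets = torch.LongTensor(nSample, nDigits)  -- one class label per digit
-- optionally, one ground-truth (x, y) location per digit, in [-1, 1]:
local locations = torch.Tensor(nSample, nDigits, 2)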

pefi9 commented 8 years ago

Thanks @nicholas-leonard. I created a GitHub repo with the code available: https://github.com/pefi9/mo_va (This version has padding between the digits and the glimpse size is intentionally small, so I can validate whether the network can learn to move from the left part to the right part.)

I tried two approaches:

1) Comment out lines 38-39 in RecurrentAttention.lua so that the attention module does not forget after the first digit. However, I was not able to make it run, as the number of steps for the rnn and attention modules inside the recurrent attention did not match. Even though I set rho = 5 for the rnn, after analyzing the second digit the number of steps of the rnn was 10 while the step of the attention was 5.

2) Set rho = (# of glimpses for one digit) * (# of digits), so that the recurrent attention model remembers the whole history of one image. For this solution I removed line 154 in recurrent-visual-attention.lua (nn.SelectTable(-1)), as I want to output more than just one table. To be specific, I want to forward- and backward-propagate only every x-th (e.g. 5th) output of the recurrent attention module. In addition, according to the paper, I want to back-propagate only the digits where the previous digit was correctly classified. This matter should be handled on lines 102-129 in 4_train.lua.

It seems to be learning, but the performance is not excellent. I'm sure I still have some mistakes in there. Is it possible to adjust it for a variable number of digits? I can't think of any solution at the moment.

nicholas-leonard commented 8 years ago

@pefi9 I don't think you should need to modify RecurrentAttention. Say you want to detect n objects per image, then formulate the problem as giving rho/n steps per object. So for 2 objects, I could assign a rho of 10 such that an object should be identified every 5 time-steps.

You should build a MultiObjectReward criterion for doing https://github.com/pefi9/mo_va/blob/multi_digit_development/4_train.lua#L102-L129 (of course, you will still need a loop over n objects to update the ConfusionMatrix). Why build a criterion? So you can unit test it. Also, the current implementation only allows one call to reinforce() per batch as a single reward is expected. Calling reinforce(reward) n times per batch (once per object) will only use the last reward.
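
To illustrate why a single reward per batch is a limitation, here is a minimal sketch against dpnn (the module choice and reward values are illustrative):

require 'dpnn'
local module = nn.ReinforceNormal(0.1)   -- any REINFORCE module from dpnn
local rewardA = torch.Tensor(4):fill(1)  -- one reward per example in the batch
local rewardB = torch.Tensor(4):fill(-1)
module:reinforce(rewardA)
module:reinforce(rewardB)  -- overwrites rewardA: only the last reward is used in backward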

So yeah, I think you would build a MultiObjectReward criterion and include some unit tests so that it behaves as expected.

Also, you should be able to use the original RecurrentAttention without modification, as the output should have rho = (# of glimpses for one digit) * (# of digits), as you said. To select only the n (# of digits) outputs, use something like:

concat = nn.ConcatTable():add(nn.SelectTable(n)):add(nn.SelectTable(n*2)) ... :add(nn.SelectTable(-1))
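
Spelled out with explicit sizes (a sketch; nDigits and stepsPerDigit are my names, assuming the RecurrentAttention output is a table of rho outputs):

require 'nn'
local nDigits, stepsPerDigit = 2, 5  -- rho = nDigits * stepsPerDigit = 10
local concat = nn.ConcatTable()
for i = 1, nDigits do
   -- keep the output at the end of each object's glimpse window
   concat:add(nn.SelectTable(i * stepsPerDigit))
end
-- forwarding the table of rho outputs now yields a table of nDigits outputs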

pefi9 commented 8 years ago

@nicholas-leonard, I had time to look at it today. I tried to handle the step-wise reward by implementing https://github.com/Element-Research/rnn/blob/master/Sequencer.lua#L144-L146 , but as RecurrentAttention wraps the locator in a Recursor, there is an issue: "Sequencer.lua:37: expecting input table". So I created MOReinforce and MOReinforceNormal, where the first one returns the reward for a specific step and the second one keeps track of the actual step. There is a MORewardCriterion as well, which should replace VRClassReward, but putting the gradInputs into the correct form is ... perhaps it will be easier not to use ParallelCriterion at all and to use only something like the MOReward. Or would you have some other idea how to solve it (in a more elegant way)?

nicholas-leonard commented 8 years ago

@pefi9 Sorry, I had a bad cold these past few days. So I think we should modify AbstractSequencer to accept tables of rewards (one per time-step).

nicholas-leonard commented 8 years ago

@pefi9 I have modified AbstractRecurrent to handle tables of rewards: https://github.com/Element-Research/rnn/commit/417f8df6fb90f7ae58ad87deae12950789f8d346 . Basically, you shouldn't need MOReinforce and MOReinforceNormal anymore. Instead, make sure that your MORewardCriterion calls module:reinforce(rewards), where rewards is a table of the same length as its input, so that there is one reward per time-step.
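
For instance, a hedged sketch of that call (MORewardCriterion and computeStepReward are hypothetical names, not part of rnn or dpnn):

-- inside the hypothetical criterion; self.module is the decorated model
function MORewardCriterion:updateGradInput(inputTable, target)
   local rewards = {}
   for step = 1, #inputTable do
      -- hypothetical helper: one reward per example in the batch for this step
      rewards[step] = self:computeStepReward(inputTable[step], target[step])
   end
   -- rewards has the same length as inputTable: one reward per time-step
   self.module:reinforce(rewards)
   -- gradInput computation w.r.t. the classification term omitted here
   return self.gradInput
end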

pefi9 commented 8 years ago

@nicholas-leonard No worries, hope you are well now. I had a couple of errors in the code; I'll update the GitHub version tomorrow. It works fine for a single object, however it takes a lot of time to train for multiple digits.

One modification which I have not tackled yet is enabling recognition of sequences of variable length. I'm not sure whether it is even possible with the current version of RecurrentAttention?

pefi9 commented 8 years ago

Thanks for the update.

nicholas-leonard commented 8 years ago

@pefi9 For variable length sequences, you could add a terminate class. When this class is predicted, regardless of position, it means that the model has found all instances. If your longest sequence has length n, then you should let your model detect n+1 objects. The +1 is so it can always learn to detect the terminate class/object at the end of the sequence.

pefi9 commented 8 years ago

@nicholas-leonard With the new AbstractRecurrent I've got this error:

...orch/install/share/lua/5.1/rnn/AbstractSequencer.lua:4: DEPRECATED 27 Oct 2015. Wrap your internal modules into a Recursor instead
stack traceback:
        ...petrfiala/torch/install/share/lua/5.1/trepl/init.lua:500: in function <...petrfiala/torch/install/share/lua/5.1/trepl/init.lua:493>
        [C]: in function 'error'
        ...orch/install/share/lua/5.1/rnn/AbstractSequencer.lua:4: in function 'getStepModule'
        ...orch/install/share/lua/5.1/rnn/AbstractRecurrent.lua:162: in function 'reinforce'
        ...etrfiala/torch/install/share/lua/5.1/dpnn/Module.lua:598: in function 'reinforce'
        MORewardCriterion_table.lua:111: in function 'updateGradInput'

I assume it's caused by RecurrentAttention. By changing its parent to nn.Container I got a different error:

...rfiala/torch/install/share/lua/5.1/rnn/Sequencer.lua:145: Sequencer Error : step-wise rewards not yet supported

Would it be sufficient to change
https://github.com/Element-Research/rnn/blob/master/Sequencer.lua#L143-L148
to

function Sequencer:reinforce(reward)
    return parent.reinforce(self, reward)
end

?

nicholas-leonard commented 8 years ago

@pefi9 I just removed that check in the latest commit. As for the first error, I'm not sure how that is happening.

pefi9 commented 8 years ago

@nicholas-leonard I've got 2 findings:

1) When Sequencer is used (to wrap a non-recurrent module) it gives the error which I mentioned in the previous comment. The reason is https://github.com/Element-Research/rnn/blob/master/AbstractRecurrent.lua#L162 : it calls AbstractSequencer:getStepModule, and that is deprecated. Which decorator shall I use for the classifier and concat2 https://github.com/pefi9/mo_va/blob/multi_digit_development/2_model_VA.lua#L128-L132 ?

2) When model:reinforce(reward) is called and the reward passes through selection tables, it is not filled with zero tables for the other indexes, as it is in the case of updateGradInput. I'll adjust the backward method in MOCriterion accordingly, or would you rather make changes in nn.SelectTable?

vyouman commented 8 years ago

@pefi9 Hi, I'm also going to implement the DRAM model and apply it to some real-world images. So have you got the problems solved? Do you think it is possible to use ReinforceNormal, Reinforce and RecurrentAttention without any modifications and just write a new criterion to get the time-step reward now? Thanks.

pefi9 commented 8 years ago

Hi @vyouman, yes, it should be possible. However, we did not solve the first point in my previous comment. The workaround I used is to change the parent class of RecurrentAttention from "nn.AbstractSequencer" to "nn.Container". I was able to train it only for 2 digits (objects), not more, so we decided to use just a simple CNN with multiple classifiers on the output, and the MOCriterion has stayed in the development phase.

vyouman commented 8 years ago

@pefi9 Thanks for your patient reply. :p I wonder if you have any idea how to handle sequences of variable length. To be clear, say the longest sequence in the dataset has length D, and there are samples of different lengths in one batch, but the longest sequence in a single batch may be shorter than D. Does it help to write a terminate class? I'm kind of confused about the solution for sequences of variable length.

pefi9 commented 8 years ago

@vyouman, I had the same question (see Nicholas' answer from Feb 12). It's not possible at the moment. You have to define the maximum number of objects (the length of the sequence) and the number of glimpses taken in advance (I did: https://github.com/pefi9/mo_va/blob/multi_digit_development/2_model_VA.lua#L122-L126 , where opt.digits is the max length and opt.steps is the # of glimpses taken per object/digit). It would be a nice feature to have, but I can't think of any easy extension of the current code which would enable it.
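
For reference, the fixed sizes boil down to something like this (values illustrative):

local opt = {digits = 2, steps = 5}  -- max sequence length, glimpses per object
local rho = opt.digits * opt.steps   -- RecurrentAttention unrolls for 10 steps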

nicholas-leonard commented 8 years ago

You could add padding. Specifically, you add dummy classes at the end of the target sequence that mean "END OF SEQUENCE".
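
For example, a minimal sketch of padding one target sequence (class values illustrative):

require 'torch'
local nClasses = 10                  -- e.g. digits 0-9
local eosClass = nClasses + 1        -- dummy "END OF SEQUENCE" class
local maxLen   = 4
local padded = torch.LongTensor(maxLen):fill(eosClass)
local seq = torch.LongTensor{3, 7}   -- an example target sequence of length 2
padded:narrow(1, 1, seq:size(1)):copy(seq)
-- padded is now {3, 7, 11, 11}; the model learns to predict eosClass at the end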

ssampang commented 8 years ago

@nicholas-leonard I've come across the same problem that @pefi9 faced with RecurrentAttention's error when getStepModule is called. Shall I change the parent class like they did as well?

Until now I was using a custom reinforce method for the Recursor module that essentially did the same thing, but I think it'd be better to delete my code and use what's built into this library.

vyouman commented 8 years ago

@nicholas-leonard Yeah, I've also encountered the problem @pefi9 and @ssampang came across, because of the deprecated getStepModule of AbstractSequencer. Changing the parent class just doesn't work. I'm trying to implement the Deep Recurrent Attention model and my reward is a table.

/home/vyouman/torch/install/bin/luajit: ...an/torch/install/share/lua/5.1/rnn/AbstractSequencer.lua:4: DEPRECATED 27 Oct 2015. Wrap your internal modules into a Recursor instead
stack traceback:
    [C]: in function 'error'
    ...an/torch/install/share/lua/5.1/rnn/AbstractSequencer.lua:4: in function 'getStepModule'
    ...an/torch/install/share/lua/5.1/rnn/AbstractRecurrent.lua:177: in function 'reinforce'
    /home/vyouman/torch/install/share/lua/5.1/dpnn/Module.lua:586: in function 'reinforce'
    ...-linux.gtk.x86_64/workspace/DRAM/src/VRCaptionReward.lua:53: in function 'backward'
    ...product-linux.gtk.x86_64/workspace/DRAM/src/testRAEx.lua:171: in main chunk
    [C]: at 0x00406670

nicholas-leonard commented 8 years ago

@pefi9 @ssampang As mentioned in #210, I think @vyouman identified the problem. The latest commit should fix it. Let me know if there are any further issues.