Element-Research / rnn

Recurrent Neural Network library for Torch7's nn
BSD 3-Clause "New" or "Revised" License

Locator seems not work? #303

Closed: OptimusPrimeCao closed this issue 7 years ago

OptimusPrimeCao commented 8 years ago

Hi @nicholas-leonard, I'm trying to train on my own dataset using your recurrent-attention-model structure. The images are 28x28 and contain formula characters such as [0-9], [a-z], (, ), -, +, etc. However, after 300 epochs, this is what I get: [six attached images, including glimpse_14]

For all characters, after 2 steps or so, the locator always goes to a corner and stops predicting new glimpse locations. It seems that the locator doesn't work well. Is there something I should change in your RAM structure?

What should I do in this particular case? Any suggestions? Thank you

nicholas-leonard commented 8 years ago

Hi, your glimpseSize seems really small. This is what I used to train the recurrent attention model on MNIST which is also 28x28:

th scripts/rnn-visual-attention.lua --cuda --useDevice 2 --rho 7 --rewardScale 1 --maxEpoch 2000 --maxTries 200 --learningRate 0.01 --sensorDepth 1 --momentum 0.9 --maxOutNorm -1 --batchSize 20 --saturateEpoch 800 --locatorStd 0.11 --uniform 0.1 --hiddenSize '{256}' --unitPixels 13 --glimpsePatchSize 8

The unitPixels argument is important as it prevents the model from getting stuck at the borders and corners.
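
To make the effect of unitPixels concrete, here is a small arithmetic sketch in plain Python (not the library's Lua code). It assumes the locator output is clamped to [-1, 1] and then multiplied by unitPixels*2/imageHeight, so that a locator value of 1 corresponds to an offset of unitPixels pixels from the image center. The function names are made up for illustration, and the actual glimpse module may handle padding and rounding differently.

```python
def scale_factor(unit_pixels, image_height):
    """Assumed multiplier applied to the clamped locator output."""
    return unit_pixels * 2.0 / image_height


def locator_offset_pixels(loc, unit_pixels, image_height):
    """Offset of the glimpse center from the image center, in pixels.

    Hypothetical helper: `loc` is one coordinate of the raw locator output.
    """
    clamped = max(-1.0, min(1.0, loc))  # keep the coordinate in [-1, 1]
    normalized = clamped * scale_factor(unit_pixels, image_height)
    # In normalized coordinates +/-1 is the image edge, i.e. half the height.
    return normalized * image_height / 2.0


if __name__ == "__main__":
    # Values from the command above: --unitPixels 13 on 28x28 images.
    for loc in (-1.0, -0.5, 0.0, 0.5, 1.0, 3.0):
        print(loc, round(locator_offset_pixels(loc, 13, 28), 3))
```

Under these assumptions, with --unitPixels 13 on a 28-pixel image the glimpse center can move at most 13 pixels from the image center, which is one pixel short of the edge. A smaller value would keep the glimpse further away from the borders.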

OptimusPrimeCao commented 8 years ago

Thanks for your reply! Do you think a glimpse with 2 scales would work better in this case, or a glimpse with 1 scale but a larger size?

nicholas-leonard commented 8 years ago

@OptimusPrimeCao Since your problem is so similar to MNIST, I would use exactly:

th scripts/rnn-visual-attention.lua --cuda --useDevice 2 --rho 7 --rewardScale 1 --maxEpoch 2000 --maxTries 200 --learningRate 0.01 --sensorDepth 1 --momentum 0.9 --maxOutNorm -1 --batchSize 20 --saturateEpoch 800 --locatorStd 0.11 --uniform 0.1 --hiddenSize '{256}' --unitPixels 13 --glimpsePatchSize 8

So a glimpse depth of 1 and a size of 8 pixels.

milkfish1988 commented 7 years ago

How can I get the locator's output locations during evaluation? What would the code look like? Thanks!