Element-Research / rnn

Recurrent Neural Network library for Torch7's nn
BSD 3-Clause "New" or "Revised" License

Backprop in evaluate mode #356

Closed. ruotianluo closed this issue 7 years ago

ruotianluo commented 7 years ago

I know evaluate mode is memory efficient. However, I need to backprop in evaluate mode. I tried using remember('both'), but it doesn't help. What's the right way to do this?

sudongqi commented 7 years ago

Evaluate mode doesn't store the hidden states during the forward pass (which are needed for backprop), so why not use training mode?
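
For reference, a minimal sketch of the training-mode route (the LSTM module, sizes, and data are illustrative only, assuming the stock nn.Sequencer and nn.LSTM from this library):

require 'rnn'

local seq = nn.Sequencer(nn.LSTM(10, 10))
seq:training()   -- keep per-step states so backward() through time is possible

local inputs = {torch.randn(4, 10), torch.randn(4, 10)}       -- 2 time steps, batch of 4
local outputs = seq:forward(inputs)
local gradOutputs = {torch.randn(4, 10), torch.randn(4, 10)}
local gradInputs = seq:backward(inputs, gradOutputs)

-- In evaluate mode (seq:evaluate()) the intermediate states are discarded
-- to save memory, so the same backward() call is not supported.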

ruotianluo commented 7 years ago

Consider nn.Sequencer(nn.Dropout()). How can I make it work correctly?

JoostvDoorn commented 7 years ago

If the only non-deterministic module you need to calculate gradients through is Dropout, then you could disable Dropout and use training mode instead.

ruotianluo commented 7 years ago

@JoostvDoorn It should work. Thank you for your advice.

ruotianluo commented 7 years ago

Still, there should be a switch to turn the memory reduction on and off. However, I can't find where it is.

sudongqi commented 7 years ago

If you look into the Dropout implementation, there is a train flag. You can keep a handle to the module when you define the model and set x.train = false to turn dropout off:

require 'nn'

local model = nn.Sequential()
model:add(nn.Linear(50, 50))
local x = nn.Dropout(0.5)
model:add(x)
x.train = false
print(x.train)  --> should print false
model:training()
print(x.train)  --> should print true (training() resets the flag on all modules)

ywelement commented 7 years ago

Thanks, @JoostvDoorn, @sudongqi. You're right, setting the train flag is a quick way to enable backprop.

ywelement commented 7 years ago

Looks like we got a solution here. I'm closing this ticket.

ruotianluo commented 7 years ago

@ywelement is there a simple way to turn off the memory sharing?

ruotianluo commented 7 years ago

@JoostvDoorn @sudongqi Just turning the dropout module into evaluate mode doesn't work, because the memory sharing happens in the wrapper. What I did instead was manually set p to 0 during evaluation, so that the sharing doesn't change the result.
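
A minimal sketch of that workaround (module names, sizes, and data are illustrative, not from this thread; it assumes nn.Dropout reads its probability from the p field at forward time, so p = 0 makes it a no-op):

require 'rnn'

-- Illustrative model: a per-step Linear followed by Dropout, wrapped in a Sequencer.
local drop = nn.Dropout(0.5)
local model = nn.Sequencer(nn.Sequential():add(nn.Linear(10, 10)):add(drop))

-- Keep the whole model in training mode so the Sequencer retains the
-- per-step state needed for backprop.
model:training()

-- "Evaluation" phase: zero the dropout probability instead of calling
-- :evaluate(). With p = 0 nothing is dropped, so the forward pass matches
-- evaluate-mode behaviour while backward() still works.
drop.p = 0

local inputs = {torch.randn(4, 10), torch.randn(4, 10)}       -- 2 time steps, batch of 4
local outputs = model:forward(inputs)
local gradOutputs = {torch.randn(4, 10), torch.randn(4, 10)}
model:backward(inputs, gradOutputs)

-- Back to real training: restore the dropout probability.
drop.p = 0.5

Depending on the rnn version, the Sequencer may keep internal per-step clones of the wrapped module, so the change to p may need to be applied to every Dropout instance it holds rather than only the original handle.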