Element-Research / rnn

Recurrent Neural Network library for Torch7's nn
BSD 3-Clause "New" or "Revised" License

About the "baseline" of REINFORCE in RAM #398

Closed ruizhaogit closed 7 years ago

ruizhaogit commented 7 years ago

Hi @nicholas-leonard,

I am Ray, and I am currently looking into your script recurrent-visual-attention.lua. It is nice work! I am a little confused about this part:

    -- add the baseline reward predictor
    seq = nn.Sequential()
    seq:add(nn.Constant(1,1))
    seq:add(nn.Add(1))
    concat = nn.ConcatTable():add(nn.Identity()):add(seq)
    concat2 = nn.ConcatTable():add(nn.Identity()):add(concat)

It seems that in your implementation the baseline is a constant value (nn.Constant) plus a learnable bias (nn.Add). But the paper "Recurrent Models of Visual Attention" (page 5) says that the baseline is "b_t = E_pi[R_t]". This is different from your code, right? Could you please briefly explain this difference?
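
One possible way to reconcile the two, sketched here under assumptions rather than as a confirmed description of the library: suppose the scalar output of `seq` is b = 1 + β, where β is the bias of nn.Add(1), and suppose β is trained by minimizing a mean squared error between b and the observed reward R. The snippet above does not show this training criterion, so it is an assumption. The symbols β, J, and η (a learning rate) are introduced only for this illustration. Under those assumptions, the optimum of the learned constant is the expected reward:

```latex
% Assumed objective: squared error between the constant baseline b and the reward R
J(b) = \mathbb{E}_{\pi}\!\left[(R - b)^2\right]

% Setting the derivative to zero gives the minimizer
\frac{\partial J}{\partial b} = -2\,\mathbb{E}_{\pi}\!\left[R - b\right] = 0
\quad\Longrightarrow\quad
b^{*} = \mathbb{E}_{\pi}[R]

% A stochastic gradient step on one sampled reward R moves b toward R,
% so b behaves like a running average of the rewards seen so far
b \leftarrow b + 2\eta\,(R - b)
```

If this reading is right, the constant-plus-bias module would be an input-independent estimate of E_pi[R], not a fixed value. Whether that matches the per-step b_t in the paper is exactly what the question asks.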

Thank you!

Best, Ray