Hi @nicholas-leonard,
I am Ray, and I am currently looking into your code in recurrent-visual-attention.lua. It is nice work! I am a little confused about this part:
```lua
-- add the baseline reward predictor
seq = nn.Sequential()
seq:add(nn.Constant(1,1))
seq:add(nn.Add(1))
concat = nn.ConcatTable():add(nn.Identity()):add(seq)
concat2 = nn.ConcatTable():add(nn.Identity()):add(concat)
```
In your implementation, the baseline appears to be a constant value (nn.Constant) plus a bias (nn.Add). However, on page 5 of the paper "Recurrent Models of Visual Attention", the baseline is given as "b_t = E_pi[R_t]". This looks different from your code, right? Could you please briefly explain the difference?
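One possible reading, which I am not sure is the intended one: if the bias is trained by regressing it onto the observed reward with a squared-error loss (this is my assumption, I have not confirmed that the script does this), then a single input-independent scalar would converge to the expected reward. A short sketch of that argument, where b is the scalar output of the constant-plus-bias module:

```latex
% Assumption: b is a single learnable scalar fit by mean squared error to R_t.
\begin{aligned}
L(b) &= \mathbb{E}_{\pi}\!\left[(R_t - b)^2\right] \\
\frac{\partial L}{\partial b} &= -2\,\mathbb{E}_{\pi}\!\left[R_t - b\right] = 0
\quad\Longrightarrow\quad b^{*} = \mathbb{E}_{\pi}[R_t]
\end{aligned}
```

If that is what happens, the learned b would be the same for every time step and every input, so it would estimate a single average reward rather than a per-step b_t. Is that the intended simplification?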
Thank you!
Best, Ray