Closed oribarel closed 5 years ago
Hi, your implementation of A3CLSTMSoftmax looks correct.
Namely, should the observation contain history of previous observations or everything is handled for me by Chainer / ChainerRL?
This completely depends on what environment you use. It is the environment, not Chainer or ChainerRL, that determines what information is contained in an observation.
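To illustrate the point, here is a minimal sketch of how history could be added on the user side if an environment only returns the current observation and a feed-forward model needs to see past ones. `ObservationHistory` is a hypothetical helper written for this example, not part of Chainer or ChainerRL, and the history length `k` is an assumed parameter. A recurrent model may not need this at all, since it carries its own state between steps.

```python
from collections import deque


class ObservationHistory:
    """Hypothetical helper that keeps the last k observations."""

    def __init__(self, k):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, first_obs):
        # Fill the buffer with the first observation so the stacked
        # output always has exactly k entries.
        self.frames.clear()
        for _ in range(self.k):
            self.frames.append(first_obs)
        return list(self.frames)

    def step(self, obs):
        # Append the newest observation; the oldest one is dropped
        # automatically because the deque has maxlen=k.
        self.frames.append(obs)
        return list(self.frames)
```

The stacked list would be passed to the agent in place of the raw observation, for example `stacked = history.step(obs)` after every environment step.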
what is the argument t-max used for?
t_max is the length of rollouts used for A3C's updates, defined in Algorithm S3 in https://arxiv.org/abs/1602.01783.
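As a rough sketch of what a rollout of at most t_max steps is used for, the returns for one update can be computed backwards from a bootstrap value, following the recursion R <- r_i + gamma * R in that algorithm. The function name `rollout_returns` and its arguments are assumptions made for this example and are not ChainerRL's API.

```python
def rollout_returns(rewards, bootstrap_value, gamma=0.99):
    """Discounted returns for one rollout of at most t_max steps.

    rewards: rewards collected during the rollout, oldest first.
    bootstrap_value: value estimate of the state after the rollout,
        or 0.0 if the rollout ended in a terminal state.
    """
    R = bootstrap_value
    returns = []
    # Walk the rollout backwards, applying R <- r_i + gamma * R.
    for r in reversed(rewards):
        R = r + gamma * R
        returns.append(R)
    returns.reverse()  # restore oldest-first order
    return returns
```

With a larger t_max, each update uses more actual rewards before bootstrapping from the value estimate; with a smaller t_max, updates happen more often but rely more on the estimate.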
Hi,
I'm trying to combine the A3CLSTMGaussian and A3CFFSoftmax examples into an A3CLSTMSoftmax architecture. Is the following the right way to go? Would you change something?
BTW, if I managed to use A3CFFSoftmax successfully, should I change something in the observations? Namely, should the observation contain history of previous observations or everything is handled for me by Chainer / ChainerRL? One more question, what is the argument t-max used for?