Open lyx-x opened 6 years ago
When t_max
is set to None, PCL is expected to work as the original paper does, i.e. sampling a full episode.
When t_max
is set to non-None value, it is expected to work like ACER with PCL's loss function.
It seems the docstring of PCL is wrong, not only in t_max
but also in other points. We should fix this. Thank you for pointing it out.
My pull request is probably not what you expect, I added a new argument for limiting the length of episode. I should probably rename the argument.
By the way, the docstring of t_max in ACER may also be wrong.
According to the docstring below,
t_max
in PCL is equivalent toupdate_interval
in DQN.https://github.com/chainer/chainerrl/blob/4ca100cd8f0dab5e5e333cf3bdc65fb6e6e30651/chainerrl/agents/pcl.py#L49
However, if
t_max == None
by default (also in the example script), line 388 (below), the update, will never be executed.https://github.com/chainer/chainerrl/blob/4b051a4cc0e13634b9b0f521c0ad18db91019d40/chainerrl/agents/pcl.py#L388
In addition, when sampling from the replay buffer, I think that
max_len
should be set with another parameter, at least not witht_max
unless we change the docstring. (I also think there is no such limit on the length of episode in the original paper)https://github.com/chainer/chainerrl/blob/4b051a4cc0e13634b9b0f521c0ad18db91019d40/chainerrl/agents/pcl.py#L284