toslunar opened this issue 7 years ago
The PCL implementation causes a minor error when it is used with `step_offset`, because of

`if self.t - self.t_start == self.t_max:`

in pcl.py, while `self.t` is overwritten by `train_agent`. To be precise,

`assert self.t_max is None or self.t - self.t_start <= self.t_max`

will fail at the beginning of `PCL.update_on_policy` (https://github.com/toslunar/chainerrl/commit/f8d07b385d11cd63aea03558cfc4eb1db632d370).

The implementations of A3C and ACER seem to have the same issue if they are trained by `train_agent` instead of `train_agent_async`.
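To make the failure mode concrete, here is a minimal sketch. `ToyAgent` and its methods are hypothetical stand-ins, not the actual chainerrl classes; only the two marked lines are the ones quoted from pcl.py. It assumes that resuming with `step_offset` overwrites `self.t` while `self.t_start` keeps its initial value:

```python
# Toy reproduction of the reported interaction (not the real chainerrl code).

class ToyAgent:
    def __init__(self, t_max=5):
        self.t = 0        # global step counter
        self.t_start = 0  # step at which the current on-policy rollout started
        self.t_max = t_max

    def update_on_policy(self):
        # Assertion quoted from PCL.update_on_policy.
        assert self.t_max is None or self.t - self.t_start <= self.t_max
        self.t_start = self.t  # flush the rollout

    def act_and_train(self):
        self.t += 1
        # Equality check quoted from pcl.py; it never fires once self.t has
        # been moved past the boundary from outside.
        if self.t - self.t_start == self.t_max:
            self.update_on_policy()


agent = ToyAgent(t_max=5)
agent.t = 1000            # roughly what resuming with step_offset=1000 does
agent.act_and_train()     # 1001 - 0 != 5, so the equality check is skipped
agent.update_on_policy()  # called later, e.g. at episode end -> AssertionError
```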
Good catch. The problem comes from the fact that resuming agent training via `step_offset` is not well tested.