chainer / chainerrl

ChainerRL is a deep reinforcement learning library built on top of Chainer.

Some agents with online updates fail when used with step_offset of train_agent #135

Open toslunar opened 7 years ago

toslunar commented 7 years ago

The PCL implementation raises a minor error when it is used with step_offset, because of the check

if self.t - self.t_start == self.t_max:

in pcl.py: train_agent overwrites self.t with step_offset while self.t_start is left behind, so the equality never holds. To be precise, assert self.t_max is None or self.t - self.t_start <= self.t_max will fail at the beginning of PCL.update_on_policy (https://github.com/toslunar/chainerrl/commit/f8d07b385d11cd63aea03558cfc4eb1db632d370).
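To illustrate the failure mode, here is a minimal standalone sketch of the arithmetic; t, t_start, and t_max mirror the agent's attributes, and the concrete values are made up:

# Minimal sketch of the failure mode. t, t_start, and t_max mirror the
# PCL agent's attributes; the concrete numbers are hypothetical.
t_max = 10
t_start = 0          # set at construction and realigned after each update
t = 0

# train_agent overwrites the agent's step counter with step_offset:
step_offset = 1000
t = step_offset

# The on-policy update fires only on exact equality, which the counter
# has already jumped past, so it never triggers:
print(t - t_start == t_max)  # False, and it stays False as t grows

# The next call into update_on_policy therefore trips its precondition:
try:
    assert t_max is None or t - t_start <= t_max
except AssertionError:
    print("assertion fails: %d > %d" % (t - t_start, t_max))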

The implementations of A3C and ACER appear to have the same issue when trained with train_agent instead of train_agent_async.
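One way to sidestep the crash (a sketch only, not a proposed patch for the library; the surrounding code is paraphrased, and statevar stands for whatever the real act_and_train passes) would be to fire the update once the window is full or overfull, instead of only on exact equality:

# Hypothetical relaxation of the check in pcl.py (paraphrased source):
if self.t_max is not None and self.t - self.t_start >= self.t_max:
    self.update_on_policy(statevar)  # assumed to realign self.t_start to self.t

This avoids the assertion failure, but the first post-resume update would then train on whatever partial window has accumulated, so realigning self.t_start whenever self.t is overwritten might be the cleaner fix.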

muupan commented 7 years ago

Good catch. The problem comes from the fact that resuming agent training via step_offset is not well tested.
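A regression test along the following lines might catch this class of bug. It is only a sketch: make_agent and make_env are hypothetical fixtures for any on-policy agent (e.g. PCL) and environment, and tmpdir is pytest's temporary-directory fixture; train_agent and step_offset are the API discussed above.

from chainerrl.experiments import train_agent

def test_resume_with_step_offset(make_agent, make_env, tmpdir):
    # First run: train from step 0 up to step 100.
    agent, env = make_agent(), make_env()
    train_agent(agent, env, steps=100, outdir=str(tmpdir))
    # Resume: train_agent overwrites agent.t with step_offset, which is
    # exactly the situation that strands the == self.t_max check.
    train_agent(agent, env, steps=200, outdir=str(tmpdir), step_offset=100)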