Hi Acme team, I think JAX DQN might set the evaluation epsilon to the exploration epsilon when deterministic evaluation is requested (eps=0.0, here). Replacing the check there with
self._config.eval_epsilon is not None
fixed it for me. Does this also occur on your end? Thanks so much for checking!
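
In case it helps, here is a minimal sketch of what I suspect is happening. It assumes the current code uses a plain truthiness test on eval_epsilon, which I have not confirmed. The Config class, its default values, and both helper functions are hypothetical stand-ins for illustration, not Acme's actual code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Config:
    # Hypothetical stand-in for the DQN config; field names follow the issue text.
    epsilon: float = 0.05                 # exploration epsilon (assumed default)
    eval_epsilon: Optional[float] = None  # None means "not set by the user"


def pick_epsilon_truthy(config: Config, evaluation: bool) -> float:
    # Suspected behaviour: 0.0 is falsy, so a requested eval_epsilon of 0.0
    # is treated as "not set" and the exploration epsilon is used instead.
    if evaluation and config.eval_epsilon:
        return config.eval_epsilon
    return config.epsilon


def pick_epsilon_not_none(config: Config, evaluation: bool) -> float:
    # Proposed check: only fall back when eval_epsilon is actually None.
    if evaluation and config.eval_epsilon is not None:
        return config.eval_epsilon
    return config.epsilon


config = Config(epsilon=0.05, eval_epsilon=0.0)
truthy_result = pick_epsilon_truthy(config, evaluation=True)
not_none_result = pick_epsilon_not_none(config, evaluation=True)
print(truthy_result, not_none_result)
```

With eval_epsilon=0.0, the truthiness version falls back to the exploration epsilon, while the is not None version keeps the requested 0.0.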