TensorFlow boot_dqn agent loses performance after first iteration

Hi,

I am seeing strange behavior from the default TensorFlow boot_dqn agent that has me a bit baffled. When running a sweep over several environments, the agent behaves as expected in the first iteration but loses that behavior afterwards and does not seem to explore. I've spent some time debugging but haven't found the cause.

Code for reproduction (double-checked in a newly installed env):

```python
import bsuite
from bsuite import sweep
from bsuite.baselines import experiment
from bsuite.baselines.tf import boot_dqn

bsuite_id = "DEEP_SEA"
log_dir = "./logs/"
# First three settings of the Deep Sea sweep.
bsuite_sweep = getattr(sweep, bsuite_id)[:3]

for env_id in bsuite_sweep:
    # A fresh environment and a fresh agent are created on every iteration.
    env = bsuite.load_and_record(env_id, save_path=log_dir, overwrite=True)
    agent = boot_dqn.default_agent(
        obs_spec=env.observation_spec(),
        action_spec=env.action_spec(),
    )

    experiment.run(agent, env, num_episodes=300)
```
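One way to check whether the degradation comes from state that survives between loop iterations (for example, some TensorFlow global state) rather than from the agent itself is to run each sweep setting in its own Python interpreter. Below is a minimal, standard-library-only sketch of that idea; `run_in_fresh_process` and `RUN_ONE_SETTING` are hypothetical names, and the script string simply repeats the loop body above for a single id.

```python
import subprocess
import sys

# Hypothetical per-setting script: repeats the body of the loop above for a
# single bsuite id, which is passed on the command line as sys.argv[1].
RUN_ONE_SETTING = """
import sys
import bsuite
from bsuite.baselines import experiment
from bsuite.baselines.tf import boot_dqn

env = bsuite.load_and_record(sys.argv[1], save_path="./logs/", overwrite=True)
agent = boot_dqn.default_agent(
    obs_spec=env.observation_spec(),
    action_spec=env.action_spec(),
)
experiment.run(agent, env, num_episodes=300)
"""


def run_in_fresh_process(script: str, arg: str) -> int:
    """Hypothetical helper: run `script` in a new interpreter with `arg` as sys.argv[1].

    Nothing the child creates (graphs, RNG state, caches) can leak into the
    parent or into the next call. Returns the child's exit code.
    """
    completed = subprocess.run([sys.executable, "-c", script, arg], check=False)
    return completed.returncode
```

With bsuite installed, the loop would become `for env_id in sweep.DEEP_SEA[:3]: run_in_fresh_process(RUN_ONE_SETTING, env_id)`. If the later settings still fail when run in isolation, state shared across iterations is probably not the cause.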

In the second and third iterations, the agent does not reach the end of the chain within 300 episodes, nor with much longer training horizons (see the Colab link for results).
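Not reaching the end of the chain points at an exploration failure. Assuming the standard Deep Sea layout, where the agent descends one row per step and only an unbroken run of N "right" moves earns the reward, a policy that picks actions uniformly at random succeeds with probability 2^-N per episode. The sketch below (standard library only; all function names are hypothetical) computes the chance of at least one success over a number of episodes and checks the per-episode rate with a small simulation.

```python
import random

# Assumes the standard Deep Sea layout: N steps per episode, and the reward
# is only reached if every one of those N actions is "right".


def p_reach_goal_per_episode(n: int) -> float:
    """Chance that a uniform random policy picks 'right' at all n steps."""
    return 0.5 ** n


def p_any_success(n: int, num_episodes: int) -> float:
    """Chance of reaching the goal at least once in num_episodes episodes."""
    return 1.0 - (1.0 - p_reach_goal_per_episode(n)) ** num_episodes


def simulate_random_policy(n: int, num_episodes: int, seed: int = 0) -> int:
    """Count episodes in which random actions happen to go 'right' n times."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(num_episodes):
        # Each step is an independent coin flip; a single 'left' rules out the reward.
        if all(rng.random() < 0.5 for _ in range(n)):
            successes += 1
    return successes
```

For example, with N = 10 a random policy has only about a 25% chance of a single success in 300 episodes, so an agent that reliably solves the task is doing more than random dithering.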

In contrast, the JAX agent reliably produces the expected results in the same loop (i.e., when the TensorFlow agent is swapped for the JAX one).

The same behavior can be observed in Colab: https://colab.research.google.com/drive/1hnJMDLG-aXCKKsjFqVd6YWGY4luz29ku?usp=sharing

best, anyboby

google-deepmind / bsuite, issue #46