DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License
8.84k stars 1.68k forks

[Question] Pong environment with A2C not learning with example code #1917

Closed Tanis1304 closed 4 months ago

Tanis1304 commented 5 months ago

❓ Question

I copied the code from the Examples section in the documentation, which also uses a PongNoFrameskip-v4 environment with 4 stacked frames. The episodic mean reward starts out around -20, then worsens, after which it fluctuates between -21 and -20.5. I use the default hyperparameters for A2C's CnnPolicy, as the code below shows.

from stable_baselines3 import A2C
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.vec_env import VecFrameStack

# 4 parallel Atari envs with the standard wrappers, plus 4 stacked frames
vec_env = make_atari_env("PongNoFrameskip-v4", n_envs=4, seed=0)
vec_env = VecFrameStack(vec_env, n_stack=4)

model = A2C("CnnPolicy", vec_env, verbose=1, device="cuda")
model.learn(total_timesteps=10_000_000)

I'm running this code using Python 3.10.4 and torch 2.3.0. What could be going wrong here, and shouldn't this example code just work?


araffin commented 5 months ago

If you want the correct hyperparameters for Atari, you should use the RL Zoo. The example in the docs is there to show the API; we kept it concise to focus on the wrappers we provide.
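For context, the tuned A2C settings the RL Zoo uses for Atari differ from SB3's defaults (notably more parallel environments and a non-zero entropy coefficient). A rough sketch of the relevant config entry follows; the exact values are recalled from memory of the zoo's `hyperparams/a2c.yml` and should be double-checked against the rl-baselines3-zoo repo before use:

```yaml
# Approximate RL Zoo entry for A2C on Atari -- verify against
# rl-baselines3-zoo/hyperparams/a2c.yml, values here are from memory
atari:
  policy: 'CnnPolicy'
  frame_stack: 4
  n_envs: 16              # more parallel envs than the doc example's 4
  n_timesteps: !!float 1e7
  ent_coef: 0.01          # entropy bonus keeps exploration alive on Pong
  vf_coef: 0.25
  policy_kwargs: "dict(optimizer_class=RMSpropTFLike, optimizer_kwargs=dict(eps=1e-5))"
```

With the zoo installed, training via its CLI (something like `python -m rl_zoo3.train --algo a2c --env PongNoFrameskip-v4`) should pick up these tuned hyperparameters automatically instead of SB3's generic defaults.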