DLR-RM / rl-baselines3-zoo

A training framework for Stable Baselines3 reinforcement learning agents, with hyperparameter optimization and pre-trained agents included.
https://rl-baselines3-zoo.readthedocs.io
MIT License

[Question] The actual training timesteps don't correspond with the hyper-parameters for Atari #367

Closed: cx441000319 closed this issue 1 year ago

cx441000319 commented 1 year ago

❓ Question

Hi,

As the title says, the issue seems to occur only in Atari. Here are the commands I used for reference:

Experiment Command: python train.py --algo ppo --env PongNoFrameskip-v4

Training Plotting Command: python scripts/plot_train.py -a ppo -e PongNoFrameskip-v4 -f logs

Evaluation Plotting Command: python scripts/all_plots.py -a ppo -e PongNoFrameskip-v4 -f logs --no-million -max 10000000

From the plots, the number of training timesteps is about 4e7 instead of the 1e7 specified by n_timesteps in the hyperparameters. Based on my experiments, the issue occurs only in Atari environments. To reproduce it, simply replace the n_timesteps hyperparameter with a small number like 1e4; the episode lengths in the logs will then add up to far more than 1e4 samples.
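
As a quick numeric check, here is a minimal sketch that sums the episode lengths recorded by SB3's Monitor wrapper (the run folder name is just an example of the zoo's logs/<algo>/<env>_<run_id> layout):

```python
# Minimal sketch: total up the episode lengths ("l" column) stored in the
# Monitor CSV files and compare the sum against n_timesteps.
from stable_baselines3.common.monitor import load_results

df = load_results("logs/ppo/PongNoFrameskip-v4_1")  # example run folder
total_frames = df["l"].sum()
print(f"Total logged frames: {total_frames}")  # far more than n_timesteps on Atari
```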

Thank you so much in advance!


araffin commented 1 year ago

Hello, this is expected because of the preprocessing for Atari games: the action repeat (aka frameskip) is set to 4 by default, so each agent step advances the emulator by 4 frames, and 1e7 training timesteps therefore show up as roughly 4e7 frames in the logs.

Related: https://github.com/DLR-RM/stable-baselines3/issues/181
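
To illustrate the frameskip effect, here is a minimal sketch (assuming a recent gymnasium and ale-py; on older versions the register_envs call may be unnecessary or unavailable) that counts how many emulator frames a single wrapped step consumes:

```python
import gymnasium as gym
import ale_py
from stable_baselines3.common.atari_wrappers import AtariWrapper

gym.register_envs(ale_py)  # explicit Atari registration on recent versions

raw_env = gym.make("PongNoFrameskip-v4")
env = AtariWrapper(raw_env)  # frame_skip=4 by default

env.reset()
before = raw_env.unwrapped.ale.getEpisodeFrameNumber()
env.step(env.action_space.sample())
after = raw_env.unwrapped.ale.getEpisodeFrameNumber()
print(after - before)  # expect 4: one agent step = 4 emulator frames
```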

cx441000319 commented 1 year ago

Oh, that totally makes sense. I did my best to check whether I had overlooked any details, but I missed this one. It's all clear now. Thank you so much for your quick reply!