DLR-RM / rl-baselines3-zoo

A training framework for Stable Baselines3 reinforcement learning agents, with hyperparameter optimization and pre-trained agents included.
https://rl-baselines3-zoo.readthedocs.io
MIT License

[Question] The actual training timesteps don't correspond with the hyper-parameters for Atari #367

Closed: cx441000319 closed this issue 1 year ago

cx441000319 commented 1 year ago

❓ Question

Hi,

As the title says, the issue seems to occur only in Atari. Here are the commands I used for reference:

Experiment Command: python train.py --algo ppo --env PongNoFrameskip-v4

Training Plotting Command: python scripts/plot_train.py -a ppo -e PongNoFrameskip-v4 -f logs

Evaluation Plotting Command: python scripts/all_plots.py -a ppo -e PongNoFrameskip-v4 -f logs --no-million -max 10000000

From the plots, the number of training timesteps is about 4e7 instead of the 1e7 specified by n_timesteps in the hyperparameters. Based on my experiments, the issue occurs only in Atari environments. To reproduce it, simply replace the n_timesteps hyperparameter with a small number like 1e4; the episode lengths in the logs will then add up to far more than 1e4 samples.
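
As a quick numeric check, here is a minimal sketch that sums the episode lengths recorded by SB3's Monitor wrapper (the run folder name is just an example of the zoo's logs/<algo>/<env>_<run_id> layout):

```python
# Minimal sketch: total up the episode lengths ("l" column) stored in the
# Monitor CSV files and compare the sum against n_timesteps.
from stable_baselines3.common.monitor import load_results

df = load_results("logs/ppo/PongNoFrameskip-v4_1")  # example run folder
total_frames = df["l"].sum()
print(f"Total logged frames: {total_frames}")  # far more than n_timesteps on Atari
```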

Thank you so much in advance!


araffin commented 1 year ago

Hello, this is expected because of the preprocessing for Atari games: the action repeat (aka frameskip) is set to 4 by default, so each agent step advances the emulator by 4 frames, and 1e7 training timesteps therefore show up as roughly 4e7 frames in the logs.

Related: https://github.com/DLR-RM/stable-baselines3/issues/181
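
To illustrate the frameskip effect, here is a minimal sketch (assuming a recent gymnasium and ale-py; on older versions the register_envs call may be unnecessary or unavailable) that counts how many emulator frames a single wrapped step consumes:

```python
import gymnasium as gym
import ale_py
from stable_baselines3.common.atari_wrappers import AtariWrapper

gym.register_envs(ale_py)  # explicit Atari registration on recent versions

raw_env = gym.make("PongNoFrameskip-v4")
env = AtariWrapper(raw_env)  # frame_skip=4 by default

env.reset()
before = raw_env.unwrapped.ale.getEpisodeFrameNumber()
env.step(env.action_space.sample())
after = raw_env.unwrapped.ale.getEpisodeFrameNumber()
print(after - before)  # expect 4: one agent step = 4 emulator frames
```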

cx441000319 commented 1 year ago

Oh, that totally makes sense. I did my best to check whether I had overlooked any details, but I missed this one. It's all clear now. Thank you so much for your quick reply!