Closed LeZhengThu closed 1 month ago
Hello,
`total_timesteps=2000` is not the number of iterations but the minimum number of steps in the env (you can see `iterations=1` in the logger).
I would recommend taking a look at the RL Zoo and the tuned hyperparameters for PPO on CartPole; you need to let it train longer (at least 20_000 steps to get behavior better than random).
And you should learn more about PPO (we have links to resources in our doc).
@araffin Hello, thanks for sharing. I set total_timesteps to a small number to check the functionality of my env, and I get your point now. When I set total_timesteps to 10000, the training works correctly. In addition, I wonder what the difference is between the parameters n_steps and n_epochs in PPO. And is this link https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html the correct resource to learn PPO?
> I set total_timesteps to small numbers to check the functionality of my env.
You have `check_env()` for that (also documented).
> And is this link https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html the correct resource to learn PPO?
Yes, and also https://stable-baselines3.readthedocs.io/en/master/guide/rl.html
❓ Question
I'm using gymnasium version 0.29.1 and stable_baselines3 version 2.3.2. I'm working with a customized env and find that model.learn is not learning anything. So I tried to follow the simple examples with the 'CartPole-v1' env, but it seems it is still not working. Below is the code.
I recorded the execution time and there is not much difference among training for 20, 200, or 2000 iterations. This doesn't make any sense to me. In addition, I don't know how to interpret the output shown below; I can't tell if it tells me anything.