Deepakgthomas opened 4 days ago
I also put up a SO post about it here - https://stackoverflow.com/questions/79083972/why-is-my-sb3-dqn-agent-unable-to-learn-cartpole-v1-despite-using-optimal-hyperp
Unfortunately, the score doesn't rise above 300.
Are you talking about the training reward (average over many episodes) or about the final performance using the (quasi)-deterministic policy?
How many runs did you do?
Did you try using the RL Zoo:
python -m rl_zoo3.train --algo dqn --env CartPole-v1 --eval-freq 10000 -P
A simple solution is to also increase the training budget.
📚 Documentation
I obtained optimal hyperparameters for training CartPole-v1 from RLZoo3. I have created a minimal example demonstrating the performance of my CartPole agent. As per the official docs, the agent should obtain a score of 500 for a successful episode. Unfortunately, the score doesn't rise above 300.
Here is my code -
Here is the final result -
Perhaps I am using RLZoo3 incorrectly? Anyway, I would truly appreciate any and all help regarding this.
Checklist