kargarisaac closed this issue 4 years ago
OK, thanks for the update and clarifications!
The one thing I can think of that's missing from your configuration has to do with the epsilon-greedy schedule. Look at the EpsilonGreedyAgent (a base class for DqnAgent): https://github.com/astooke/rlpyt/blob/master/rlpyt/agents/dqn/epsilon_greedy.py and check whether you are providing a good schedule for epsilon, with good starting and ending values.
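To make the schedule point concrete, here is a minimal sketch of a linear epsilon decay. It is not rlpyt's implementation: the function name, the parameter names (`eps_init`, `eps_final`, `decay_steps`) and the default values are assumptions chosen for illustration, so check the linked file for the arguments the agent actually accepts.

```python
def linear_epsilon(step, eps_init=1.0, eps_final=0.01, decay_steps=50_000):
    """Linearly anneal epsilon from eps_init to eps_final over decay_steps.

    All names and defaults here are hypothetical, not rlpyt's API.
    """
    if decay_steps <= 0:
        return eps_final
    # Fraction of the decay completed, clamped to [0, 1].
    frac = min(max(step / decay_steps, 0.0), 1.0)
    return eps_init + frac * (eps_final - eps_init)


if __name__ == "__main__":
    for s in (0, 10_000, 25_000, 50_000, 100_000):
        print(s, round(linear_epsilon(s), 4))
```

If epsilon stays near its starting value for the whole run, or drops to its final value almost immediately, the agent either never exploits or never explores, and either case can look like "no learning" in the score plot.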
You'll probably also want to increase `min_steps_learn` to something like 1e3 or 1e4, to populate the replay buffer with lots of random samples before starting to learn. Or whatever setting you used in your other implementation?
Another thing would be to use `MinibatchRlEval` instead of `MinibatchRl` for the runner. Only the "eval" one will pause training to run the agent with a different value for epsilon and report those scores.
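The reason a separate evaluation epsilon matters can be sketched with a plain epsilon-greedy action selector. This is a generic illustration, not rlpyt code; the function name and the example Q-values are made up.

```python
import random


def epsilon_greedy_action(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one.

    Hypothetical helper for illustration; q_values is a list of floats.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Greedy choice: index of the largest Q-value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


if __name__ == "__main__":
    q = [0.1, 0.9, 0.3]
    # Training-style call: frequent random actions.
    print(epsilon_greedy_action(q, epsilon=0.5))
    # Evaluation-style call: almost always the greedy action.
    print(epsilon_greedy_action(q, epsilon=0.001))
```

Scores logged during training include the random exploratory actions, so they can understate what the learned Q-function achieves. An evaluation pass with a small epsilon gives a cleaner signal.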
Let us know if either of those helps.
This is interesting. I haven't actually run CartPole myself, so it would be good to see what settings work.
FWIW from my experience with CartPole I'm not actually sure if DQN does well at that. DQN seems to strangely be more reliable on Pong than on CartPole, but I might not have settled on ideal hyperparameters. I usually verify DQN code by running on Pong.
I tried several configurations for CartPole, but it didn't learn. Finally, I decided to test Pong using a custom agent and model, not the ones from rlpyt, to see whether my code is wrong. I used the resized RGB image as input and the same configuration for the sampler, algo, and runner, but again there was no sign of learning. It seems that my code has a problem that I cannot find. Here is my code. I would be grateful if you could take a look at it.
Finally, the problem is solved. The replay buffer setting is not correct for DQN with non-frame environments: DQN sets it to the frame-based versions, but when I set it to UniformReplayBuffer, it works perfectly. I will clean up the code, add an example, and make a pull request.
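For readers unfamiliar with the distinction: a frame-based buffer is built for stacked image observations, whereas a uniform buffer simply stores whole transitions and samples them with equal probability. Here is a minimal pure-Python sketch of the uniform idea. It is not rlpyt's UniformReplayBuffer; the class name and method signatures are hypothetical.

```python
import random
from collections import deque


class SimpleUniformReplay:
    """Fixed-capacity buffer that samples stored transitions uniformly.

    Illustrative only; rlpyt's own buffer has a different interface.
    """

    def __init__(self, capacity, seed=None):
        # deque with maxlen drops the oldest transition once full.
        self._storage = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def append(self, obs, action, reward, next_obs, done):
        self._storage.append((obs, action, reward, next_obs, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement over stored transitions.
        return self._rng.sample(list(self._storage), batch_size)

    def __len__(self):
        return len(self._storage)


if __name__ == "__main__":
    buf = SimpleUniformReplay(capacity=3, seed=0)
    for t in range(5):
        # CartPole-like flat observation: a short vector, not an image.
        buf.append([float(t)] * 4, t % 2, 1.0, [float(t + 1)] * 4, False)
    print(len(buf), buf.sample(2))
```

Each stored observation here is a complete flat vector, so nothing has to be reconstructed from neighbouring frames at sampling time.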
I'm trying to use rlpyt with my custom env, which has a non-image input state. Before that, I want to test it on a simple env like CartPole-v0. I use DQN and DqnAgent, but I get this error:
The code is:
But the ModelCls is None in DqnAgent and that's the reason for the error, I think. So I wrote an agent and model like this and used it instead of DqnAgent:
It runs fine, but it doesn't learn.
The plot is similar even after 20,000,000 steps. I checked the code several times and tested different configs for several days.
Do you have any idea how to solve this problem?
Update: CartPole-v0 has a discrete action space with two actions (push the cart left or right). DDPG and SAC work fine for my custom env with a continuous action space, so I am trying to discretize the action space. I trained it using DQN from Stable Baselines and with my own pure PyTorch implementation, and it works. But I couldn't train it using rlpyt, so I decided to try CartPole-v0 first. Do you see any problem in my code for an env with a discrete action space?
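As an aside on discretizing a continuous action space for DQN, one common approach is to map each discrete action index to an evenly spaced value in the continuous range. The sketch below assumes a one-dimensional action with bounds `low` and `high`; the function name and the bounds are made up for illustration and do not come from the custom env discussed here.

```python
def make_action_table(low, high, n_bins):
    """Return n_bins evenly spaced continuous actions covering [low, high].

    Hypothetical helper: the DQN agent outputs an index into this table,
    and the env wrapper applies the looked-up continuous value.
    """
    if n_bins < 2:
        raise ValueError("n_bins must be at least 2")
    step = (high - low) / (n_bins - 1)
    return [low + i * step for i in range(n_bins)]


if __name__ == "__main__":
    table = make_action_table(-1.0, 1.0, 5)
    discrete_action = 3  # e.g. the argmax of the Q-values
    print(table, "->", table[discrete_action])
```

The number of bins trades control resolution against the number of Q-value outputs the network has to learn.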