duckietown / gym-duckietown

Self-driving car simulator for the Duckietown universe
http://duckietown.org

PyTorch RL learns bad policy with default parameters #215

Open thomas-w-nl opened 4 years ago

thomas-w-nl commented 4 years ago

Has anyone successfully trained a good policy with the current default parameters using the RL template? After training multiple times for over 1 million timesteps (60 000 episodes), the only policy that has been learned is to turn in a circle. I tried using the SteeringToWheelVelWrapper to learn only the heading with a fixed velocity of 0.5, but this did not fix the issue. I also tried to limit the number of rotations allowed, resetting the gym after more than 4 × 360° of angle difference. However, none of these approaches worked. Should I train much longer, or is something broken?
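The rotation limit described above can be sketched as a small tracker that accumulates the heading change between steps and signals when the episode should be reset. This is a minimal sketch, not the code used in this thread: the class name RotationLimiter and its methods are hypothetical, it counts net (signed) rotation, and it assumes the heading is available in radians at every step (in gym-duckietown that would presumably be the simulator's current heading, which is an assumption here).

```python
import math


class RotationLimiter:
    """Hypothetical helper: flags an episode once the net rotation exceeds a limit."""

    def __init__(self, max_turns=4.0):
        # 4 full turns = 4 * 360 degrees, expressed in radians.
        self.max_angle = max_turns * 2.0 * math.pi
        self.prev_angle = 0.0
        self.total = 0.0

    def reset(self, angle):
        """Call at the start of every episode with the initial heading."""
        self.prev_angle = angle
        self.total = 0.0

    def update(self, angle):
        """Call once per step; returns True when the episode should be reset."""
        # Wrap the difference into [-pi, pi) so that crossing the +/-pi
        # boundary is not mistaken for an almost complete turn.
        delta = (angle - self.prev_angle + math.pi) % (2.0 * math.pi) - math.pi
        self.total += delta
        self.prev_angle = angle
        return abs(self.total) > self.max_angle


limiter = RotationLimiter(max_turns=4.0)
limiter.reset(0.0)
# Simulate a robot that turns 0.1 rad per step in the same direction.
flags = [limiter.update(0.1 * step) for step in range(1, 300)]
print("reset requested at step", flags.index(True) + 1)
```

Because the rotation is signed, small left-right corrections cancel out and only sustained spinning in one direction triggers a reset. Summing abs(delta) instead would count every wiggle toward the limit.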

Judging by the reward over time for a run of over 4000 episodes, further training does not appear likely to produce anything useful (validation reward is in blue).

[image: reward over time]

thomas-w-nl commented 4 years ago

After training for 30 000 episodes on "straight_road", with the robot always starting in the ideal position, it still does not learn to drive forwards and always turns straight off the map. Is something broken?

liampaull commented 4 years ago

I have also mostly seen this behavior. @bhairavmehta95 or @velythyl might have more insight.

Velythyl commented 4 years ago

I had the same behaviour when I started working on that repo.

I noticed a few bugs in the code, and have fixed them. I was going to do a PR, but I turned my attention to imitation learning lately so I didn't finish it.

I'll get on it tomorrow, I still have to clean up my code and commits but I should be able to open the PR either tomorrow or the day after (I'll have to train it to make sure everything works, and that takes a lot of time).

With the fixes, the car converges to either turning in a circle or going straight (it has a really hard time with curves). I didn't test it with a really long training time because of hardware constraints though, so that might be it.

Velythyl commented 4 years ago

It's still training, but just to be sure I've fixed the faulty behaviour - does this seem better to you? Right now it's only at 60k timesteps, so I'm sure with more computing power it could become better. It's not just turning in circles anymore, though it still likes turning more than going straight (but again, I think with more training that quirk might disappear).

It feels consistent with a "very early RL training that's still exploring the action space" model.

[image: fixedGif]

If this seems good I will open the PR.

Velythyl commented 4 years ago

Okay, I just saw it take two turns in a row at 90k timesteps, so I'll consider this fixed. I'll open the PR.

thomas-w-nl commented 4 years ago

That looks a lot more promising! In my experience the algorithm very quickly learns a max turning angle and really does not want to change. I'm very interested in the fixes, thanks a lot for your help. I'll start training for the night; it should do at least 1M timesteps by tomorrow.

After experimenting a little further, this code seems to deviate from the original DDPG by taking as many gradient steps as there were timesteps in the episode. That seems a little excessive, and it did perform better after I reduced the number of gradient steps per episode.
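One way to experiment with fewer gradient steps per episode is to make the number of updates an explicit, tunable quantity. The following is a minimal sketch under assumptions: the function names, the ratio and max_steps parameters, and the update_fn callback are hypothetical and do not come from the training script discussed here.

```python
def gradient_steps_for_episode(episode_timesteps, ratio=1.0, max_steps=None):
    """Number of gradient updates to run after an episode.

    ratio=1.0 gives one update per timestep, as described above;
    a smaller ratio or a max_steps cap reduces the amount of training.
    """
    steps = int(episode_timesteps * ratio)
    if max_steps is not None:
        steps = min(steps, max_steps)
    return max(steps, 0)


def train_after_episode(update_fn, episode_timesteps, ratio=1.0, max_steps=None):
    """Call update_fn (one gradient step on a replay batch) the chosen number of times."""
    n_updates = gradient_steps_for_episode(episode_timesteps, ratio, max_steps)
    for _ in range(n_updates):
        update_fn()
    return n_updates


# Usage with a stand-in update function that only counts its calls.
calls = []
done = train_after_episode(lambda: calls.append(1), episode_timesteps=200, ratio=0.25)
print(done, "gradient steps for a 200-step episode")
```

Which ratio works best is an empirical question. The sketch only makes the setting easy to vary between runs.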

Velythyl commented 4 years ago

I don't have the rights to link issues or assign reviewers, but here is the PR: https://github.com/duckietown/challenge-aido_LF-baseline-RL-sim-pytorch/pull/33

Max-Fu commented 3 years ago

Hi! I am trying to train the DDPG model (with CNN) in the 'Duckietown-loop_pedestrians-v0' environment; however, it took roughly a day to get to 1434 steps, which is far from the 90k steps mentioned above. Is that natural? It is currently running on a 1080ti.

Velythyl commented 3 years ago

No, that is not natural. Are you sure that it's using your 1080ti and not your CPU? On a 2080ti, it gets to 1500 extremely quickly, less than an hour iirc

Max-Fu commented 3 years ago

Using nvidia-smi I get that the program uses 1696MiB with batch size 32. I definitely think that it is on the small side.

Velythyl commented 3 years ago

I was about to launch a run, I can ping you with the time it took me to reach 1500 training steps for you to use as reference

Additionally, it could be a good idea to do something like print("cuda" if torch.cuda.is_available() else "cpu")
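A slightly more detailed version of that check could look like the sketch below. The helper name describe_device is hypothetical, and the ImportError fallback is there only so the snippet runs on machines without PyTorch. torch.cuda.is_available and torch.cuda.get_device_name are standard PyTorch calls.

```python
def describe_device():
    """Report which device PyTorch would use, without assuming torch is installed."""
    try:
        import torch
    except ImportError:
        return "torch not installed"
    if not torch.cuda.is_available():
        return "cpu"
    # Name of the first visible GPU, e.g. the card shown by nvidia-smi.
    return "cuda:" + torch.cuda.get_device_name(0)


print(describe_device())
```

Note that CUDA being available does not by itself prove the networks were moved to the GPU. Printing next(model.parameters()).device for the actor and critic (where model stands for whichever network object the script builds) would confirm where the weights actually live.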

Max-Fu commented 3 years ago

Thanks! I tested that line and got cuda. Just to make sure: does "step" refer to "total_timesteps"?

Velythyl commented 3 years ago

Okay so I got to 1500 total timesteps in about 3 minutes, just so you know what to aim for
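To put the two reports side by side, the throughput can be worked out from the numbers quoted in this thread. The sketch assumes "roughly a day" means 24 hours and "about 3 minutes" means 180 seconds; the function name is made up for illustration.

```python
def steps_per_second(total_timesteps, elapsed_seconds):
    """Average training throughput in timesteps per second."""
    return total_timesteps / elapsed_seconds


slow = steps_per_second(1434, 24 * 3600)  # 1434 steps in roughly a day (assumed 24 h)
fast = steps_per_second(1500, 3 * 60)     # 1500 steps in about 3 minutes
slowdown = fast / slow

print(f"slow run: {slow:.4f} steps/s")
print(f"fast run: {fast:.2f} steps/s")
print(f"slowdown: about {slowdown:.0f}x")
```

Under those assumptions the slow run is roughly 500 times slower, far more than the difference between a 1080ti and a 2080ti would explain.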

Can you print me your duckietown gym version and the version of this repo using git branch? Both should be daffy

Max-Fu commented 3 years ago

git branch is
* daffy
(END)

Can you explain how to get the duckietown gym version? I am somewhat new to gym.
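One generic way to look up the version of an installed Python package is importlib.metadata from the standard library (Python 3.8+). In the sketch below the helper name is hypothetical, and the distribution name "duckietown-gym-daffy" is an assumption about how the gym is published; if it was installed from a git checkout, running git branch inside that checkout answers the same question.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# "duckietown-gym-daffy" is an assumed distribution name, not confirmed in this thread.
for name in ("duckietown-gym-daffy", "gym"):
    print(name, installed_version(name))
```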

Velythyl commented 3 years ago

Ah wait, I just realized we're in the Gym repo and not the RL repo. Try following this instead, and let me know if you run into any issues: https://docs.duckietown.org/daffy/AIDO/out/embodied_rl.html

This will walk you through installing the gym and uses an updated RL training script compared to what's in the gym-duckietown repo.

Max-Fu commented 3 years ago

Ah, that makes sense. I will update asap. Thanks!

Max-Fu commented 3 years ago

Problem fixed. Thanks @Velythyl

Max-Fu commented 3 years ago

@Velythyl I posted a new issue (here). I think the root cause may be a wrong reward function. I have restarted training with a temporary fix for this, and will let you know whether it solves the problem.

SebaVGit commented 3 years ago

Hi @Velythyl, I have a question about the RL repo. The code contains this line (114):

if action[0] < 0.001:  # Penalise slow actions: helps the bot to figure out that going straight > turning in circles
    reward = 0

I cannot understand what this is doing. Also, when I run this RL code for about 500000 steps, the agent just spins around and never goes straight. What could be happening?
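Read in isolation, the quoted condition replaces the reward with 0 whenever the first action component is below 0.001. The sketch below restates that logic as a standalone function. The function name and the min_speed parameter are hypothetical, and treating action[0] as the forward speed is an assumption based on the comment in the quoted line.

```python
def penalise_slow_action(action, reward, min_speed=0.001):
    """Zero the reward when the first action component (assumed: forward speed) is too small."""
    if action[0] < min_speed:
        return 0
    return reward


# A near-stationary or reversing action loses its reward...
print(penalise_slow_action([0.0, 1.0], 5.0))
# ...while an action with forward speed keeps it.
print(penalise_slow_action([0.5, 0.0], 5.0))
```

One consequence of this logic is that it only acts as a penalty when the unmodified reward is positive. If the environment returns a negative reward for that step, setting it to 0 actually raises it, so a slow action can end up better rewarded than a fast one.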