duckietown / gym-duckietown

Self-driving car simulator for the Duckietown universe
http://duckietown.org

DDPG implementation / baseline #25

Closed: maximecb closed this 5 years ago

maximecb commented 6 years ago

There is an implementation of A2C in the pytorch_rl directory. Unfortunately, it's rather complex, it doesn't handle continuous actions, and it sometimes learns bad policies. I would like an RL baseline that supports continuous actions. It doesn't necessarily have to be DDPG; it could also be TRPO or something else. I would prefer PyTorch, and it would be particularly cool if we could find a codebase that is on the smaller side and easier to understand.
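For context, the continuous-action piece of DDPG is a deterministic actor that maps an observation to a bounded action vector. A minimal PyTorch sketch of that idea (the layer sizes, flat observation vector, and 2-D action are assumptions, not the existing pytorch_rl code):

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Deterministic DDPG-style actor: observation -> continuous action."""

    def __init__(self, obs_dim, act_dim, act_limit=1.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, act_dim), nn.Tanh(),  # squash to [-1, 1]
        )
        self.act_limit = act_limit

    def forward(self, obs):
        # Rescale the tanh output to the environment's action bounds.
        return self.act_limit * self.net(obs)
```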

On top of finding the RL code, we also need to implement/train a model. I would recommend starting from the parameters of the model in experiments/train_imitation.py. We should test training on multiple maps. There is a MultiMap-v0 gym environment which randomly samples all the maps. The loop_obstacles map may be particularly difficult to solve.

@nithin127
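Something along these lines should exercise the multi-map setup; the env id is taken from the comment above, so treat this (and the random-action loop) as a sketch to adapt rather than a tested snippet:

```python
import gym
import gym_duckietown  # importing the package registers the Duckietown environments

# 'MultiMap-v0' samples a random map on each reset (id taken from the discussion
# above; double-check it against the registrations in the installed version).
env = gym.make('MultiMap-v0')

obs = env.reset()
done = False
while not done:
    # Random policy, just to confirm the environment runs end to end.
    obs, reward, done, info = env.step(env.action_space.sample())
env.close()
```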

bhairavmehta95 commented 6 years ago

@nithin127 also take a look at #36; we need to scale the rewards (especially for obstacle avoidance) between lane following and collision detection, and it's pretty much going to be a hyperparameter search to find the right balance for an agent with the desired behavior. (I've got zero intuition about what the mix should be, and every paper I've seen doing something similar uses a totally different reward structure.)
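One way to make that search explicit is to pull the two terms apart behind tunable coefficients in a wrapper, so the balance is just two scalars; a rough sketch (the info keys standing in for the per-component rewards are hypothetical):

```python
import gym

class RewardMixWrapper(gym.Wrapper):
    """Re-weights the lane-following term against the collision penalty.

    The two coefficients are the hyperparameters to search over. The info keys
    below are hypothetical placeholders for however the simulator exposes the
    individual reward components.
    """

    def __init__(self, env, lane_coeff=1.0, collision_coeff=10.0):
        super().__init__(env)
        self.lane_coeff = lane_coeff
        self.collision_coeff = collision_coeff

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        lane = info.get('lane_reward', reward)          # hypothetical key
        collision = info.get('collision_penalty', 0.0)  # hypothetical key
        mixed = self.lane_coeff * lane - self.collision_coeff * collision
        return obs, mixed, done, info
```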

maximecb commented 6 years ago

@nithin127 Bhairav tells me you've gotten started on this and run into issues?

nithin127 commented 6 years ago

Yes, I'm getting a reward of -1000 per episode throughout training. The following things are in the pipeline:

  • Work on a simple case of continuous control, with a fixed forward velocity and only the heading as a variable
  • Initialise the network with pre-trained imitation learning weights (see the sketch below)
  • Switch to PPO, as DDPG is a little unreliable (looking into repositories)
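For the pre-trained initialisation, one option is to give the RL actor the same architecture as the imitation model and copy its weights over before training starts; a sketch under that assumption (the checkpoint path and layer sizes are hypothetical, experiments/train_imitation.py defines the real model):

```python
import torch
import torch.nn as nn

def make_policy():
    # Placeholder architecture; the real one should mirror experiments/train_imitation.py.
    return nn.Sequential(
        nn.Linear(120 * 160 * 3, 256), nn.ReLU(),  # expects a flattened image (size assumed)
        nn.Linear(256, 2), nn.Tanh(),              # 2-D continuous action
    )

imitation = make_policy()
# Hypothetical checkpoint produced by the imitation training script.
imitation.load_state_dict(torch.load('imitation_weights.pt', map_location='cpu'))

actor = make_policy()
actor.load_state_dict(imitation.state_dict())  # warm-start the RL actor
```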

bhairavmehta95 commented 6 years ago

What is your observation? Is it just a single image?

On Wed, Jul 18, 2018 at 4:41 PM, Nithin Vasisth notifications@github.com wrote:

Yes, I'm getting a -1000 loss/episode throughout the training; Following things are in the pipeline:

  • Work on a simple case of continuous control with fixed forward velocity and only heading as variable
  • Initialise the network with pre-trained imitation learning weights
  • Switch to ppo, as ddpg is a little unreliable (looking into repositories)


nithin127 commented 6 years ago

Also, regarding that: I will stack images.
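If stacking does turn out to help, a small observation wrapper is all it takes; a generic sketch, not tied to anything already in the repo (note the observation space would also need to be widened to match):

```python
from collections import deque

import gym
import numpy as np

class FrameStack(gym.Wrapper):
    """Returns the last k observations stacked along the channel axis."""

    def __init__(self, env, k=4):
        super().__init__(env)
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, **kwargs):
        obs = self.env.reset(**kwargs)
        for _ in range(self.k):
            self.frames.append(obs)
        return np.concatenate(self.frames, axis=-1)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.frames.append(obs)
        return np.concatenate(self.frames, axis=-1), reward, done, info
```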

maximecb commented 6 years ago

For lane following on maps without obstacles, I was able to get it to work with single images. I would try to get the simplest thing working before moving on to more sophisticated models with frame stacking and such. I would also try using a very simple map like small_loop. Fixing the velocity (e.g. at 0.5) and varying only the heading is a good idea.
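Concretely, the fixed-velocity experiment can be a tiny action wrapper that exposes steering alone; a sketch assuming the wrapped env takes a 2-D [speed, steering] action (with the raw differential-drive model the last line would instead map to the two wheel velocities):

```python
import gym
import numpy as np
from gym import spaces

class FixedSpeedWrapper(gym.ActionWrapper):
    """Exposes a 1-D action (steering only) and drives at a constant speed.

    Assumes the wrapped env expects [speed, steering]; adjust the mapping if
    the base env uses per-wheel velocities instead.
    """

    def __init__(self, env, speed=0.5):
        super().__init__(env)
        self.speed = speed
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(1,), dtype=np.float32)

    def action(self, act):
        return np.array([self.speed, float(act[0])], dtype=np.float32)
```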

nithin127 commented 6 years ago

Get "it" to work as in, imitation learning, or RL with discrete actions?

maximecb commented 6 years ago

Both imitation learning and RL with discrete actions worked for me on lane following tasks without obstacles. Sidenote: I was using the differential drive control model (no HeadingWrapper).

maximecb commented 6 years ago

I just verified and it does train successfully on small_loop with discrete actions and 8 processes: python3 pytorch_rl/main.py --no-vis --env-name Duckie-SimpleSim-Discrete-v0 --algo a2c --lr 0.0002 --max-grad-norm 0.5 --num-steps 20 --num-processes 8

bhairavmehta95 commented 5 years ago

Not sure what the status of this is; if there are major updates, we can reopen.