@nithin127 also take a look at #36; we need to scale the rewards (especially for obstacle avoidance) between lane following and collision detection, and it's pretty much going to be a hyperparameter search to find the right balance for an agent with the desired behavior. (I have zero intuition about what the mix should be, and every paper I've seen doing something similar uses a totally different reward structure.)
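To make the search concrete, the balance could be exposed as two coefficients in a wrapper along these lines. This is only a sketch: the `lane_reward`/`collision_penalty` info keys and the wrapper name are hypothetical, not the actual gym-duckietown API.

```python
import gym

class RewardMixWrapper(gym.Wrapper):
    """Sketch: blend a lane-following term and a collision penalty with
    tunable coefficients; these two weights are the hyperparameters to
    search over."""

    def __init__(self, env, lane_coef=1.0, collision_coef=10.0):
        super().__init__(env)
        self.lane_coef = lane_coef            # weight on the lane-following term
        self.collision_coef = collision_coef  # weight on the collision/proximity term

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        # Hypothetical: assume the raw terms are exposed via `info`;
        # in practice you would compute or extract them from the env yourself.
        lane_term = info.get('lane_reward', reward)
        collision_term = info.get('collision_penalty', 0.0)
        mixed = self.lane_coef * lane_term - self.collision_coef * collision_term
        return obs, mixed, done, info
```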
@nithin127 Bhairav tells me you've gotten started on this and run into issues?
Yes, I'm getting a -1000 reward per episode throughout the training. The following things are in the pipeline:

- Work on a simple case of continuous control with fixed forward velocity and only the heading as a variable
- Initialise the network with pre-trained imitation learning weights
- Switch to PPO, since DDPG is a little unreliable (looking into repositories)

What is your observation? Is it just a single image?
Yes, just a single image for now. I also plan to stack images.
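For reference, stacking the last few frames is usually done with a generic wrapper along these lines (a sketch, not code from this repo):

```python
from collections import deque

import gym
import numpy as np

class FrameStack(gym.Wrapper):
    """Minimal frame-stacking sketch: keep the last k observations and
    concatenate them along the channel axis. A complete version would
    also update self.observation_space to reflect the new shape."""

    def __init__(self, env, k=4):
        super().__init__(env)
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self):
        obs = self.env.reset()
        # Fill the buffer with the first frame so the shape is fixed.
        for _ in range(self.k):
            self.frames.append(obs)
        return np.concatenate(self.frames, axis=-1)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.frames.append(obs)
        return np.concatenate(self.frames, axis=-1), reward, done, info
```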
For lane following on maps without obstacles, I was able to get it to work with single images. I would try to get the simplest thing working before moving on to more sophisticated models with frame stacking and such. I would also try using a very simple map like small_loop. Fixing the velocity (e.g. at 0.5) and varying only the heading is a good idea.
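If it helps, here is a minimal sketch of that setup as a gym ActionWrapper. The `[velocity, steering]` action convention and the wrapper name are assumptions, not the repo's actual API:

```python
import gym
import numpy as np

class FixedVelocityWrapper(gym.ActionWrapper):
    """Sketch of the 'fix the velocity, learn only the heading' idea.
    Assumes the underlying env takes a 2-vector [velocity, steering];
    the agent outputs a single steering scalar."""

    def __init__(self, env, velocity=0.5):
        super().__init__(env)
        self.velocity = velocity
        # The agent now controls one dimension only.
        self.action_space = gym.spaces.Box(
            low=-1.0, high=1.0, shape=(1,), dtype=np.float32)

    def action(self, act):
        # Prepend the fixed forward velocity to the learned steering value.
        return np.array([self.velocity, float(act[0])])
```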
Get "it" to work as in, imitation learning, or RL with discrete actions?
Both imitation learning and RL with discrete actions worked for me on lane following tasks without obstacles. Sidenote: I was using the differential drive control model (no HeadingWrapper).
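For context, the differential drive model means the action is a pair of wheel velocities rather than a (velocity, heading) command. Standard differential-drive kinematics relate the two; a sketch is below (the axle length value is a placeholder, not the real robot's measurement):

```python
def twist_to_wheel_velocities(v, omega, wheel_dist=0.1):
    """Standard differential-drive kinematics: convert a linear velocity v
    and an angular velocity omega into left/right wheel velocities.
    wheel_dist is the distance between the wheels (placeholder value)."""
    v_left = v - omega * wheel_dist / 2.0
    v_right = v + omega * wheel_dist / 2.0
    return v_left, v_right
```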
I just verified and it does train successfully on small_loop with discrete actions and 8 processes:
python3 pytorch_rl/main.py --no-vis --env-name Duckie-SimpleSim-Discrete-v0 --algo a2c --lr 0.0002 --max-grad-norm 0.5 --num-steps 20 --num-processes 8
Not sure what the status of this is; if there are major updates, we can reopen.
There is an implementation of A2C in the pytorch_rl directory. Unfortunately, it's rather complex, it doesn't deal with continuous actions, and it sometimes learns bad policies. I would like to have an RL baseline which supports continuous actions. It doesn't necessarily have to be DDPG; it could also be TRPO or something else. I would prefer PyTorch, and it would be particularly cool if we could find a codebase that is on the smaller side and easier to understand.

On top of finding the RL code, we also need to implement/train a model. I would recommend starting from the parameters of the model in experiments/train_imitation.py. We should test training on multiple maps. There is a MultiMap-v0 gym environment which randomly samples all the maps. The loop_obstacles map may be particularly difficult to solve. @nithin127
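As a rough illustration of what "supports continuous actions" entails on the policy side, a common pattern is a Gaussian head over the action dimensions. This is a generic PyTorch sketch, not code from pytorch_rl:

```python
import torch
import torch.nn as nn

class GaussianPolicyHead(nn.Module):
    """Minimal sketch of a continuous-action policy head: the network
    outputs a mean per action dimension and keeps a learned log standard
    deviation; actions are sampled from the resulting Normal distribution."""

    def __init__(self, num_features, num_actions):
        super().__init__()
        self.mean = nn.Linear(num_features, num_actions)
        self.log_std = nn.Parameter(torch.zeros(num_actions))

    def forward(self, features):
        mean = self.mean(features)
        dist = torch.distributions.Normal(mean, self.log_std.exp())
        action = dist.sample()
        # Sum log-probs over action dimensions for the policy-gradient loss.
        return action, dist.log_prob(action).sum(-1)
```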