duckietown / gym-duckietown

Self-driving car simulator for the Duckietown universe
http://duckietown.org

DDPG learns a bad policy #187

Closed: AlexKaravaev closed this issue 4 years ago

AlexKaravaev commented 4 years ago

I have trained the DDPG code provided in the repository for over 12 hours, but it learns a very bad policy: the robot just spins continuously in one place without moving forward.

Can somebody please give me a hint about what I should change?

I am pretty new to RL, but my first thought is either to tune the hyperparameters or to change the reward function completely, although the current reward function seems very logical to me and looks designed to rule out bad policies like this one.
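One common reason a "logical" reward still produces spinning in place is that standing still can be a safe local optimum. The sketch below uses a hypothetical reward of the general shape "forward progress along the lane minus a penalty for distance from the lane center"; the function name, weights, and inputs are assumptions for illustration, not gym-duckietown's actual reward code.

```python
import math


def lane_following_reward(speed, heading_error, lateral_dist,
                          progress_weight=1.0, dist_weight=10.0):
    """Hypothetical lane-following reward (not gym-duckietown's code).

    speed:         forward speed of the robot along its own heading
    heading_error: angle (radians) between robot heading and lane direction
    lateral_dist:  signed distance from the lane center
    """
    # Progress term: only motion aligned with the lane earns reward.
    progress = progress_weight * speed * math.cos(heading_error)
    # Centering term: penalize drifting away from the lane center.
    centering = -dist_weight * abs(lateral_dist)
    return progress + centering


# Spinning in place at the lane center: zero forward speed, so the progress
# term is zero whatever the heading, and the centering penalty is zero too.
spin = [lane_following_reward(0.0, h, 0.0) for h in (0.0, 1.0, 2.0, 3.0)]

# Driving forward while 0.1 off center: with these hypothetical weights the
# centering penalty outweighs the progress reward.
drift = lane_following_reward(0.3, 0.0, 0.1)
```

Under weights like these, an early policy that has not yet learned to stay centered can score better by not moving forward at all, so the optimizer may settle there unless exploration pushes it past that point.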

bhairavmehta95 commented 4 years ago

Are you running this on AIDO, or just using the enjoy_reinforcement.py script?

bhairavmehta95 commented 4 years ago

Unfortunately, with RL, "hours" is a bit of a vague term. How many timesteps did you train for?
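Episodes are not a fixed unit of experience either, because an episode can end early (for example when the robot leaves the road). A minimal sketch of counting both, using a hypothetical toy environment with an assumed 500-step episode cap and the old 4-tuple `step` return convention:

```python
import random


class ToyEpisodicEnv:
    """Hypothetical stand-in environment: an episode ends after max_steps,
    or earlier with probability p_early_end on each step."""

    def __init__(self, max_steps=500, p_early_end=0.01, seed=0):
        self.max_steps = max_steps
        self.p_early_end = p_early_end
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0  # dummy observation

    def step(self, action):
        self.t += 1
        done = self.t >= self.max_steps or self.rng.random() < self.p_early_end
        return 0.0, 0.0, done, {}  # obs, reward, done, info


def run(env, num_episodes):
    """Run num_episodes episodes and return the total number of env steps."""
    total_timesteps = 0
    for _ in range(num_episodes):
        env.reset()
        done = False
        while not done:
            _, _, done, _ = env.step(action=None)
            total_timesteps += 1
    return total_timesteps


steps = run(ToyEpisodicEnv(), num_episodes=3000)
print(f"3000 episodes -> {steps} timesteps")
```

With a 500-step cap, 3000 episodes can mean anything from 3000 to 1.5 million steps, which is why the total step count is the more useful number to compare.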

AlexKaravaev commented 4 years ago

@bhairavmehta95 No, I just train it without AI-DO and then run it with enjoy_reinforcement.py. I have trained it for almost 3000 episodes.

bhairavmehta95 commented 4 years ago

So if you're using train_reinforcement.py and then enjoy_reinforcement.py, it should work okay.

3000 episodes still seems a bit low, but if you can train again, can you remove the ActionWrapper from both files, retrain, and see what happens?

https://github.com/duckietown/gym-duckietown/blob/master/learning/reinforcement/pytorch/train_reinforcement.py#L32

AlexKaravaev commented 4 years ago

Just to clarify: will this result in something different? As I understand the code, ActionWrapper just scales the speed down to 0.8 of the commanded speed, and that does not seem to explain why the bot is just spinning around.
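The reasoning above can be checked with standard differential-drive kinematics: scaling both wheel speeds by the same factor only slows the robot, while scaling a single component changes its direction. This is a hypothetical illustration; the helper names and the 0.1 wheel baseline are assumptions, and it makes no claim about which components the repository's ActionWrapper actually scales.

```python
def body_velocities(left, right, baseline=0.1):
    """Differential-drive kinematics: wheel speeds -> (forward speed, yaw rate).

    baseline is an assumed distance between the two wheels.
    Positive yaw rate means turning left (right wheel faster).
    """
    forward = (left + right) / 2.0
    yaw_rate = (right - left) / baseline
    return forward, yaw_rate


def scale_wheels(action, factor, indices):
    """Multiply the selected components of a [left, right] action by factor."""
    return [a * factor if i in indices else a for i, a in enumerate(action)]


straight = [0.5, 0.5]

# Scaling both wheels equally: slower, but still straight (yaw rate 0).
both = body_velocities(*scale_wheels(straight, 0.8, {0, 1}))

# Scaling only one wheel: the robot curves even though the policy
# asked for a straight line.
one = body_velocities(*scale_wheels(straight, 0.8, {0}))
```

So a uniform 0.8 speed limit should not by itself cause spinning, but a scale applied to only one element of the action would bias the robot toward turning.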

bhairavmehta95 commented 4 years ago

Well, it used to be the case that we did:

action = [steering, velocity]

but now we do

action = [left_velocity, right_velocity]

So this may be what causes that effect, though I'm not sure.
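To illustrate why the change of action convention matters, here is a minimal sketch converting between `[steering, velocity]` and `[left_velocity, right_velocity]`. It assumes steering is a yaw rate (positive means turn left) and a 0.1 wheel baseline; both are assumptions, not the simulator's actual mapping. The point is that the same numbers mean very different motions under the two conventions.

```python
def steer_vel_to_wheels(steering, velocity, baseline=0.1):
    """[steering, velocity] -> [left, right] wheel speeds.

    Assumes steering is a yaw rate (positive = turn left) and an
    illustrative wheel baseline; not the simulator's actual mapping.
    """
    half = steering * baseline / 2.0
    return [velocity - half, velocity + half]


def wheels_to_steer_vel(left, right, baseline=0.1):
    """Inverse mapping: [left, right] -> [steering, velocity]."""
    return [(right - left) / baseline, (left + right) / 2.0]


# A "drive straight at 0.8" command in the old [steering, velocity] form.
old_style = [0.0, 0.8]

# Converted properly, both wheels turn at the same speed.
converted = steer_vel_to_wheels(*old_style)

# Passed unchanged to an env expecting [left, right], the left wheel stops
# and only the right wheel drives, so the robot turns in a tight circle.
misread = wheels_to_steer_vel(*old_style)
```

Any code (a wrapper, a reward term, or hand-tuned exploration) written with the old convention in mind would therefore push the robot toward turning rather than driving straight.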

I still think you'd need to train longer, though. It's a tough environment to solve.

AlexKaravaev commented 4 years ago

Alright, thanks. I'll get back to you after tweaking this a little.

bhairavmehta95 commented 4 years ago

Reopen if you have more issues!

AlexKaravaev commented 4 years ago

I have trained for 3000 steps without the ActionWrapper, and now the bot is just stuck in one place. Have you ever managed to get something that works with pure DDPG?

I'm still not sure whether the ActionWrapper needs to be removed, because with it the bot at least learned how to turn; now it just stays in one place.

After looking into several papers, I now suspect that DDPG itself takes an enormous number of epochs to train. Could adding an improvement such as hindsight experience replay help?
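For reference, hindsight experience replay (HER) works by relabeling stored transitions with goals the agent actually reached later in the same episode, so sparse failures still yield useful learning signal. A minimal sketch of the "future" relabeling strategy follows; the tuple layout, `reward_fn`, and the 1-D toy episode are assumptions for illustration. HER presumes a goal-conditioned task with a substitutable goal, so applying it to plain lane following would likely require reframing the task.

```python
import random


def her_relabel(episode, reward_fn, k=4, rng=None):
    """Hindsight experience replay, 'future' strategy (sketch).

    episode:   list of (obs, action, achieved_goal, desired_goal) tuples,
               where achieved_goal is what was reached after the action.
    reward_fn: reward_fn(achieved_goal, goal) -> float
    Returns (obs, action, goal, reward) transitions: each original one plus
    k copies whose goal is a goal actually achieved at this step or later.
    """
    rng = rng or random.Random(0)
    out = []
    for t, (obs, action, achieved, desired) in enumerate(episode):
        # Original transition with the goal the agent was actually given.
        out.append((obs, action, desired, reward_fn(achieved, desired)))
        # Relabeled transitions: pretend a later achieved goal was the target.
        future = [step[2] for step in episode[t:]]
        for _ in range(k):
            new_goal = rng.choice(future)
            out.append((obs, action, new_goal, reward_fn(achieved, new_goal)))
    return out


def sparse_reward(achieved, goal):
    """Toy sparse reward: 0 on success, -1 otherwise."""
    return 0.0 if abs(achieved - goal) < 0.5 else -1.0


# 1-D toy episode: the agent moves 0 -> 1 -> 2 -> 3 but the goal was 10.
episode = [(0, +1, 1, 10), (1, +1, 2, 10), (2, +1, 3, 10)]
buffer = her_relabel(episode, sparse_reward, k=2)
```

Every original transition in this episode fails (reward -1), but some relabeled copies become successes, which is the extra signal HER is meant to provide.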

bhairavmehta95 commented 4 years ago

We've gotten it to work, but it needs way more steps than what you've been training for.

Maybe increase the exploration period.
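As a rough picture of what "exploration period" usually means in DDPG-style training, here is a minimal sketch: act uniformly at random for an initial warm-up, then add Gaussian noise to the deterministic policy's output. The names `start_timesteps` and `expl_noise` and their defaults are hypothetical here (they follow common DDPG/TD3 reference-code conventions); check train_reinforcement.py for the actual arguments.

```python
import random


def select_action(policy, obs, t, start_timesteps=10_000, expl_noise=0.1,
                  low=-1.0, high=1.0, action_dim=2, rng=random):
    """Exploration schedule sketch for DDPG-style training.

    For the first start_timesteps steps, sample actions uniformly at random
    to fill the replay buffer with diverse experience. Afterwards, add
    Gaussian noise (scaled to the action range) to the deterministic policy
    output and clip the result to the action bounds.
    """
    if t < start_timesteps:
        return [rng.uniform(low, high) for _ in range(action_dim)]
    action = policy(obs)
    sigma = expl_noise * (high - low) / 2.0
    noisy = [a + rng.gauss(0.0, sigma) for a in action]
    return [min(max(a, low), high) for a in noisy]


rng = random.Random(0)
policy = lambda obs: [0.9, 0.9]  # hypothetical deterministic policy

early = select_action(policy, obs=None, t=100, rng=rng)     # random warm-up
late = select_action(policy, obs=None, t=20_000, rng=rng)   # noisy policy
```

Increasing `start_timesteps` (or `expl_noise`) in a scheme like this gives the critic more varied experience before the policy starts exploiting it, which may help it escape a degenerate behavior such as spinning in place.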

(Sent by email in reply to AlexKaravaev's comment of Fri, Nov 29, 2019, 3:00 AM.)