Are you running this on AIDO, or just using the enjoy_reinforcement.py script?
Unfortunately, with RL, "hours" is a bit of a vague term. How many timesteps did you train for?
@bhairavmehta95 No, I'm just trying to train it without AI-DO and then run it with enjoy_reinforcement.py. I have trained it for almost 3000 episodes.
So if you're using train_reinforcement.py and then enjoy_reinforcement.py, it should work okay.
3000 episodes still seems a bit low, but if you can train again, can you remove the ActionWrapper from both files, retrain, and see what happens?
Just for clarification, will that result in anything different? As I understood from the code, the ActionWrapper just constrains the speed to 0.8 of the action speed, and that doesn't seem to explain why the bot is just spinning around.
Well, it used to be the case that we did:
action = [steering, velocity]
but now we do
action = [left_velocity, right_velocity]
So this may cause that effect, not sure though.
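Roughly, the kind of wrapper we're talking about looks something like the sketch below (class name, component, and factor are from memory and may not match the repo exactly). If it scales only one component, that's harmless under the old [steering, velocity] convention, where only the velocity needs capping, but under [left_velocity, right_velocity] it slows one wheel more than the other and biases the robot toward turning:

```python
import gym
import numpy as np

class SpeedCapWrapper(gym.ActionWrapper):
    """Sketch of the discussed ActionWrapper behavior (hypothetical names;
    the exact component and factor in the repo may differ)."""

    def action(self, action):
        action = np.array(action, dtype=np.float32)
        # Caps "velocity" in the old [steering, velocity] convention,
        # but only the right wheel under [left_velocity, right_velocity].
        action[1] *= 0.8
        return action
```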
I still think you'd need to train longer though. It's a tough environment to solve.
Alright, thanks. Will come back to you after tweaking this up a little bit.
Reopen if you have more issues!
I have trained for 3000 steps without the ActionWrapper, and now the bot is just stuck in one place. Have you ever managed to get something that works with pure DDPG?
I'm still not sure whether the ActionWrapper needs to be removed, because with the ActionWrapper it at least learned how to turn; now it just stays in one place.
After looking into several papers, I suspect it is DDPG itself that takes an enormous number of epochs to train, and maybe adding an improvement like hindsight experience replay could help?
We've gotten it to work, but it needs way more steps than what you've been training for.
Maybe increase the exploration period.
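Something along these lines, as a rough sketch of where that knob usually sits in a DDPG training loop (the argument names here are placeholders, not necessarily what train_reinforcement.py uses; adjust to your policy's API):

```python
import numpy as np

def select_action(policy, env, obs, total_steps,
                  start_timesteps=10_000, expl_noise=0.1):
    """Exploration schedule sketch: purely random actions for a warm-up
    period, then policy actions with Gaussian noise. Increasing
    start_timesteps and/or expl_noise lengthens exploration."""
    if total_steps < start_timesteps:
        # Warm-up: ignore the (still untrained) policy and explore uniformly.
        return env.action_space.sample()
    action = policy.predict(np.array(obs))  # assumes the agent exposes predict(); adjust as needed
    noise = np.random.normal(0, expl_noise, size=env.action_space.shape)
    return np.clip(action + noise, env.action_space.low, env.action_space.high)
```

This would be called once per environment step, in place of wherever the action is currently chosen in the training loop.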
I have trained the DDPG code that is presented in the repository for over 12 hours, but it learns a very bad policy: the robot just spins continuously in one place without moving forward.
Can somebody please give me a hint on what I should change?
I am pretty new to RL, but my first thoughts are to either tune the hyperparameters or to completely change the reward function, though the reward seems very logical to me and composed in a way that should rule out such bad policies.
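For example, one thing I'm considering (a hypothetical sketch, not anything from the repo) is a small wrapper that penalizes opposite-sign wheel velocities, assuming the [left_velocity, right_velocity] action convention, so that spinning in place stops being a cheap policy:

```python
import gym

class AntiSpinWrapper(gym.Wrapper):
    """Hypothetical reward-shaping sketch: penalize actions whose wheel
    velocities point in opposite directions, which is what pure in-place
    spinning looks like under [left_velocity, right_velocity]."""

    def __init__(self, env, spin_penalty=0.5):
        super().__init__(env)
        self.spin_penalty = spin_penalty

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        left, right = float(action[0]), float(action[1])
        if left * right < 0:
            # Opposite-sign wheels rotate the robot without moving it forward.
            reward -= self.spin_penalty * min(abs(left), abs(right))
        return obs, reward, done, info
```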