btaba / intro-to-rl

Coding examples for Intro to RL
MIT License

questions on Q-learning #4

Closed xubo92 closed 7 years ago

xubo92 commented 7 years ago

hi @btaba: Have you tried Q-learning on a task like the racetrack? I would expect the problem that the off-policy Monte Carlo algorithm has on this kind of task not to occur with Q-learning, but the performance is not good. I limit each episode to a maximum of 200 time steps; I don't know if that is where the problem lies. Should I wait for Q to converge within each episode before moving on to the next one? Or are tasks like the racetrack simply unsuitable for off-policy algorithms, even Q-learning, because it is hard to guarantee reaching a terminal state? Do you have any ideas for this situation? Here is the code: Looking forward to your reply. Thank you~

btaba commented 7 years ago

I have not tried Q-learning with this environment, and I didn't run your code either, but I don't see why Q-learning wouldn't work here: the behavior policy being executed is stochastic, and the Q-function is updated at every step, so there is no need to wait for convergence within an episode.
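For reference, the update pattern being described is standard tabular Q-learning with an epsilon-greedy behavior policy and a per-episode step cap. Below is a minimal, self-contained sketch; since the racetrack environment from the question isn't shown, it uses a tiny hypothetical chain environment (`step`, `N_STATES`) as a stand-in, and all hyperparameter values are illustrative, not from the original code.

```python
import random
from collections import defaultdict

# Hypothetical stand-in environment (NOT the racetrack from the question):
# states 0..N_STATES-1 on a line, action 0 = left, 1 = right,
# reward -1 per step, episode terminates at the rightmost state.
N_STATES = 6
ACTIONS = (0, 1)

def step(state, action):
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = (nxt == N_STATES - 1)
    return nxt, -1.0, done

def q_learning(episodes=500, alpha=0.5, gamma=1.0, eps=0.1,
               max_steps=200, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)  # Q[(state, action)], initialized to 0
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):  # episode cap, as in the question
            # epsilon-greedy behavior policy: stochastic, so every
            # (s, a) pair keeps being visited without importance sampling
            if rng.random() < eps:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda a_: Q[(s, a_)])
            s2, r, done = step(s, a)
            # off-policy update toward the greedy target, at every step,
            # regardless of which action the behavior policy picked
            target = r if done else r + gamma * max(Q[(s2, a_)] for a_ in ACTIONS)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
            if done:
                break
    return Q

Q = q_learning()
```

Note that the update happens inside the step loop: Q-learning is fully incremental, so episodes that hit the 200-step cap without terminating still contribute useful updates, unlike off-policy Monte Carlo, which needs the episode return.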

xubo92 commented 7 years ago

OK, maybe I need to check my code once again. Thank you.


Closed #4.
