Closed xubo92 closed 7 years ago
Off-policy methods won't work well on the racetrack environment I made, precisely because of that comment from Sutton's book. Off-policy means that the policy acting in your environment (the behavior policy) is different from the policy you are optimizing (the target policy). For example, the behavior policy could be random. You wouldn't expect a random policy to finish an episode successfully in the racetrack environment, because the task is too complicated, so the agent almost never sees a successful run and doesn't learn how to complete the racetrack.
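A rough sketch of how little a random behavior policy leaves the learner to work with. It assumes off-policy Monte Carlo control with importance sampling and a deterministic greedy target policy (the thread doesn't say which off-policy method was actually tried), plus a uniform random behavior policy over 9 actions. The function names and the 200-step episode length are made up for illustration. Under those assumptions, the backward update over an episode stops at the first action that disagrees with the greedy one, so only the last step or two of each episode gets used:

```python
import random


def usable_tail_length(episode_len, n_actions, rng):
    """Count how many final steps of one episode receive a Q update.

    Assumption: the target policy is greedy (deterministic) and the behavior
    policy is uniform random, so each behavior action matches the greedy
    action with probability 1 / n_actions.
    """
    updated = 0
    for _ in range(episode_len):  # walk backward from the last step
        updated += 1  # this step's Q-value is updated
        # Action index 0 stands in for "the greedy action" at this step.
        matches_greedy = rng.randrange(n_actions) == 0
        if not matches_greedy:
            break  # importance weight drops to zero; earlier steps are skipped
    return updated


def mean_tail_length(episode_len=200, n_actions=9, n_episodes=10_000, seed=0):
    """Average number of updated steps per episode over many simulated episodes."""
    rng = random.Random(seed)
    total = sum(
        usable_tail_length(episode_len, n_actions, rng) for _ in range(n_episodes)
    )
    return total / n_episodes


if __name__ == "__main__":
    print(f"mean updated steps per episode: {mean_tail_length():.3f}")
```

With 9 actions the expected count is 9/8, so a little over one step per episode, no matter how long the episode is. States near the start of the track are then almost never updated, which is on top of the random policy rarely reaching the finish line in the first place.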
@btaba sorry for my delayed reply :) The explanation is very reasonable. Thanks a lot.
Hi @btaba: Have you ever tried an off-policy method on the racetrack problem? I tried, but the performance was very bad. I found something that seems important in Sutton's book: