btaba / intro-to-rl

coding examples to Intro to RL
MIT License

question on training process of racetrack problem #1

Closed xubo92 closed 7 years ago

xubo92 commented 7 years ago

@btaba hi bro: I am working on the racetrack problem from Sutton's RL book using the on-policy Monte Carlo method. However, I find it hard to assess how well training is going. Is there a metric that reflects the performance of the training process? I also tried making a movie of the agent, but training progress is hard to judge that way. I cannot work out the relationship between the number of episodes and training performance. For instance, if performance looks bad, how can you tell whether the cause is too few episodes or a wrong implementation of the algorithm? I am very confused by these questions. Do you have any ideas? Looking forward to your reply!

btaba commented 7 years ago

Hi lvlvlvlvlv, to assess the performance of the training process you can take the average return per training episode and plot that over time. With the MC method the returns are not guaranteed to increase monotonically, but you should generally see the average return increase as training proceeds. Alternatively, you can make an agent that performs random actions and plot its average return over time. Then you can compare the random-agent plot with the MC-agent plot to assess performance. Hope that answers your question!
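A minimal sketch of that idea, assuming you already collect one return per training episode in a plain list for each agent. The names `moving_average` and `improvement_over_baseline` and the default window of 100 episodes are illustrative choices, not code from this repository.

```python
from statistics import mean


def moving_average(returns, window=100):
    """Return the mean of the most recent `window` returns at every episode."""
    averages = []
    running_sum = 0.0
    for i, episode_return in enumerate(returns):
        running_sum += episode_return
        if i >= window:
            # Drop the return that just fell out of the window.
            running_sum -= returns[i - window]
        averages.append(running_sum / min(i + 1, window))
    return averages


def improvement_over_baseline(agent_returns, baseline_returns, window=100):
    """Mean return of the agent minus that of the baseline, over the last `window` episodes."""
    return mean(agent_returns[-window:]) - mean(baseline_returns[-window:])
```

Plotting `moving_average(mc_returns)` and `moving_average(random_returns)` on the same axes gives the comparison described above. If the MC curve never separates from the random one even after many episodes, a bug in the implementation becomes a more likely explanation than too few episodes.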

xubo92 commented 7 years ago

@btaba Thanks for your reply! Do you mean that the average return per episode is the sum of all rewards in that episode, from beginning to end?

btaba commented 7 years ago

You can take the sum of rewards or the average, whichever is more meaningful for the environment (from beginning to end of one episode). For example, if all of your rewards are always -1, then the average will always be -1, so taking the sum is more meaningful in that case.
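To make the difference concrete, here is a small sketch assuming a reward of -1 on every step, as in the example above. The two episode lengths are made-up values for illustration, and the helper names are hypothetical.

```python
def episode_return(rewards):
    """Sum of the rewards collected in one episode."""
    return sum(rewards)


def episode_mean_reward(rewards):
    """Average reward per step in one episode."""
    return sum(rewards) / len(rewards)


# Hypothetical episodes: one finishes in 12 steps, the other in 80 steps.
short_episode = [-1] * 12
long_episode = [-1] * 80

# The sum separates the two episodes; the per-step mean does not.
short_sum, long_sum = episode_return(short_episode), episode_return(long_episode)
short_mean, long_mean = episode_mean_reward(short_episode), episode_mean_reward(long_episode)
```

Here the sums differ (the shorter episode has the higher return), while both means are exactly -1, so only the sum reflects that the agent finished faster.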

xubo92 commented 7 years ago

@btaba Cool! I really appreciate your suggestion! I will try it and see what happens. Thank you again!

btaba commented 7 years ago

No problem!