Hello there, Lorenzo!
Your code was very well written and well documented in the report.
I liked how you used different agents trained against each other. I think the reason rl_base outperforms rl_RL_trained is that the former, being trained only against random agents, chooses more "reckless" moves that favor it against a random player; by contrast, I would expect rl_RL_trained to be better against a real player.
Counter-intuitively, the_ROCK outperforms even rl_base. I think this is because it retains some important knowledge obtained while playing against rl_RL_trained, which helps it win more consistently by applying rules of thumb learned during training (like prioritizing corners over sides).
I think it would be interesting to see how many moves it takes, on average, for each agent to win, to determine how aggressive or defensive each one is.
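Just to make the suggestion concrete, here is a minimal sketch of how that metric could be collected. Note that `play_game` here is a hypothetical stub (it returns random results) standing in for the project's actual match loop, and the agent names are just labels:

```python
import random
from collections import defaultdict

def play_game(agent_a, agent_b):
    # Hypothetical stub standing in for the project's real game loop:
    # it should return (winner, number_of_moves) for one match.
    n_moves = random.randint(5, 16)
    winner = random.choice([agent_a, agent_b])
    return winner, n_moves

def average_moves_to_win(agents, n_games=200):
    """For each agent, collect the number of moves it needed in the
    games it won, across all ordered pairings, and average them."""
    moves_in_wins = defaultdict(list)
    for a in agents:
        for b in agents:
            if a == b:
                continue
            for _ in range(n_games):
                winner, moves = play_game(a, b)
                moves_in_wins[winner].append(moves)
    return {agent: sum(m) / len(m) for agent, m in moves_in_wins.items() if m}

random.seed(0)
stats = average_moves_to_win(["rl_base", "rl_RL_trained", "the_ROCK"])
for agent, avg in sorted(stats.items(), key=lambda kv: kv[1]):
    print(f"{agent}: {avg:.1f} moves per win on average")
```

A lower average would suggest a more aggressive agent; a higher one, a more defensive or opportunistic style.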
I hope this review was useful to you and good luck on the final project. 😊