Matteo-Pietro-Pillitteri / Computational-Intelligence

Repository CI 23/24
MIT License

Lab 10 Review #8

Open FestaShabani opened 9 months ago

FestaShabani commented 9 months ago

Hello Matteo,

Your code is well-structured and easy to follow; however, I would appreciate some more comments. Your README.md is very informative, and the effort you put into it impressed me very much. I also appreciate that you visualized the results with a graph. The only thing I found odd is the update rule for the Q-value, which penalizes future rewards (`-1.0 * np.max(Q_S_t_next)`). In Q-learning, the goal is to maximize future rewards, not penalize them, so the discounted maximum future reward should be added rather than subtracted.
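For reference, a minimal sketch of the standard tabular Q-learning update I had in mind (variable names like `alpha`, `gamma`, and the `Q[state, action]` indexing are my own illustrative assumptions, not taken from your code):

```python
import numpy as np

def q_update(Q, s, a, reward, s_next, alpha=0.1, gamma=0.9):
    # Standard Q-learning rule ADDS the discounted maximum future value:
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    td_target = reward + gamma * np.max(Q[s_next])  # note the plus sign
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

# Tiny usage example on a 2-state, 2-action table
Q = np.zeros((2, 2))
Q = q_update(Q, s=0, a=1, reward=1.0, s_next=1)
```

With `gamma` negated (or the max subtracted), the agent would be driven away from states with high future value, which is the opposite of what Q-learning intends.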

Best of luck!