beatrice-occhiena closed this 1 month ago
Hello Beatrice 👋🏻. Thank you for your detailed review of my work.
I appreciate your suggestion about adding visual representations of the training process; I'll try to include them in future work or in the final project.
The reason I decided to include a negative reward for invalid moves was to see how well the agent could learn not to play them, and how long it took to do so. Implementing your solution would certainly make the final agent better, but I wanted to do it this way for experimental reasons.
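For concreteness, here is a minimal sketch of what this penalty-based scheme can look like in tabular Q-learning (all names, constants, and the board representation below are hypothetical, not taken from my actual notebook):

```python
from collections import defaultdict

INVALID_MOVE_REWARD = -1.0   # hypothetical penalty for an illegal action
ALPHA, GAMMA = 0.1, 0.9      # learning rate and discount factor

q_table = defaultdict(float)  # maps (state, action) -> Q-value

def step(board, move):
    """Apply `move` to `board` (a tuple with None for empty squares).
    Occupied squares are NOT filtered out up front: the agent gets a
    penalty and the state does not advance, so it has to learn legality
    from the reward signal alone."""
    if board[move] is not None:                   # invalid move
        return board, INVALID_MOVE_REWARD, False  # penalize, same state
    new_board = list(board)
    new_board[move] = "X"                         # agent's mark
    # Done when the board is full (win detection omitted for brevity).
    return tuple(new_board), 0.0, None not in new_board

def q_update(state, action, reward, next_state, next_actions):
    """Standard tabular Q-learning update; invalid moves flow through it
    with their negative reward like any other transition."""
    best_next = max((q_table[(next_state, a)] for a in next_actions), default=0.0)
    q_table[(state, action)] += ALPHA * (reward + GAMMA * best_next - q_table[(state, action)])
```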
I hope I have addressed your perplexity. Good luck with your final project and exam!
Hi Davide 👋,
I just finished reviewing your project, and I must say, I am thoroughly impressed! Here are my thoughts:
Theoretical Introduction
Your introduction to Q-Learning and Monte Carlo strategies is commendable. It provides a solid foundation for understanding the rest of your project.
Code Organization
I really appreciated the idea of using an abstract class for different player strategies. As I stated in my notebook, I was so inspired by this approach that I've adopted the same organization in my own project!
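For readers who haven't opened the notebook, the pattern is roughly the following (class and method names here are illustrative, not the ones from Davide's code):

```python
from abc import ABC, abstractmethod
import random

class Player(ABC):
    """Common interface for all strategies, so the game loop can treat a
    random player, a Q-learning agent, or a Monte Carlo agent uniformly."""

    @abstractmethod
    def choose_move(self, state, valid_moves):
        """Return one element of `valid_moves` for the given state."""

class RandomPlayer(Player):
    def choose_move(self, state, valid_moves):
        return random.choice(valid_moves)
```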
Comprehensive Comments
Your comments make it easy to understand the purpose and functionality of each section, which is incredibly helpful not just for reviewers like me, but for anyone who wishes to learn from or build upon your work.
Statistical Analysis and Visualization
The function you implemented to collect game statistics is very useful. It provides valuable insights into the performance of the strategies over time.
However, I suggest adding a graph to visually represent the training trend over time, including wins, losses, and draws. This would not only add to the visual appeal but also make the learning process and performance trends more immediately evident.
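Something along these lines would do (a sketch assuming the statistics are stored as one outcome string per training game; the function name and data layout are my own, not from the project):

```python
import matplotlib.pyplot as plt

def plot_training_trend(outcomes, window=100):
    """outcomes: list of 'win'/'loss'/'draw' strings, one per training game.
    Plots the rolling fraction of each outcome over a sliding window."""
    for label in ("win", "loss", "draw"):
        rates = [
            sum(o == label for o in outcomes[i - window:i]) / window
            for i in range(window, len(outcomes) + 1)
        ]
        plt.plot(range(window, len(outcomes) + 1), rates, label=label)
    plt.xlabel("training games")
    plt.ylabel(f"rate over last {window} games")
    plt.legend()
    plt.show()
```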
Impressive Results
The results you've achieved with both players are fantastic. It's clear that your strategies are effective and well-implemented.
Comparative Analysis
Your final comparison between the two strategies (Q-Learning and Monte Carlo) is a great way to wrap up your project. It gives a comprehensive view of the strengths and weaknesses of each approach and provides valuable insights into their practical applications.
A Small Perplexity
The only area where I have some reservations is the negative reward for invalid moves. I understand the reasoning behind it, but I wonder whether it might be more efficient to avoid invalid moves altogether with a preliminary check. Of course, this would be a more deterministic, rule-based approach, but I think it could streamline the learning process by preventing the agent from exploring obviously undesirable actions, given the well-defined rules of the game. However, I acknowledge that I might be missing some aspects of your strategy here.
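Concretely, I was imagining something like the following epsilon-greedy selection restricted to legal moves (the board representation, Q-table layout, and exploration rate are assumptions on my part):

```python
import random

EPSILON = 0.1  # assumed exploration rate

def choose_move_masked(q_table, state, board):
    """Select an action among *legal* moves only: illegal squares are
    filtered out before selection, so the agent never needs a penalty
    to learn the rules of the game."""
    valid_moves = [i for i, cell in enumerate(board) if cell is None]
    if random.random() < EPSILON:
        return random.choice(valid_moves)
    return max(valid_moves, key=lambda a: q_table.get((state, a), 0.0))
```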
Overall, your project demonstrates your skills and understanding of the topic. Awesome job! ✨