feurode46 / computational-intelligence-2022

All about the Computational Intelligence course - Master's Degree in Computer Engineering @ PoliTo

Peer review Lab3 (s303503) #7


antoniodecinque99 commented 1 year ago

Task 3.1

This is a well-thought-out implementation! It's great to see a strategy that beats an opponent playing random moves 95% of the time. Using a decision tree to guide your moves works nicely, and it's interesting to see how it adapts to different situations, such as the introduction of the parameter k. We had the same idea for the terminal condition of the game. Your strategy is simple yet effective, and it shows off your creativity and problem-solving skills!
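
For reference, here is a minimal sketch of the kind of rule-based agent I have in mind, assuming the state is a list of row sizes and a move is a `(row, objects)` pair; the representation and the rules here are my own assumptions, not taken from your code:

```python
def rule_based_strategy(rows: list[int], k: int | None = None) -> tuple[int, int]:
    """Hand-written rules for Nim; `k` caps how many objects one move may take."""
    active = [i for i, n in enumerate(rows) if n > 0]
    # Endgame rule: with a single non-empty row, take as much as allowed
    # (under normal play this closes the game whenever the row fits within k).
    if len(active) == 1:
        row = active[0]
        take = rows[row] if k is None else min(rows[row], k)
        return row, take
    # Default rule: take a single object from the longest row.
    row = max(active, key=lambda i: rows[i])
    return row, 1
```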

Task 3.2

The genome and crossover design are straightforward and effective, and evaluating fitness over 100 games against a random player gives reliable results. You could also try evaluating fitness against an optimal player.
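
To make the suggestion concrete: if the fitness function takes the opponent as a parameter, swapping the random player for an optimal one becomes a one-line change. This is only a sketch, and `new_game`, `is_terminal`, and `apply_move` are hypothetical helpers standing in for your actual game code:

```python
def fitness(agent, opponent, num_games: int = 100) -> float:
    """Fraction of games the agent wins against a given opponent."""
    wins = 0
    for _ in range(num_games):
        state = new_game()                # hypothetical: a fresh Nim board
        player = 0                        # the agent moves first
        while not is_terminal(state):     # hypothetical: True when the board is empty
            strategy = agent if player == 0 else opponent
            state = apply_move(state, strategy(state))  # hypothetical
            player = 1 - player
        if player == 1:                   # the agent made the last move and wins (normal play)
            wins += 1
    return wins / num_games
```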

The results of the training show that the algorithm reached the same conclusion as your initial hand-written strategy, with a 95% win rate against a random player. It does seem strange that it never wins against the optimal player. You could try changing the set of possible move types and the probabilities with which they are chosen. Additionally, experimenting with different population sizes, offspring sizes, and mutation rates could lead to better results, though I'm sure you already tried that.
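
By "changing the move types based on probability" I mean something like a genome that is a vector of weights over several candidate rules, so mutation only has to perturb the weights. A hypothetical sketch:

```python
import random

def make_agent(genome: list[float], rules: list):
    """Build an agent that picks one of several hand-written rules
    with the probabilities encoded in the genome."""
    def agent(state):
        rule = random.choices(rules, weights=genome, k=1)[0]
        return rule(state)
    return agent
```

Crossover can then mix the weights gene by gene, and mutation can add small Gaussian noise to them.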

Task 3.3

The implementation of Minimax with memoization and alpha-beta pruning shows real effort to optimize the algorithm. I will certainly borrow your memoization idea, persisting the cache with a library such as pickle. The fact that this version performs worse than the vanilla one is intriguing, and I suggest investigating it further.
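
This is roughly how I would persist the cache with pickle. One caveat worth checking in your version: naively memoizing values computed inside a narrowed alpha-beta window stores bounds rather than exact values, which could explain why the pruned version plays worse than the vanilla one. The sketch below therefore memoizes plain minimax only; the file name and cache layout are my own assumptions:

```python
import pickle

CACHE_FILE = "nim_minimax_cache.pkl"  # hypothetical file name

def load_cache(path: str = CACHE_FILE) -> dict:
    """Load a previously pickled cache, or start from an empty one."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return {}

def save_cache(cache: dict, path: str = CACHE_FILE) -> None:
    with open(path, "wb") as f:
        pickle.dump(cache, f)

cache = load_cache()

def minimax(rows: tuple[int, ...], maximizing: bool) -> int:
    """Memoized minimax over Nim states; returns +1 if the maximizing
    player wins with perfect play from this position, -1 otherwise."""
    key = (rows, maximizing)
    if key in cache:
        return cache[key]
    if all(n == 0 for n in rows):
        # Empty board: the previous player took the last object and
        # wins under normal-play rules, so the side to move has lost.
        value = -1 if maximizing else 1
    else:
        children = (
            minimax(rows[:i] + (n - take,) + rows[i + 1:], not maximizing)
            for i, n in enumerate(rows) if n > 0
            for take in range(1, n + 1)
        )
        value = max(children) if maximizing else min(children)
    cache[key] = value
    return value
```

Calling `save_cache(cache)` at the end of a run lets the next run start warm.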

Overall, the average win rate of 76% against a random player is a good accomplishment, but it would be interesting to see how the algorithm performs with a different heuristic or depth limit. Minimax doesn't seem to work well with this type of game.
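
On the heuristic side, a natural static evaluation for a depth-limited cutoff in standard Nim is the nim-sum (the bitwise xor of the pile sizes), since a nonzero nim-sum is a theoretical win for the player to move. A small sketch:

```python
from functools import reduce
from operator import xor

def nim_sum_heuristic(rows: tuple[int, ...], maximizing: bool) -> int:
    """Evaluation at the depth cutoff: a nonzero nim-sum favors the side to move."""
    winning_for_mover = reduce(xor, rows, 0) != 0
    return 1 if winning_for_mover == maximizing else -1
```

For the variant with an upper bound k on the objects taken per move, the usual adjustment is to xor the pile sizes modulo k + 1 instead.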

Task 3.4

Training the Reinforcement Learning agent in this task gives promising results, with a good win rate against the optimal strategy. However, the performance against a random strategy is not as good, which is surprising given how weak random play is in this game.

One potential reason for the inconsistent learning and the variability in performance could be the random initialization of the G-table and the lack of a good heuristic to guide the learning. Additionally, fine-tuning the exploration-exploitation balance and the learning-rate schedule may improve the stability and consistency of the agent's learning.
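
For the exploration-exploitation part, a decaying epsilon-greedy schedule is the usual first thing to try. A sketch of the idea, with all the constants being placeholder values:

```python
import random

def epsilon_greedy(G: dict, state, actions: list, epsilon: float):
    """Explore with probability epsilon, otherwise exploit the G-table."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: G.get((state, a), 0.0))

epsilon, learning_rate = 1.0, 0.5
for episode in range(10_000):
    ...  # play one training episode, updating the G-table with `learning_rate`
    epsilon = max(0.05, epsilon * 0.999)              # anneal exploration
    learning_rate = max(0.01, learning_rate * 0.999)  # smaller updates later on
```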

Another approach worth considering is combining the Reinforcement Learning algorithm with other methods, such as Monte Carlo Tree Search.

You did a very good job!