This repository contains all the materials and documentation related to my experiences and projects in the Computational Intelligence course. As a student, I was deeply engaged with the course material, which explored various techniques and approaches for creating intelligent systems. The sections below collect the feedback received on the four lab tasks.
Task 1
I love the trial-and-error way in which you developed your fixed-rules solution. I am curious how the final version of count_and_decide would perform against the optimal strategy.
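Assuming Task 1 refers to the Nim game used in the later tasks, the "optimal strategy" would be the nim-sum (XOR) player. The sketch below is a minimal, hypothetical version of such an opponent that count_and_decide could be matched against; the function name and the heap-tuple state representation are assumptions, not code from this repository.

```python
# Minimal sketch of a nim-sum (XOR) optimal strategy for normal-play Nim,
# the kind of opponent count_and_decide could be matched against.
# The heap-tuple state representation is an assumption.
import random
from functools import reduce
from operator import xor

def optimal_strategy(heaps):
    nim_sum = reduce(xor, heaps)
    if nim_sum != 0:
        # Winning position: move to a state whose nim-sum is zero.
        for i, h in enumerate(heaps):
            target = h ^ nim_sum
            if target < h:
                return i, h - target
    # Losing position: every move loses, so fall back to a random legal move.
    i = random.choice([i for i, h in enumerate(heaps) if h > 0])
    return i, random.randint(1, heaps[i])
```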
Task 2
Your solution immediately reached a high fitness value and then stopped improving: the typical early-convergence problem. This can be easily solved by increasing the population size and using a (mu, lambda) EA with lambda > 5*mu. If you want even more exploration, you could also implement a quick diversity-promotion strategy such as extinction.
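A minimal sketch of what a (mu, lambda) scheme with lambda = 5*mu could look like is shown below. The genome encoding and the one-max fitness are placeholders rather than the actual Task 2 problem; they only illustrate comma selection, where parents are discarded and survivors are drawn from the offspring alone.

```python
# Sketch of a (mu, lambda) evolutionary algorithm with lambda = 5 * mu.
# Fitness and genome are illustrative placeholders (one-max on a bit string).
import random

MU = 20                 # number of parents kept each generation
LAMBDA = 5 * MU         # offspring per generation (lambda > 5*mu rule of thumb)
GENOME_LEN = 30
GENERATIONS = 100

def fitness(genome):
    # Placeholder fitness: count of ones; replace with the real evaluation.
    return sum(genome)

def mutate(genome, rate=1 / GENOME_LEN):
    # Bit-flip mutation with a small per-locus probability.
    return [1 - g if random.random() < rate else g for g in genome]

population = [[random.randint(0, 1) for _ in range(GENOME_LEN)] for _ in range(MU)]
for generation in range(GENERATIONS):
    # Each offspring is a mutated copy of a randomly chosen parent.
    offspring = [mutate(random.choice(population)) for _ in range(LAMBDA)]
    # Comma selection: parents are discarded, only the best mu offspring survive.
    offspring.sort(key=fitness, reverse=True)
    population = offspring[:MU]

best = max(population, key=fitness)
print(f"best fitness after {GENERATIONS} generations: {fitness(best)}")
```

Extinction-style diversity promotion could be layered on top of this loop, for example by replacing part of the population with freshly generated random individuals whenever the best fitness stagnates for several generations.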
Task 3
Great implementation, nothing to add. The strange phenomenon related to the number of heaps might be due to the horizon effect: try implementing a Monte Carlo tree search when the minmax algorithm reaches interesting nodes.
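One way to read this suggestion: when the depth-limited minmax reaches its horizon, estimate the node value with random playouts instead of a static heuristic. The hypothetical sketch below does exactly that for normal-play Nim (a simplified Monte Carlo evaluation rather than a full MCTS with a tree policy); the state representation and function names are assumptions, not code from the reviewed solution.

```python
# Depth-limited minmax for Nim where the cutoff value is estimated by random
# playouts, mitigating the horizon effect. Player 0 is the maximizer; the
# player taking the last object wins (normal play).
import random

def legal_moves(heaps):
    # A move removes 1..h objects from a non-empty heap i.
    return [(i, take) for i, h in enumerate(heaps) for take in range(1, h + 1)]

def apply_move(heaps, move):
    i, take = move
    return tuple(h - take if j == i else h for j, h in enumerate(heaps))

def random_playout(heaps, to_move):
    # Play random moves to the end; return +1 if player 0 wins, -1 otherwise.
    while any(heaps):
        heaps = apply_move(heaps, random.choice(legal_moves(heaps)))
        to_move = 1 - to_move
    return 1 if to_move == 1 else -1  # the previous player took the last object

def monte_carlo_value(heaps, to_move, n_playouts=20):
    return sum(random_playout(heaps, to_move) for _ in range(n_playouts)) / n_playouts

def minmax(heaps, to_move, depth):
    if not any(heaps):
        return 1 if to_move == 1 else -1
    if depth == 0:
        # Horizon reached: fall back to Monte Carlo rollouts instead of a heuristic.
        return monte_carlo_value(heaps, to_move)
    values = (minmax(apply_move(heaps, m), 1 - to_move, depth - 1)
              for m in legal_moves(heaps))
    return max(values) if to_move == 0 else min(values)

print(minmax((3, 4, 5), to_move=0, depth=3))
```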
Task 4
Given how the evaluate function works, the first player to move always uses the strategy passed as the first argument. Thus, if your RL agent always plays the best move, the optimal_strategy agent only ever faces losing positions: it can do nothing but fall back to a random action and will perform similarly to the pure_random agent. Congrats, your RL agent learned the optimal strategy!
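An evaluate function with the behaviour described above might look like the following hypothetical sketch: the strategy passed as the first argument always makes the first move, so the order of the arguments decides which agent gets the first-move advantage. The function names, starting position, and win convention are assumptions, not the reviewed code.

```python
# Sketch of an evaluate() helper in which the first argument always moves first.
import random

def pure_random(heaps):
    # Pick a non-empty heap and remove a random number of objects from it.
    i = random.choice([i for i, h in enumerate(heaps) if h > 0])
    return i, random.randint(1, heaps[i])

def evaluate(strategy_a, strategy_b, n_games=100, start=(3, 4, 5)):
    """Return the fraction of games won by strategy_a, which always moves first."""
    wins = 0
    for _ in range(n_games):
        heaps, players, turn = list(start), (strategy_a, strategy_b), 0
        while any(heaps):
            i, take = players[turn](tuple(heaps))
            heaps[i] -= take
            turn = 1 - turn
        # The player who just moved took the last object and wins (normal play).
        wins += 1 if turn == 1 else 0
    return wins / n_games

print(evaluate(pure_random, pure_random))
```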