Task 3.1: An agent using fixed rules
The ExpertSystem seems fine; you implemented the nim-sum strategy correctly.
Personally, I created a custom solution that exploits the nim-sum strategy only with a given probability; others implemented similar strategies that contain one or more parameters to be later tuned by the GA. So, my suggestion is to experiment by creating more agents with different rules: some of these can mimic human behaviour, like removing elements from a specific row, etc.
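To illustrate, here is a minimal sketch of such a probabilistic expert; all names, and the parameter p (the knob a GA could later tune), are my own assumptions, not your code:

```python
import random

def nim_sum(rows):
    """XOR of all row sizes; non-zero means the player to move can force a win."""
    s = 0
    for r in rows:
        s ^= r
    return s

def probabilistic_expert_move(rows, p=0.8):
    """Play the optimal nim-sum move with probability p, else a random one.

    rows: list of object counts per row; returns (row_index, amount).
    p is the parameter a GA could later evolve (hypothetical name).
    """
    s = nim_sum(rows)
    if random.random() < p and s != 0:
        for i, r in enumerate(rows):
            target = r ^ s
            if target < r:
                # removing r - target leaves the opponent with nim-sum 0
                return i, r - target
    # fallback: any legal random move
    i = random.choice([j for j, r in enumerate(rows) if r > 0])
    return i, random.randint(1, rows[i])
```

With p=1.0 this collapses to the pure expert, with p=0.0 to a random player, so a single evolved parameter interpolates between the two.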
Task 3.2: An agent using evolved rules
I really like your implementation: running the algorithm a couple of times, the genetic agent achieves a high winning ratio!
The only thing you could add to the algorithm is a crossover function that takes parameters from two different individuals.
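A minimal sketch of what I mean, assuming an individual's genome is a plain list of numeric parameters (a simplification; your representation may differ):

```python
import random

def crossover(parent_a, parent_b):
    """Uniform crossover: each parameter is copied from one of the two parents.

    Assumes a genome is a plain list of numeric parameters.
    """
    return [a if random.random() < 0.5 else b
            for a, b in zip(parent_a, parent_b)]
```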
Task 3.3: An agent using minmax
The algorithm's computational complexity can be reduced by implementing alpha-beta pruning.
Here is a possible implementation (remember to include alpha=-1 and beta=1 as arguments of the minmax function):
[...]
# after calling the minmax function recursively
if minimizing:
    beta = min(beta, val)
else:
    alpha = max(alpha, val)
if beta <= alpha:
    break
[...]
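For context, here is a hedged sketch of how that pruning could fit into a complete minmax for normal-play Nim (whoever takes the last object wins); the function and state names are my assumptions, not your code:

```python
def minmax(rows, minimizing=False, alpha=-1, beta=1):
    """Game value of a normal-play Nim position:
    +1 if the maximizer wins with optimal play, -1 otherwise."""
    if all(r == 0 for r in rows):
        # the previous player took the last object and won
        return 1 if minimizing else -1
    best = 1 if minimizing else -1
    for i, r in enumerate(rows):
        for take in range(1, r + 1):
            child = list(rows)
            child[i] -= take
            val = minmax(child, not minimizing, alpha, beta)
            if minimizing:
                best = min(best, val)
                beta = min(beta, val)
            else:
                best = max(best, val)
                alpha = max(alpha, val)
            if beta <= alpha:
                return best  # prune the remaining moves
    return best
```

Since Nim values are only -1 or +1, the window [alpha=-1, beta=1] closes as soon as a winning move is found, which prunes aggressively.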
Another thing that could improve this agent is falling back to a Monte Carlo strategy when the depth exceeds the maximum.
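A minimal sketch of such a rollout evaluator, assuming normal-play Nim and a list-of-rows state (my assumptions, not your code):

```python
import random

def rollout_value(rows, minimizing, n_rollouts=50):
    """Estimate a position's value by finishing n games with random moves.

    Returns the average outcome in [-1, 1] from the maximizer's viewpoint;
    a depth-limited minmax can use this instead of a fixed heuristic.
    """
    total = 0
    for _ in range(n_rollouts):
        state = list(rows)
        turn_min = minimizing
        while any(r > 0 for r in state):
            i = random.choice([j for j, r in enumerate(state) if r > 0])
            state[i] -= random.randint(1, state[i])
            turn_min = not turn_min
        # the player who just moved took the last object and won
        total += 1 if turn_min else -1
    return total / n_rollouts
```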
Task 3.4: An agent using reinforcement learning
I really like your reinforcement learning agent implementation; it is quite similar to mine.
In my implementation, I found that a reward of 1 if the agent won after that move and 0 otherwise, instead of (0, -1), gave much better results: the winning ratio against the random player went from ~0.7 to ~0.92.
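For reference, a sketch of how that reward scheme plugs into a tabular Q-learning update; all names and hyperparameters here are my assumptions, not your code:

```python
def possible_actions(rows):
    """All legal (row, amount) moves for a Nim position."""
    return [(i, k) for i, r in enumerate(rows) for k in range(1, r + 1)]

def q_update(q_table, state, action, next_state, won, lr=0.1, gamma=0.9):
    """One tabular Q-learning step with the reward scheme described above:
    reward 1 if the move just won the game, 0 otherwise.

    q_table: dict mapping (state, action) -> value; states are row tuples.
    """
    reward = 1 if won else 0
    old = q_table.get((state, action), 0.0)
    # best estimated value reachable from the successor state
    future = max((q_table.get((next_state, a), 0.0)
                  for a in possible_actions(next_state)), default=0.0)
    q_table[(state, action)] = old + lr * (reward + gamma * future - old)
```

With this scheme the only non-zero signal is the winning move, and the discounted bootstrap propagates it backwards to earlier positions.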
The matches and the README
I like how you separated the lib and the matches in the Jupyter notebook. It would have been interesting to run a large number of matches and compute a winning ratio for each agent. The results could have been included in the README file, along with one or more tables to hold them.
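For example, a small harness along these lines could produce the numbers for such a table; the agent interface (a callable taking the rows and returning a move) is my assumption:

```python
import random

def random_agent(rows):
    """Baseline: pick a random legal (row, amount) move."""
    i = random.choice([j for j, r in enumerate(rows) if r > 0])
    return i, random.randint(1, rows[i])

def play_match(agent_a, agent_b, rows=(3, 4, 5)):
    """One game of normal-play Nim; returns 0 if agent_a wins, 1 otherwise."""
    state = list(rows)
    players = (agent_a, agent_b)
    turn = 0
    while any(r > 0 for r in state):
        i, amount = players[turn](state)
        state[i] -= amount
        turn = 1 - turn
    # the player who just moved took the last object and won
    return 1 - turn

def winning_ratio(agent, opponent, n=1000):
    """Fraction of n matches won by `agent` moving first."""
    wins = sum(play_match(agent, opponent) == 0 for _ in range(n))
    return wins / n
```

Each ratio then becomes one row of a README table, e.g. agent name in the first column and winning ratio vs. the random player in the second.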
Overall, I really enjoyed your solution, so good job!