FlavioPatti / Computational-Intelligence_2022-23


lab3 peer review by Maria Rosa Scoleri #8


scoleri-mr commented 1 year ago

Task 3.1 - An agent using fixed rules based on nim-sum (i.e., an expert system)

Your code here creates an agent that uses hard-coded rules not based on nim-sum. Using a hard-coded strategy is a very good way to start understanding the problem, but after that you were supposed to build an expert system based on nim-sum so that it could be optimal. Your agent is not optimal: it only considers a few scenarios and it will lose against a nim-sum based agent.
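For reference, the optimal expert agent only needs a few lines. Here is a rough sketch of what I mean (it assumes the Nim and Nimply classes from the lab; expert_move is just a name I made up): the key idea is that a position is losing exactly when the xor of the row sizes is 0, so the expert simply looks for a move that brings that xor back to 0.

from functools import reduce
from operator import xor

def expert_move(state: Nim) -> Nimply:
    '''take a move that leaves the opponent with nim-sum 0, if one exists'''
    for r, c in enumerate(state.rows):
        for taken in range(1, c + 1):
            # simulate removing `taken` objects from row r
            remaining = list(state.rows)
            remaining[r] -= taken
            if reduce(xor, remaining, 0) == 0:
                return Nimply(r, taken)
    # losing position: no winning move exists, take one object from any non-empty row
    r = next(i for i, c in enumerate(state.rows) if c > 0)
    return Nimply(r, 1)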
Anyway, considering just the hard-coded agent, I really like the idea that the agent can look one step ahead and check whether a move helps the opponent. However, I find the code implementing this a bit confusing and redundant: there are a lot of loops and ifs and loops inside ifs, and I think at least a few of them are unnecessary. For example, in iterative_loop, where you compare your Evaluator with the masks, you could handle all those loops in a single one with different returns (since the masks don't change inside the loops). Something like:

for r, c in enumerate(state.rows):
    if Evaluator == mask1 and c == 1:
        return Nimply(r, c)
    if ...

In addition, if you have two if conditions in a row without any code in between (like at the very beginning of iterative_loop), you can just write if condition1 and condition2: ...
Little things like these can make your code clearer and probably a bit more efficient as well.

Task 3.2 - An agent using evolved rules

I really like the way you handled your genome, the alpha and beta parameters, and your mutation and crossover functions. Your rules are probably a bit too simple but I understand the difficulty, especially with this task of the lab. You could try to use other kinds of rules and also add the rules you wrote in the previous task.

You need to be careful with your ifs: they are often missing an else, which can lead to problems. Let's look at the evolvable function, for example:

def evolvable(state: Nim, genome: tuple):
    threshold_alpha = 0.5
    threshold_beta = 0.5

    #choose the strategy to use based on the parameters inside the genome
    if threshold_alpha <= genome[0] and threshold_beta <= genome[1]:
        ply = dumb_PCI_max_longest(state)
    if threshold_alpha <= genome[0] and threshold_beta >= genome[1]:
        ply = dump_PCI_min_longest(state)
    if threshold_alpha >= genome[0] and threshold_beta <= genome[1]:
        ply = dump_PCI_max_lowest(state)
    if threshold_alpha >= genome[0] and threshold_beta >= genome[1]:
        ply = dumb_PCI_min_lowest(state)

    return ply

For example, if genome[1] is equal to 0.5 you will enter both of the first two ifs, and the second one will overwrite the ply. The same thing happens with the last two ifs when genome[0] is equal to 0.5. I don't think this is the behaviour you want. To address it you could return the ply directly inside the if where you assign it, remove the = from some of the conditions, or use elif/else.
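Just to illustrate the elif option (a sketch reusing the thresholds and function names from your code; ties at exactly 0.5 now go to the first matching branch, but you can decide the boundary behaviour however you prefer):

def evolvable(state: Nim, genome: tuple):
    threshold_alpha = 0.5
    threshold_beta = 0.5

    # mutually exclusive branches: ply is assigned exactly once
    if genome[0] >= threshold_alpha and genome[1] >= threshold_beta:
        ply = dumb_PCI_max_longest(state)
    elif genome[0] >= threshold_alpha:
        ply = dump_PCI_min_longest(state)
    elif genome[1] >= threshold_beta:
        ply = dump_PCI_max_lowest(state)
    else:
        ply = dumb_PCI_min_lowest(state)

    return ply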

Another, less important detail: when you write a function and want to explain what it does, it's better to put that explanation inside the function as a docstring. Something like:

from itertools import accumulate
from operator import xor

def nim_sum(state: Nim):
    '''optimal strategy'''
    *_, result = accumulate(state.rows, xor)
    return result

This way, when you call the function later in your code, the editor shows you the docstring to remind you what the function does. This doesn't happen if you write the comment outside the function.

Task 3.3 - An agent using minmax

Your minmax agent is very simple, well written, and compact.
Unfortunately, using a depth bound worsens performance, especially for larger N. I understand the need for a bound like that to keep computation time reasonable; I used one too, but only for N greater than 8.
If you add some kind of caching system to save the states you have already visited, it will save you a lot of time! Especially if you manage to detect and exploit equivalent states (I'm still working on this myself). You can even save the cache to a file with the pickle library (or something equivalent) and load it when you want to play. This way you only have to build the cache once and the other games will be much faster.
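Something like this is what I mean (a rough sketch, not your code: it works on a plain tuple of row sizes instead of the Nim class, the file name is made up, and the rows are sorted so that permutations of the same position share one cache entry):

import pickle

CACHE_FILE = "minmax_cache.pkl"  # hypothetical file name

def load_cache() -> dict:
    '''reload evaluations computed in previous runs, if the file exists'''
    try:
        with open(CACHE_FILE, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return {}

def save_cache(cache: dict) -> None:
    '''persist the cache so the next run starts warm'''
    with open(CACHE_FILE, "wb") as f:
        pickle.dump(cache, f)

cache = load_cache()

def minmax(rows: tuple, maximizing: bool) -> int:
    '''+1 if the maximizing player wins from this position (last to take wins), -1 otherwise'''
    key = (tuple(sorted(rows)), maximizing)
    if key in cache:
        return cache[key]
    if sum(rows) == 0:
        # the previous player took the last object and won
        value = -1 if maximizing else 1
    else:
        scores = [
            minmax(rows[:r] + (rows[r] - k,) + rows[r + 1:], not maximizing)
            for r, c in enumerate(rows) for k in range(1, c + 1)
        ]
        value = max(scores) if maximizing else min(scores)
    cache[key] = value
    return value

Then you just call save_cache(cache) once you are done playing, and every later game starts from everything you have already computed.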

Task 3.4 - An agent using reinforcement learning

The reinforcement learning algorithm is very clear and well presented. I like the function to assign the rewards even if it's different from what we have seen in class. I also like the choice to train the RL agent against a player that is not always optimal to avoid overfitting. I still think the code could be somewhat improved here and there.
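By "not always optimal" I mean an opponent along these lines (just a sketch; p_optimal is a made-up parameter and expert_move is the nim-sum strategy sketched in 3.1):

import random

def imperfect_opponent(state: Nim, p_optimal: float = 0.7) -> Nimply:
    '''plays the optimal nim-sum move with probability p_optimal, a random legal move otherwise'''
    if random.random() < p_optimal:
        return expert_move(state)
    row = random.choice([r for r, c in enumerate(state.rows) if c > 0])
    return Nimply(row, random.randint(1, state.rows[row]))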

Regardless, it clearly works quite well so good job!