FlavioPatti / Computational-Intelligence_2022-23


Lab 3 - Review by Luigi Federico #7

Open LuigiFederico opened 1 year ago

LuigiFederico commented 1 year ago

Task 1 - Expert player using nim-sum strategy

Here you have not implemented what was asked; instead, you implemented a hard-coded rule-based agent.

The problem is that this agent will always lose against an expert agent that uses the nim-sum strategy, because your code looks for a winning move only when the game reaches a final state that your rules can handle.

Furthermore, your agent is computationally expensive, since it mixes a recursive function that keeps going until it finds a good move with 10 full iterations over the board state.

The nim-sum strategy needs only 2 passes over the board state and always forces the opponent into a position from which they cannot win; moreover, it is easy to implement and to read.
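For reference, here is a minimal sketch of that strategy under normal-play rules (the player who takes the last object wins). It assumes a Nim state exposing a rows tuple and a Nimply(row, num_objects) move, as in the course material; adapt the names if your classes differ.

from collections import namedtuple
from functools import reduce

Nimply = namedtuple("Nimply", "row, num_objects")  # assumed move type

def nim_sum_move(state) -> Nimply:
    # First pass: xor of all row sizes (the nim-sum of the position)
    x = reduce(lambda a, b: a ^ b, state.rows)
    if x != 0:
        # Second pass: find a row whose reduction brings the nim-sum to zero
        for row, objects in enumerate(state.rows):
            target = objects ^ x
            if target < objects:
                return Nimply(row, objects - target)
    # Losing position: no winning move exists, so play any legal move
    row = next(r for r, o in enumerate(state.rows) if o > 0)
    return Nimply(row, 1)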

Task 2 - Evolved agent

You more or less implemented the optimal strategy inside task3.2.ipynb, but by trying all the possible states instead of using the real strategy, which directly looks for the row that makes the nim-sum equal to 0. Again, by following the algorithm you can find online, you can get a more performant agent.

Your evolved agent uses just 4 hard-coded rules that are very generic. Without other actions, it will always perform poorly. You could have reused some of the hard-coded rules you implemented in task3.1, or added a gamma variable to the genome in order to select among eight hard-coded rules. This could be beneficial for your agent.

Maybe it was intentional, but the ifs inside the function that chooses which ply the agent will play are not mutually exclusive with respect to alpha and beta:

def evolvable(state: Nim, genome: tuple):
    threshold_alpha = 0.5
    threshold_beta = 0.5

    #choose the strategy to use based on the parameters inside the genome
    if threshold_alpha <= genome[0] and threshold_beta <= genome[1]:
        ply = dumb_PCI_max_longest(state)
    if threshold_alpha <= genome[0] and threshold_beta >= genome[1]:
        ply = dump_PCI_min_longest(state)
    if threshold_alpha >= genome[0] and threshold_beta <= genome[1]:
        ply = dump_PCI_max_lowest(state)
    if threshold_alpha >= genome[0] and threshold_beta >= genome[1]:
        ply = dumb_PCI_min_lowest(state)

    return ply

If alpha and/or beta are equal to 0.5, the earlier assignments will always be overwritten by the ones below. If that was not intentional, be aware that it could change the expected behaviour of your agent.
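If the overlap was unintentional, one possible fix is an elif chain, so that each (alpha, beta) pair selects exactly one rule and the boundary cases alpha == 0.5 / beta == 0.5 are handled only once. A sketch, reusing the same strategy functions quoted above:

def evolvable(state: Nim, genome: tuple):
    threshold_alpha = 0.5
    threshold_beta = 0.5
    alpha, beta = genome[0], genome[1]

    # mutually exclusive branches: exactly one strategy is chosen
    if alpha >= threshold_alpha and beta >= threshold_beta:
        ply = dumb_PCI_max_longest(state)
    elif alpha >= threshold_alpha:
        ply = dump_PCI_min_longest(state)
    elif beta >= threshold_beta:
        ply = dump_PCI_max_lowest(state)
    else:
        ply = dumb_PCI_min_lowest(state)

    return ply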

Task 3 - MinMax agent

You adopted 'deep pruning' (a hard limit on the search depth) combined with alpha-beta pruning in order to cut off the search and reduce the computational complexity.

One approach you did not consider is to merge superimposable states, i.e. states that are permutations of one another. Here is an example:

The state [1, 0, 3, 1, 2] will have the same outcome as the following states:

  • [1, 0, 3, 2, 1]
  • [3, 1, 2, 0, 1]
  • [2, 1, 3, 0, 1]
  • and so on, for every possible permutation of those rows

All of these states can be represented by a single canonical state with the rows sorted: [3, 2, 1, 1, 0].

So the idea is to map each state to its 'ordered version' and cache the result in a dictionary. Doing so, you can avoid the deep pruning and expand the full tree to actually verify whether the agent wins or not. With this strategy I ran MinMax on Nim with 8 rows and it took only 10 minutes (without deep pruning).
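A minimal sketch of that caching idea, assuming normal play and a state given directly as a tuple of row sizes (the function names here are only illustrative):

from functools import lru_cache

def minmax_value(rows: tuple) -> int:
    # Canonical key: permuting the rows does not change the outcome,
    # so sorting lets the cache collapse all superimposable states
    return _minmax_cached(tuple(sorted(rows, reverse=True)))

@lru_cache(maxsize=None)
def _minmax_cached(rows: tuple) -> int:
    if all(o == 0 for o in rows):
        return -1  # no moves left: the player to move has already lost (last take wins)
    for r, objects in enumerate(rows):
        for take in range(1, objects + 1):
            child = rows[:r] + (objects - take,) + rows[r + 1:]
            if minmax_value(child) == -1:  # the opponent loses from child,
                return 1                   # so this is a winning move
    return -1  # every move leads to a winning position for the opponent

# e.g. minmax_value((1, 3, 5, 7)) returns -1: the player to move loses with perfect play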

Task 4 - RL agent

Nothing to say here, it looks like a good implementation. The only thing I could suggest is to train the RL agent against more than one opponent in a gradual way: first against an easy-to-beat agent, and then progressively towards the expert. It should make the agent's learning more robust.
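A minimal sketch of that curriculum idea; the opponent list and the per-game training call are hypothetical placeholders for whatever your notebook already defines:

def train_with_curriculum(agent, opponents, train_one_game, games_per_stage=5_000):
    # Train against progressively stronger opponents,
    # e.g. opponents = [pure_random, rule_based, nim_sum_expert]
    for opponent in opponents:
        for _ in range(games_per_stage):
            train_one_game(agent, opponent)  # one training episode vs. this opponent
    return agent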