Hello there,
Let me start by saying that the README file is very well done: complete and clear on every aspect of your code.
As for the cleanliness of the code, it's nice that the different algorithms are split into different files, but I still find the main file a little too crowded; maybe you could add more comments or move some functions into a separate utilities file. I also found it strange that the hardcoded algorithm was placed in the main file.
1) Speaking of that, I liked that you used some of the cooked metrics shown by the professor to choose the move, and the hardcoded rules are simple but effective, even if they only come into play in the last steps of the game.
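Just to make concrete the kind of last-steps rule I have in mind, here is a minimal sketch (all names are mine and the state is assumed to be a plain list of row sizes, so this is not your code; it also assumes normal play, where taking the last object wins):

```python
def endgame_move(rows):
    """Illustrative endgame rule: when exactly one row has more than one object,
    take it all or leave one object so that the opponent faces an even number
    of single-object rows (nim-sum 0 under normal play; flip the parity for misere)."""
    big_rows = [i for i, n in enumerate(rows) if n > 1]
    if len(big_rows) != 1:
        return None                      # not a position this rule handles
    row = big_rows[0]
    ones = sum(1 for n in rows if n == 1)
    take = rows[row] if ones % 2 == 0 else rows[row] - 1
    return row, take
```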
It's also very nice that the evaluation function is shared by all the methods, and in my opinion win rate is the best metric for evaluation (even if with the EA method it works a bit differently; you could have implemented the "reverse" mode, i.e. choosing whether to play first or second, for the other agents as well).
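For reference, by a shared win-rate evaluation I mean something along these lines (a sketch under my own assumptions: the function names and the normal-play win condition are placeholders, not taken from your repository):

```python
import random

def evaluate(agent_move, opponent_move, n_games=100, n_rows=5):
    """Play n_games of Nim and return the agent's win rate.
    Both move functions take the state and return (row, n_objects)."""
    wins = 0
    for _ in range(n_games):
        state = [2 * i + 1 for i in range(n_rows)]   # classic rows: 1, 3, 5, ...
        player = random.choice([0, 1])               # randomize who starts
        while sum(state) > 0:
            row, take = (agent_move if player == 0 else opponent_move)(state)
            state[row] -= take
            last_player, player = player, 1 - player
        if last_player == 0:                         # agent took the last object
            wins += 1
    return wins / n_games
```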
2) The EA method you used is clever, because the genome holds the probabilities of making a certain type of move depending on whether the number of active rows is even or odd, and the evolution converges to the same results you obtained with the first method. Maybe I would have gone further, allowing more than 2 types of moves (for instance the same ones as the first agent) and considering factors other than just the even/odd count of active rows. It's still a well-programmed algorithm; maybe a bit more randomness in the crossover would have been better (see the sketch after the snippet below).
```python
if probs[0] > genome[0]:
    if self.k is not None:
        n_objects = min(self.k, data["rows_2_or_more"][0][1])      # take the whole row (capped by k)
    else:
        n_objects = data["rows_2_or_more"][0][1]                   # take the whole row
else:
    if self.k is not None:
        n_objects = min(self.k, data["rows_2_or_more"][0][1] - 1)  # leave one object (capped by k)
    else:
        n_objects = data["rows_2_or_more"][0][1] - 1               # leave one object
```
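On the crossover point, what I had in mind is something like a uniform crossover, where each gene is picked at random from either parent instead of using a single fixed cut (a sketch under my assumption that the genome is a flat list of probabilities; `uniform_crossover` is my own name):

```python
import random

def uniform_crossover(parent_a, parent_b, swap_prob=0.5):
    """Uniform crossover: each gene comes from parent_a or parent_b at random,
    instead of splitting the genomes at one fixed cut point."""
    return [a if random.random() < swap_prob else b
            for a, b in zip(parent_a, parent_b)]

# e.g. child = uniform_crossover([0.2, 0.8], [0.7, 0.3])
```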
3) The nim-sum with alpha-beta pruning unfortunately is not effective; maybe using two separate loops for the two players broke something (unlike the base code given by the professor). The code is quite complicated, but the use of a cache to avoid re-entering previously computed states is remarkable and boosts performance. The alpha-beta pruning part doesn't work, but it's nice that you also added a depth limit to interrupt the recursion if needed. I also liked the way you determine whether it's your turn inside the recursive function.
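Just to be explicit about what I would expect from that part, here is a minimal sketch of minimax with alpha-beta pruning and a cache for Nim (all names are mine and I assume normal play, so this is not a rewrite of your code; note the cache key includes the (alpha, beta) window, so pruned, inexact values are never reused for a different window):

```python
from functools import lru_cache

def best_move(state, k=None):
    """Pick a move with memoized minimax + alpha-beta pruning.
    state is a tuple of row sizes; normal play (taking the last object wins)."""

    @lru_cache(maxsize=None)
    def value(rows, maximizing, alpha, beta):
        if sum(rows) == 0:
            return -1 if maximizing else 1   # previous player took the last object
        best = -2 if maximizing else 2
        for r, n in enumerate(rows):
            for take in range(1, (n if k is None else min(n, k)) + 1):
                child = list(rows)
                child[r] -= take
                v = value(tuple(child), not maximizing, alpha, beta)
                if maximizing:
                    best, alpha = max(best, v), max(alpha, v)
                else:
                    best, beta = min(best, v), min(beta, v)
                if beta <= alpha:            # alpha-beta cut-off
                    return best
        return best

    candidates = []
    for r, n in enumerate(state):
        for take in range(1, (n if k is None else min(n, k)) + 1):
            child = list(state)
            child[r] -= take
            candidates.append((value(tuple(child), False, -2, 2), (r, take)))
    return max(candidates)[1]
```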
4) In the RL algorithm, training with two RL agents as the contenders is a strange choice... for example, to train my RL agent I used the optimal agent (and, with a lower probability, a random agent). But looking at your results you somehow managed to learn something, so great, even if the performance is not the best (a bit lower than mine, even though we used the same method to decide the rewards).
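What I mean by the opponent choice is something like this inside the training loop (a sketch with placeholder names; `optimal_move`, `random_move`, `play_game`, and `rl_agent` are not from your code):

```python
import random

def pick_opponent(optimal_move, random_move, p_random=0.2):
    """Face the optimal agent most of the time and a random agent occasionally,
    so the learner sees both strong play and more exploratory positions."""
    return random_move if random.random() < p_random else optimal_move

# sketch of the loop:
# opponent = pick_opponent(optimal_move, random_move)
# result = play_game(rl_agent, opponent)
# rl_agent.update(reward=1 if result == "win" else -1)
```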
Thanks for your attention!