Review of Lab_10 by Hossein Khodadadi (313884) and Abolfazl Javidian (314441)

Review added by Hossein Khodadadi (313884) and Abolfazl Javidian (314441).

The author uses a minimax strategy to update the Q-values recursively. A back-propagation function updates the action-value dictionary according to gamma and the learning rate. Minimax is commonly used in games where two agents are in direct opposition, each trying to maximize its own reward while minimizing the opponent's, and this is what the min-max function implements. The method reaches a win rate of 81.17 percent after 20,000 iterations, which is computationally efficient. However, some points in these functions are unclear:

1. In the min-max function, all the values stored in the scores list seem to be always -1 or 1. How does this help to dynamically find the best action and its corresponding score?

2. In the q_learning function, the next_state variable is defined but apparently never used.
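To make the second point concrete, here is a minimal sketch of the textbook tabular Q-learning update, in which the next state supplies the bootstrap target. It is not the author's code: the function name `q_update`, the `(state, action)` dictionary keys, and the toy states are assumptions made for illustration. In a two-player game the next state may be the opponent's turn, so the target might need a sign flip or a min instead of a max.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, next_actions,
             alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning step and return the updated value.

    Hypothetical helper: q maps (state, action) pairs to floats.
    """
    # Best value reachable from next_state; 0.0 if next_state is terminal.
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    # Bootstrap target: immediate reward plus discounted future value.
    target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Toy usage with made-up states: "s1" already has a known good action.
q = defaultdict(float)
q[("s1", "right")] = 1.0
new_value = q_update(q, "s0", "right", 0.0, "s1", ["left", "right"],
                     alpha=0.5, gamma=0.9)
```

If next_state is never read, the update reduces to tracking the immediate reward only, which may or may not be what the author intended.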

If learning the Q-values is not too time-consuming, a grid search over the hyper-parameters alpha and gamma could be performed to achieve higher win rates.
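The suggested grid search could look roughly like the sketch below. The names `grid_search` and `train_and_evaluate` are hypothetical, and `dummy_win_rate` is a synthetic stand-in so that the example runs on its own. It does not reflect any measured result from the lab, and in practice it would be replaced by a function that trains the agent and returns its win rate.

```python
from itertools import product


def grid_search(train_and_evaluate, alphas, gammas):
    """Evaluate every (alpha, gamma) pair and return the best one.

    train_and_evaluate is assumed to take (alpha, gamma) and return a
    score to maximize, for example a win rate.
    """
    results = {
        (alpha, gamma): train_and_evaluate(alpha, gamma)
        for alpha, gamma in product(alphas, gammas)
    }
    best = max(results, key=results.get)
    return best, results


def dummy_win_rate(alpha, gamma):
    # Synthetic placeholder, NOT real data: peaks at alpha=0.1, gamma=0.9.
    return 1.0 - abs(alpha - 0.1) - abs(gamma - 0.9)


best, results = grid_search(dummy_win_rate,
                            alphas=[0.05, 0.1, 0.5],
                            gammas=[0.8, 0.9, 0.99])
```

Because Q-learning with random exploration is noisy, averaging the score over a few seeds per pair would likely give a more reliable comparison.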

