Aleedm / computational-intelligence

MIT License

Lab10 Peer Review #7

Open lorenzobn opened 6 months ago

lorenzobn commented 6 months ago

Hello Alessandro,

I always love your reports with all those visualizations, they are really nice! It is interesting to see that if the Q-learning agent starts first, it always wins against the Monte Carlo agent, while in the opposite scenario the match almost always ends in a draw. I didn't understand the idea behind the n_best_actions calculation; I would have appreciated a comment above it 😃. I just have a question: how did you choose the hyperparameters? Did you find that the hyperparameters you submitted are the best ones? Anyway, good job!

Aleedm commented 6 months ago

Hi Lorenzo,

Thank you for your positive feedback! It is indeed fascinating to observe the results obtained by comparing the Monte Carlo method with Q-learning.

I didn't understand the idea behind this calculation n_best_actions. I would have appreciated a comment above it 😃

Initially, n_best_actions was implemented to facilitate this comparison. If both methods always chose the single best move, I would end up with x identical games. Therefore, I adopted a different policy for selecting moves: the top x moves are considered, where x = len(available_actions) // selection_factor + (len(available_actions) % selection_factor > 0). Here selection_factor determines what fraction of the available moves to keep, and the second term performs a ceiling division: whenever the floor division would yield 0 but moves remain, 1 is added, so at least one move is always selected. This approach generates diverse games, making Monte Carlo and Q-learning compete in various scenarios.
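The selection could be sketched like this (a minimal illustration of the formula, assuming q_values maps actions to their estimated values; the names here are not necessarily the report's exact code):

```python
import random

def n_best_actions(q_values, available_actions, selection_factor=3):
    """Sample one move from the top x, instead of always taking the argmax.

    x = ceil(len(available_actions) / selection_factor), computed with the
    floor-division-plus-remainder trick, so at least one move is kept.
    """
    x = len(available_actions) // selection_factor + (
        len(available_actions) % selection_factor > 0
    )
    # Rank moves by their estimated value, best first
    ranked = sorted(available_actions, key=lambda a: q_values.get(a, 0.0), reverse=True)
    # Pick uniformly among the top x, producing varied but strong play
    return random.choice(ranked[:x])
```

With 9 available moves and selection_factor=3, x is 3, so the agent samples among its three best moves; with a single available move, x is still 1.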

My code employs the epsilon-greedy policy, which comes into play halfway through the learning process. I then integrated n_best_actions into the epsilon-greedy policy to explore different optimal solutions and avoid local optima.
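Combining the two ideas, the policy might look like the following sketch (assumed structure, with illustrative names: with probability epsilon the agent explores uniformly, otherwise it exploits by sampling among the top-x moves rather than committing to a single argmax):

```python
import random

def epsilon_greedy_action(q_values, available_actions, epsilon, selection_factor=3):
    """Epsilon-greedy with a top-x exploitation step (illustrative sketch).

    Exploring uniformly with probability epsilon, and otherwise sampling
    among the top-x moves, keeps some variety even in the greedy branch
    and helps avoid locking onto a single local optimum.
    """
    if random.random() < epsilon:
        # Exploration: any available move, uniformly
        return random.choice(available_actions)
    # Exploitation: ceiling division to get the number of top moves to keep
    x = len(available_actions) // selection_factor + (
        len(available_actions) % selection_factor > 0
    )
    ranked = sorted(available_actions, key=lambda a: q_values.get(a, 0.0), reverse=True)
    return random.choice(ranked[:x])
```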

how did you choose the hyperparameters? Did you find that the hyperparameters you submitted are the best ones?

The hyperparameters were chosen using an approach similar to grid search, a technique that systematically evaluates different combinations of hyperparameter values to find the best configuration. The report includes the hyperparameters that yielded the best results with this method.
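A grid search of this kind can be sketched as follows (a generic outline, not the report's actual code; train_and_evaluate stands for whatever routine trains the agent and returns a score):

```python
from itertools import product

def grid_search(train_and_evaluate, grid):
    """Evaluate every combination in `grid` and return the best one.

    `grid` maps hyperparameter names to lists of candidate values, e.g.
    {"alpha": [0.05, 0.1], "gamma": [0.5, 0.9]}.
    """
    best_score, best_params = float("-inf"), None
    names = list(grid)
    # Cartesian product enumerates every hyperparameter combination
    for combo in product(*(grid[n] for n in names)):
        params = dict(zip(names, combo))
        score = train_and_evaluate(**params)
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score
```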

lorenzobn commented 6 months ago

Thank you Alessandro!