I divided my review into three sections: Code, Model and Others.
Code:
The code is clear and well commented. I personally would have handled the different players differently to make the training loop cleaner, for example by creating a class for each player.
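To make the suggestion concrete, here is a minimal sketch of what per-player classes could look like (this is my own illustration, not your code; the class and method names are assumptions):

```python
from abc import ABC, abstractmethod
import random

class Player(ABC):
    """Common interface so the training loop treats all players uniformly."""
    @abstractmethod
    def choose_move(self, state, available_moves):
        ...

class RandomPlayer(Player):
    def choose_move(self, state, available_moves):
        # Pick uniformly among the free cells.
        return random.choice(available_moves)

class QLearningPlayer(Player):
    def __init__(self, q_table, epsilon=0.1):
        self.q_table = q_table  # maps (state, move) -> estimated value
        self.epsilon = epsilon  # exploration rate (assumed value)

    def choose_move(self, state, available_moves):
        # Epsilon-greedy: explore with probability epsilon, else exploit.
        if random.random() < self.epsilon:
            return random.choice(available_moves)
        return max(available_moves,
                   key=lambda m: self.q_table.get((state, m), 0.0))
```

With this structure, the training loop only calls `choose_move` and never branches on the player type.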
I would have avoided printing all the possible outcomes and the entire Q-table, both for readability and to speed up the code execution.
Model:
The solution you adopted (model-free Q-learning) is correct. Furthermore, you correctly updated the Q-table considering only the moves taken by the agent, thus treating the moves taken by the random player as part of the environment.
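To spell out why this is the right choice, here is a sketch of a single tabular Q-learning update where the opponent's reply is folded into the next state (my own illustration with assumed learning-rate and discount values, not your implementation):

```python
ALPHA, GAMMA = 0.1, 0.9  # assumed learning rate and discount factor

def q_update(q_table, state, action, reward, next_state, next_moves):
    """One Q-learning step for the agent's own move only.

    next_state is the board AFTER the opponent has replied, so the
    random player's move is absorbed into the environment transition.
    """
    # Value of the best action available to the agent in next_state
    # (0.0 if the game is over and no moves remain).
    best_next = max((q_table.get((next_state, m), 0.0) for m in next_moves),
                    default=0.0)
    old = q_table.get((state, action), 0.0)
    q_table[(state, action)] = old + ALPHA * (reward + GAMMA * best_next - old)
```

Updating only on the agent's turns keeps the Markov property intact: from the agent's point of view, the opponent is just stochastic environment dynamics.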
The representation you adopted for the Q-table is very clear.
Others:
The usage of graphs is very helpful to understand the results.
I believe that increasing the number of episodes would have led to even better results.
You should also have let the agent play second, not only first.