Plot the training process of the agent

Once the Q-learning model is implemented and we are able to run the agent, we should be able to train it and document its progress over time, ideally with a plot.

We still need to decide how many episodes to train it for. The current maze isn't that large, so it shouldn't require a lot of training. If we have time, we could build a much larger maze (100 x 100) and see how well the agent trains on it.
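As a starting point for discussion, here is a minimal sketch of what recording and plotting training progress could look like. It is not our actual implementation: the empty grid maze, the reward values, the hyperparameters, and the names `train` and `plot_training` are all assumptions made for illustration. It records the number of steps per episode, which should drop as the agent learns.

```python
import random

# Assumed action set for a 4-connected grid maze: up, down, left, right.
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def train(size=5, episodes=200, alpha=0.1, gamma=0.95, epsilon=0.1,
          max_steps=500, seed=0):
    """Tabular Q-learning on a hypothetical empty size x size maze.

    The agent starts at the top-left cell and the goal is the bottom-right
    cell. Returns a list with the number of steps taken in each episode.
    """
    rng = random.Random(seed)
    q = {}  # (state, action_index) -> estimated value, missing entries are 0.0
    goal = (size - 1, size - 1)
    steps_per_episode = []

    for _ in range(episodes):
        state = (0, 0)
        steps = 0
        while state != goal and steps < max_steps:
            # Epsilon-greedy action selection with random tie-breaking.
            if rng.random() < epsilon:
                action = rng.randrange(len(ACTIONS))
            else:
                values = [q.get((state, i), 0.0) for i in range(len(ACTIONS))]
                best = max(values)
                action = rng.choice(
                    [i for i, v in enumerate(values) if v == best])

            # A move that would leave the grid keeps the agent in place.
            row = state[0] + ACTIONS[action][0]
            col = state[1] + ACTIONS[action][1]
            inside = 0 <= row < size and 0 <= col < size
            next_state = (row, col) if inside else state

            # Assumed rewards: +1 at the goal, small penalty per step.
            reward = 1.0 if next_state == goal else -0.01
            if next_state == goal:
                best_next = 0.0  # terminal state has no future value
            else:
                best_next = max(q.get((next_state, i), 0.0)
                                for i in range(len(ACTIONS)))

            # Standard Q-learning update.
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (
                reward + gamma * best_next - old)

            state = next_state
            steps += 1
        steps_per_episode.append(steps)
    return steps_per_episode


def plot_training(steps_per_episode, path="training.png"):
    """Save a line plot of steps per episode (requires matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(steps_per_episode)
    ax.set_xlabel("Episode")
    ax.set_ylabel("Steps to reach goal")
    ax.set_title("Q-learning training progress")
    fig.savefig(path)
    plt.close(fig)
```

Calling `plot_training(train(size=5, episodes=200))` would write the curve to `training.png`. For the 100 x 100 experiment, `size`, `episodes`, and `max_steps` would likely all need to be much larger.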

a-maa / RL15

plot the training process of the agent #5