Add ex8.8 code and plot

My solution produces curves with a similar shape to the book's, but the value of the start state under the greedy policy is different. I am not sure where it goes wrong; possibly in the reward calculation. However, my results are similar to those of the other implementations I found online (see the reference implementations below). Please treat this solution as one reference among several, not as definitively correct.

Other reference implementations: