Closed mehrdad78 closed 9 months ago
Hello, @mehrdad78.
As the number of steps
increases, we can observe an increase in the average return
value.
Furthermore, starting from around 10,000 steps, the return
value consistently stabilizes around 1.
This indicates that the model has been successfully trained in the right direction.
Hi there! My question is about the plot you shown in frozen-lake example. What does it show to us? How can we understand that the problem is solved?