Open d3ac opened 1 year ago
Thanks for your code. I have a small suggestion: move `agent.update(one_ep_transition)` out of the loop, so that it runs once per episode instead of once per step. In my runs this made training at least 60 times faster. I don't think the agent needs to be updated inside the loop, because each call walks over every transition collected so far, which makes the cost of an episode with $n$ steps $O(n^2)$ instead of $O(n)$. With the change I also got a clearly better converged value. Is this change valid? I would appreciate your feedback.
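To illustrate where the cost difference comes from, here is a minimal sketch. It does not use the notebook's actual classes: `CountingAgent` and `run_episode` are hypothetical stand-ins, and the only assumption is that `update` iterates over the whole list of transitions it receives. The agent just counts how many transitions it processes.

```python
class CountingAgent:
    """Hypothetical stand-in for the notebook's agent.

    It only counts how many transitions update() has walked over.
    """

    def __init__(self):
        self.processed = 0

    def update(self, one_ep_transition):
        # Assumption: a Monte Carlo update iterates over every
        # transition in the episode it is given.
        self.processed += len(one_ep_transition)


def run_episode(agent, n_steps, update_in_loop):
    """Collect n_steps placeholder transitions and update the agent."""
    one_ep_transition = []
    for t in range(n_steps):
        # Placeholder (state, action, reward) tuple, not real environment data.
        one_ep_transition.append((t, 0, 0.0))
        if update_in_loop:
            # Called every step: the list keeps growing, so the total
            # work is 1 + 2 + ... + n = n(n+1)/2.
            agent.update(one_ep_transition)
    if not update_in_loop:
        # Called once per episode: total work is n.
        agent.update(one_ep_transition)
    return agent.processed


inside = run_episode(CountingAgent(), 100, update_in_loop=True)
outside = run_episode(CountingAgent(), 100, update_in_loop=False)
print(inside, outside)  # 5050 100
```

For a 100-step episode the in-loop version processes 5050 transitions against 100 for the single update at the end. The real speedup depends on episode length and on what `update` does per transition, so this count is only an indication.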
There is also a spelling mistake in the notebook "MonteCarlo.ipynb": the class "FisrtVisitMC" should be named "FirstVisitMC".