Closed: random-user-x closed 6 years ago
Changes look fine to me, but what were the error messages you were getting for reference?
Hello @Kaixhin, for off-policy learning, you get errors when computing the gradients. The loss needs a mean so that it is averaged over the batch. Secondly, random.choice sometimes seems to return an empty list. This causes an error because there is nothing to learn from. I fixed it by switching to random.randrange, which doesn't have this issue. Let me know if you need more clarification for the PR.
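For reference, here is a minimal sketch of the two changes described in the comment above. It is plain Python, not the repository's actual code: `sample_index`, `batch_mean`, and the `memory` list are hypothetical names used only for illustration, and the explicit empty-memory guard is an assumption rather than something the PR is known to contain.

```python
import random


def sample_index(memory):
    """Return a random valid index into `memory`, or None if it is empty.

    `memory` is a hypothetical stand-in for a replay buffer.
    """
    if not memory:
        # Nothing to learn from yet; the caller should skip the update.
        return None
    # randrange(n) returns an integer in [0, n), so the index is always valid.
    return random.randrange(len(memory))


def batch_mean(per_sample_losses):
    """Reduce per-sample losses to a single scalar by averaging over the batch."""
    return sum(per_sample_losses) / len(per_sample_losses)


if __name__ == "__main__":
    memory = ["episode_a", "episode_b", "episode_c"]
    idx = sample_index(memory)
    print(memory[idx])
    print(batch_mean([1.0, 2.0, 3.0]))
```

Note that both `random.choice` on an empty sequence and `random.randrange(0)` raise an exception, so the guard above is what actually protects against an empty buffer. On the gradient side, autograd frameworks such as PyTorch refuse to call `backward()` on a non-scalar tensor without an explicit gradient argument, which is one likely reason a batch of losses has to be reduced with a mean first.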
Fixing some basic errors. @Kaixhin, let me know if these changes should be added. I ran into errors when running the previous code.