PacktPublishing / Deep-Reinforcement-Learning-Hands-On

Hands-on Deep Reinforcement Learning, published by Packt

Chapter 14 DDPG code about policy gradient #59

Open GuilinF opened 4 years ago

GuilinF commented 4 years ago

Sir, thanks for your book and code, it is very nice. In the Chapter 14 DDPG code I can't understand the actor policy gradient update. This is the relevant part of your code:

```python
act_opt.zero_grad()
cur_actions_v = act_net(states_v)
actor_loss_v = -crt_net(states_v, cur_actions_v)
actor_loss_v = actor_loss_v.mean()
actor_loss_v.backward()
act_opt.step()
tb_tracker.track("loss_actor", actor_loss_v, frame_idx)
```

However, I can't match this to the formula in the DDPG paper, which writes the actor update as the product of two gradients: the gradient of Q with respect to the action, multiplied by the gradient of the policy with respect to its parameters.
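For reference, the minibatch update from the DDPG paper, as I transcribe it, is the sampled deterministic policy gradient written as that product of two gradients:

```latex
% Minibatch deterministic policy gradient from the DDPG paper (my transcription)
\nabla_{\theta^{\mu}} J \approx \frac{1}{N} \sum_{i}
    \nabla_{a} Q(s, a \mid \theta^{Q}) \big|_{s = s_i,\, a = \mu(s_i)}
    \, \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu}) \big|_{s = s_i}
```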

Thank you for your patience

josiahls commented 1 year ago

~I honestly feel like the code from the book is wrong. It does train, but I think it is incorrect and slower than it should be.~

~I say this because the above loss will grow to 10, 20, 30+, which isn't great. The main reason is that the loss from the book is the literal Q value, not the gradient/derivative. I don't think you want to be working with literal Q values as losses, since they can be much larger or smaller than 1.~

~If you compute the gradient instead, it will likely stay a reasonable number regardless of whether Q is 1 or 1000.~

~Once again, this code does work, but I think it would train faster and more stably using the gradients instead.~

Edit: Thinking about this more, the reason this doesn't match the equation is that you're using PyTorch's autograd to do the gradients for you: calling backward() on -Q(s, mu(s)) applies the chain rule, which is exactly the product of the two gradients in the paper. It's not a particularly satisfying answer, and I still find the idea of actor_loss_v steadily growing larger concerning, since it might be slowing training.
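Here is a minimal sketch (not the book's code; the tiny actor/critic networks, sizes, and names below are made up for the illustration) checking that autograd through -Q(s, mu(s)) produces the same actor gradients as the paper's explicit product of dQ/da and dmu/dtheta:

```python
import torch
import torch.nn as nn

obs_dim, act_dim, batch = 3, 2, 8
actor = nn.Linear(obs_dim, act_dim)        # stands in for mu(s | theta_mu)
critic = nn.Linear(obs_dim + act_dim, 1)   # stands in for Q(s, a | theta_Q)

states = torch.randn(batch, obs_dim)

# 1) What the book does: let autograd chain through the critic.
actions = actor(states)
loss = -critic(torch.cat([states, actions], dim=1)).mean()
auto_grads = torch.autograd.grad(loss, list(actor.parameters()))

# 2) The paper's form: take dQ/da at a = mu(s), then push it through mu by hand.
actions = actor(states)                                   # fresh forward pass
q_mean = critic(torch.cat([states, actions], dim=1)).mean()
dq_da = torch.autograd.grad(q_mean, actions, retain_graph=True)[0]
manual_grads = torch.autograd.grad(actions, list(actor.parameters()),
                                   grad_outputs=-dq_da)   # chain rule by hand

for g_auto, g_manual in zip(auto_grads, manual_grads):
    assert torch.allclose(g_auto, g_manual, atol=1e-6)
print("backward() on -Q(s, mu(s)) matches the explicit gradient product")
```

The two sets of gradients agree up to floating-point tolerance, which is why the book's one-line loss and the paper's two-gradient formula describe the same update.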