Closed: zynk13 closed this issue 6 years ago
Hey, great work on the implementations! I tried using your DDPG implementation on another environment (BeerGame) and am getting the following error:

I only made a few modifications to your code to suit my environment, such as the state and action dimensions. The error occurs during training in the Adam optimizer, when the action gradients from the critic are being propagated to the actor network. I was wondering if you encountered any similar errors during your implementation.
Hi, I don't recall seeing such an error when implementing DDPG. I am not familiar with BeerGame, although I would assume the error comes from the [states, grads] you feed into the optimizer. Also, have you checked that all the dimensions in optimizer() of actor.py are correct?
Thanks for the quick response! I thought along the same lines, and I did check all shapes inside the optimizer function and its inputs. Everything seems to be fine:
```
action_gdts.shape       : (?, 1)
params_grad.shape       : (8,)
trainable_weights.shape : (8,)
grads                   : <zip object at 0x1177d9d48>, shape : ()
state shape             : (64, 6)
action shape            : (64, 1)
grads shape             : (64, 1)
```
I just tried the 'MountainCarContinuous-v0' environment with no modifications to your code, and it throws the same error. I found this env as an example in your continuous environment wrapper, so I'm assuming you tried it?
Thank you for the feedback. I just tried running DDPG with MountainCarContinuous-v0 and it runs fine on my computer. Are you using Keras 2.1.6?
Fixed it. I am using Keras 2.2.2, and there was a minor change in the way K.function works after ~2.1.6.

In the optimizer function in actor.py, I had to change

```python
return K.function([self.model.input, action_gdts], [tf.train.AdamOptimizer(self.lr).apply_gradients(grads)])
```

to

```python
return K.function([self.model.input, action_gdts], [tf.train.AdamOptimizer(self.lr).apply_gradients(grads)][1:])
```
The error was thrown because the Adam update op was included in the outputs of K.function; dropping it from the outputs list makes the error go away, and it works like a charm. Thanks a lot for helping out!
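For later readers, note what that [1:] slice actually does: the list contains a single element (the apply_gradients op), so slicing from index 1 leaves it empty. A quick sketch:

```python
# The outputs list holds exactly one element: the Adam update op.
ops = ["apply_gradients_op"]  # stand-in for [tf.train.AdamOptimizer(lr).apply_gradients(grads)]

print(ops[1:])  # [] -- an empty list, so this is equivalent to passing outputs=[]
```

That makes the error disappear, but it also means the update op is no longer referenced anywhere K.function will execute it, which is exactly what the next comment points out.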
Great! Glad to hear that was the error.
So this is not actually correct either. In Keras 2.2.2 and above, K.function has a slightly different signature:

```python
K.function(inputs, outputs, updates)
```

This is the correct call; the optimizer op has to go in updates so that it actually runs on every call:

```python
K.function(inputs=[self._state, self._action_grads],
           outputs=[],
           updates=[tf.train.AdamOptimizer(self._learning_rate).apply_gradients(grads)])
```

I was initially caught out by this: I couldn't work out why the model wasn't updating, and this is why.
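To make that concrete, here is a minimal sketch of what the full optimizer() method in actor.py can look like with that call, assuming the names used in the snippets above (self.model, self.lr, action_gdts); self.act_dim is a stand-in for the action dimension, and the chain-rule details follow the standard DDPG actor update rather than this repo's exact code:

```python
import tensorflow as tf
import keras.backend as K

def optimizer(self):
    # Sketch of the actor update for Keras 2.2.x with the TF1 backend.
    # Placeholder for dQ/da provided by the critic, one column per action dim.
    action_gdts = K.placeholder(shape=(None, self.act_dim))
    # Chain rule: dQ/dtheta = dQ/da * da/dtheta (negated for gradient ascent on Q).
    params_grad = tf.gradients(self.model.output,
                               self.model.trainable_weights,
                               grad_ys=-action_gdts)
    grads = zip(params_grad, self.model.trainable_weights)
    # The Adam op goes in `updates`, not `outputs`, so it runs on every call.
    return K.function(inputs=[self.model.input, action_gdts],
                      outputs=[],
                      updates=[tf.train.AdamOptimizer(self.lr).apply_gradients(grads)])
```

The returned function is then called once per training step with a batch of states and the critic's action gradients, e.g. train_fn([states, action_grads]).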
@sverzijl:

```python
K.function(inputs=[self._state, self._action_grads], outputs=[], updates=[tf.train.AdamOptimizer(self._learning_rate).apply_gradients(grads)])
```

I am having similar issues. What is self._state? I didn't find it in actor.py. The current documentation for K.function says the inputs should be placeholder tensors.
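Presumably (an assumption about sverzijl's adaptation, not something in this repo), self._state is just the actor network's input placeholder, i.e. the counterpart of self.model.input in actor.py:

```python
# Assumed correspondence between sverzijl's names and this repo's actor.py:
#   self._state          <->  self.model.input  (the actor's input placeholder)
#   self._action_grads   <->  action_gdts       (placeholder for dQ/da from the critic)
#   self._learning_rate  <->  self.lr
```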
I got it; it was missing the slice at the end of the gradients. This is what worked for me:

```python
K.function(inputs=[self.model.input, action_gdts], outputs=[],
           updates=[tf.train.AdamOptimizer(self.learning_rate).apply_gradients(grads)][1:])
```
Disclaimer: I am adapting the code for a different problem.
Funnily enough, none of these solutions worked for me. The problem in my case was that outputs could not be an empty list [] (as in all the original code above), so I ended up adding a dummy output, and everything works:

```python
K.function(inputs=[self.model.input, action_gdts],
           outputs=[K.constant(1)],
           updates=[tf.train.AdamOptimizer(self.lr).apply_gradients(grads)])
```
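Whichever variant you end up with, the usage is the same. Here is a hedged example of calling the returned function with a fake batch; train_fn stands for whatever optimizer() returned, and the shapes match the debug output earlier in the thread:

```python
import numpy as np

# Fake batch just to illustrate the call; in real training these come from
# the replay buffer and the critic's action-gradient op.
states = np.random.randn(64, 6).astype(np.float32)        # state batch, shape (64, 6)
action_grads = np.random.randn(64, 1).astype(np.float32)  # stand-in for dQ/da, shape (64, 1)

train_fn([states, action_grads])  # executes the Adam update op passed via `updates`
```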
Hello everyone, I got a similar error, but none of the solutions mentioned above work for me. Does anybody have a recommendation?