germain-hug / Deep-RL-Keras

Keras Implementation of popular Deep RL Algorithms (A3C, DDQN, DDPG, Dueling DDQN)

DDPG #2

Closed zynk13 closed 6 years ago

zynk13 commented 6 years ago

Hey, great work on the implementations! I tried using your DDPG implementation on another environment (BeerGame) and am getting the following error:

Traceback (most recent call last):
  File "ddpg.py", line 191, in <module>
    main()
  File "ddpg.py", line 183, in main
    stats = distributor.train(env, args, summary_writer)
  File "ddpg.py", line 130, in train
    self.update_models(states, actions, critic_target)
  File "ddpg.py", line 73, in update_models
    self.actor.train(states, actions, np.array(grads).reshape((-1, self.act_dim)))
  File "/Users/aravind/Desktop/DDPG/ddpg_actor.py", line 72, in train
    self.adam_optimizer([states, grads])
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2666, in __call__
    return self._call(inputs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2635, in _call
    session)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2587, in _make_callable
    callable_fn = session._make_callable_from_options(callable_opts)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1414, in _make_callable_from_options
    return BaseSession._Callable(self, callable_options)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1368, in __init__
    session._session, options_ptr, status)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 519, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.OutOfRangeError: Node 'Adam' (type: 'NoOp', num of outputs: 0) does not have output 0
Exception ignored in: <bound method BaseSession._Callable.__del__ of <tensorflow.python.client.session.BaseSession._Callable object at 0x119e7f630>>
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1398, in __del__
    self._session._session, self._handle, status)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 519, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: No such callable handle: 140324910323680

I only made a few modifications to your code to suit my environment, such as the state and action dimensions. The error occurs during training, in the Adam optimizer, when the action gradients from the critic are propagated to the actor network. Have you encountered any similar errors with your implementation?

germain-hug commented 6 years ago

Hi, I don't recall seeing such an error when implementing DDPG. I am not familiar with BeerGame, but I would assume the error comes from the [states, grads] you feed into the optimizer. Also, have you checked that all the dimensions in optimizer() of actor.py are correct?

zynk13 commented 6 years ago

Thanks for the quick response! I thought along the same lines and checked all the shapes inside the optimizer function and its inputs. Everything seems to be fine:

action_gdts.shape : (?, 1)
params_grad.shape : (8,)
trainable_weights.shape : (8,)
grads : <zip object at 0x1177d9d48> shape : ()
state shape : (64, 6)
action shape : (64, 1)
grads shape : (64, 1)
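
For readers checking the same shapes, here is a minimal NumPy sketch of how a batch of critic action gradients could be turned into a weight update for an actor. It assumes a purely linear actor (actions = states @ W) and a plain gradient-ascent step, not the Keras model and Adam optimizer used in actor.py. The names W, actor_weight_gradient and lr are hypothetical.

```python
import numpy as np

def actor_weight_gradient(states, action_grads):
    """Chain rule for a linear actor a = s @ W: dQ/dW = s^T @ dQ/da, averaged over the batch."""
    batch_size = states.shape[0]
    return states.T @ action_grads / batch_size

rng = np.random.default_rng(0)
states = rng.normal(size=(64, 6))        # state shape reported above: (64, 6)
action_grads = rng.normal(size=(64, 1))  # dQ/da from the critic: (64, 1)
W = np.zeros((6, 1))                     # hypothetical linear actor weights

grad_W = actor_weight_gradient(states, action_grads)
assert grad_W.shape == W.shape           # the gradient must match the weight shape

lr = 1e-3                                # hypothetical learning rate
W_new = W + lr * grad_W                  # ascent step: move the weights to increase Q
```

If the batch dimensions of the states and the action gradients disagree, the matrix product fails immediately, which makes this kind of check a quick way to rule out a shape problem.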

zynk13 commented 6 years ago

I just tried the 'MountainCarContinuous-v0' environment with no modifications to your code and it throws the same error. Found this env as an example in your continuous environment wrapper so I'm assuming you tried it?

germain-hug commented 6 years ago

Thank you for the feedback. I just tried running DDPG with MountainCarContinuous-v0 and it runs fine on my computer. Are you using Keras 2.1.6?

zynk13 commented 6 years ago

Fixed it. I am using Keras 2.2.2 and there was a minor change in the way K.function works after ~2.1.6.

In the optimizer function in actor.py, I had to change

return K.function([self.model.input, action_gdts], [tf.train.AdamOptimizer(self.lr).apply_gradients(grads)])

to

return K.function([self.model.input, action_gdts], [tf.train.AdamOptimizer(self.lr).apply_gradients(grads)][1:])

Including the input layer in the output of K.function threw the error. Just removed the input layer from the output placeholder and it works like a charm. Thanks a lot for helping out!
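
A note on the [1:] slice: in plain Python, slicing a one-element list from index 1 gives an empty list, so the changed call passes an empty outputs list to K.function; it does not remove an input layer. The sketch below uses a string as a hypothetical stand-in for the op returned by apply_gradients. It shows only the list behaviour, not what K.function then does with the result.

```python
# Hypothetical stand-in for the op returned by apply_gradients(grads).
train_op = "adam_update_op"

original_outputs = [train_op]
sliced_outputs = [train_op][1:]  # slice a one-element list starting at index 1

print(original_outputs)  # ['adam_update_op']
print(sliced_outputs)    # []
```

Because the sliced list no longer contains the optimizer op, it is worth confirming after such a change that the actor weights still change during training.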

germain-hug commented 6 years ago

Great! Glad to hear that was the error

sverzijl commented 5 years ago

So this is not actually correct either - In Keras 2.2.2 and above K.function is a bit different:

K.function(inputs, outputs, updates)

This is the correct function:

K.function(inputs=[self._state, self._action_grads], outputs=[], updates=[tf.train.AdamOptimizer(self._learning_rate).apply_gradients(grads)])

I was initially caught out by this: I couldn't work out why the model wasn't updating, and this is why.

fccoelho commented 5 years ago

@sverzijl:

K.function(inputs=[self._state, self._action_grads], outputs=[], updates=[tf.train.AdamOptimizer(self._learning_rate).apply_gradients(grads)])

I am having similar issues. What is self._state? I couldn't find it in actor.py. The current documentation for K.function says the inputs should be placeholder tensors.

fccoelho commented 5 years ago

I got it: the slice at the end of the gradients was missing. This is what worked for me:

K.function(inputs=[self.model.input, action_gdts], outputs=[],
           updates=[tf.train.AdamOptimizer(self.learning_rate).apply_gradients(grads)][1:])

Disclaimer: I am adapting the code for a different problem.

ArvydasKubilius commented 4 years ago

Funnily enough, none of these solutions worked for me. The problem in my case was that outputs could not be an empty list [] (everything else is the original code), so I ended up adding a dummy constant as the output, and now everything works.

K.function(inputs=[self.model.input, action_gdts], outputs=[K.constant(1)], updates=[tf.train.AdamOptimizer(self.lr).apply_gradients(grads)])

Nazanin-87 commented 2 years ago

Hello everyone, I got a similar error, but none of the solutions mentioned above work for me. Does anybody have a recommendation?