Closed xiaoyongzhu closed 6 years ago
It looks like this repo does not support `multi_gpu_model`, the multi-GPU utility introduced in Keras 2.0.9. When I do this:
if num_gpu > 1:
    model = multi_gpu_model(model, gpus=num_gpu)

# compile the model
model.compile(optimizer=optimizers.Adam(lr=args.lr),
              loss=[margin_loss, 'mse'],
              loss_weights=[1., args.lam_recon],
              metrics={'out_caps': 'accuracy'})
It gives me the error below, so it looks like the input layer does not handle the data correctly (I am not sure about this, though).
2017-11-10 23:15:25.160851: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Tesla K80, pci bus id: 451d:00:00.0, compute capability: 3.7)
2017-11-10 23:15:25.160892: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:1) -> (device: 1, name: Tesla K80, pci bus id: 7dcb:00:00.0, compute capability: 3.7)
Train on 60000 samples, validate on 10000 samples
2017-11-10 23:15:27.118862: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Tesla K80, pci bus id: 451d:00:00.0, compute capability: 3.7)
2017-11-10 23:15:27.118901: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:1) -> (device: 1, name: Tesla K80, pci bus id: 7dcb:00:00.0, compute capability: 3.7)
Epoch 1/30
2017-11-10 23:15:31.162715: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Incompatible shapes: [100,1152,10,1,1] vs. [50,1152,10,1,16]
     [[Node: replica_0/model_1/digitcaps/mul = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](replica_0/model_1/digitcaps/transpose_1, replica_0/model_1/digitcaps/scan/TensorArrayStack/TensorArrayGatherV3)]]
2017-11-10 23:15:31.162970: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Incompatible shapes: [100,1152,10,1,1] vs. [50,1152,10,1,16]
     [[Node: replica_0/model_1/digitcaps/mul = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](replica_0/model_1/digitcaps/transpose_1, replica_0/model_1/digitcaps/scan/TensorArrayStack/TensorArrayGatherV3)]]
2017-11-10 23:15:31.167090: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Incompatible shapes: [100,1152,10,1,1] vs. [50,1152,10,1,16]
     [[Node: replica_0/model_1/digitcaps/mul = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](replica_0/model_1/digitcaps/transpose_1, replica_0/model_1/digitcaps/scan/TensorArrayStack/TensorArrayGatherV3)]]
2017-11-10 23:15:31.170465: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Incompatible shapes: [100,1152,10,1,1] vs. [50,1152,10,1,16]
     [[Node: replica_0/model_1/digitcaps/mul = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](replica_0/model_1/digitcaps/transpose_1, replica_0/model_1/digitcaps/scan/TensorArrayStack/TensorArrayGatherV3)]]
2017-11-10 23:15:31.170701: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Incompatible shapes: [100,1152,10,1,1] vs. [50,1152,10,1,16]
     [[Node: replica_0/model_1/digitcaps/mul = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](replica_0/model_1/digitcaps/transpose_1, replica_0/model_1/digitcaps/scan/TensorArrayStack/TensorArrayGatherV3)]]
2017-11-10 23:15:31.175048: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Incompatible shapes: [100,1152,10,1,1] vs. [50,1152,10,1,16]
     [[Node: replica_0/model_1/digitcaps/mul = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](replica_0/model_1/digitcaps/transpose_1, replica_0/model_1/digitcaps/scan/TensorArrayStack/TensorArrayGatherV3)]]
Traceback (most recent call last):
  File "/datadrive/xiaoyzhu/python3env/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1323, in _do_call
    return fn(*args)
  File "/datadrive/xiaoyzhu/python3env/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1302, in _run_fn
    status, run_metadata)
  File "/datadrive/xiaoyzhu/python3env/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [100,1152,10,1,1] vs. [50,1152,10,1,16]
     [[Node: replica_0/model_1/digitcaps/mul = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](replica_0/model_1/digitcaps/transpose_1, replica_0/model_1/digitcaps/scan/TensorArrayStack/TensorArrayGatherV3)]]
     [[Node: training/Adam/gradients/concatenate_2/concat_grad/Slice_1/_309 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:1", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_2229_training/Adam/gradients/concatenate_2/concat_grad/Slice_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:1"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "capsulenet.py", line 215, in <module>
    train(model=model, data=((x_train, y_train), (x_test, y_test)), args=args)
  File "capsulenet.py", line 113, in train
    validation_data=[[x_test, y_test], [y_test, x_test]], callbacks=[log, tb, checkpoint, lr_decay])
  File "/datadrive/xiaoyzhu/python3env/lib/python3.5/site-packages/keras/engine/training.py", line 1631, in fit
    validation_steps=validation_steps)
  File "/datadrive/xiaoyzhu/python3env/lib/python3.5/site-packages/keras/engine/training.py", line 1213, in _fit_loop
    outs = f(ins_batch)
  File "/datadrive/xiaoyzhu/python3env/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 2332, in __call__
    **self.session_kwargs)
  File "/datadrive/xiaoyzhu/python3env/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 889, in run
    run_metadata_ptr)
  File "/datadrive/xiaoyzhu/python3env/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1120, in _run
    feed_dict_tensor, options, run_metadata)
  File "/datadrive/xiaoyzhu/python3env/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1317, in _do_run
    options, run_metadata)
  File "/datadrive/xiaoyzhu/python3env/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1336, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [100,1152,10,1,1] vs. [50,1152,10,1,16]
     [[Node: replica_0/model_1/digitcaps/mul = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](replica_0/model_1/digitcaps/transpose_1, replica_0/model_1/digitcaps/scan/TensorArrayStack/TensorArrayGatherV3)]]
     [[Node: training/Adam/gradients/concatenate_2/concat_grad/Slice_1/_309 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:1", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_2229_training/Adam/gradients/concatenate_2/concat_grad/Slice_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:1"]()]]

Caused by op 'replica_0/model_1/digitcaps/mul', defined at:
  File "capsulenet.py", line 215, in <module>
    train(model=model, data=((x_train, y_train), (x_test, y_test)), args=args)
  File "capsulenet.py", line 103, in train
    model = multi_gpu_model(model, gpus=num_gpu)
  File "/datadrive/xiaoyzhu/python3env/lib/python3.5/site-packages/keras/utils/training_utils.py", line 143, in multi_gpu_model
    outputs = model(inputs)
  File "/datadrive/xiaoyzhu/python3env/lib/python3.5/site-packages/keras/engine/topology.py", line 603, in __call__
    output = self.call(inputs, **kwargs)
  File "/datadrive/xiaoyzhu/python3env/lib/python3.5/site-packages/keras/engine/topology.py", line 2061, in call
    output_tensors, _, _ = self.run_internal_graph(inputs, masks)
  File "/datadrive/xiaoyzhu/python3env/lib/python3.5/site-packages/keras/engine/topology.py", line 2212, in run_internal_graph
    output_tensors = _to_list(layer.call(computed_tensor, **kwargs))
  File "/datadrive/xiaoyzhu/RandomExercise/CapsNet-Keras/capsulelayers.py", line 157, in call
    outputs = squash(K.sum(c * inputs_hat, 1, keepdims=True))
  File "/datadrive/xiaoyzhu/python3env/lib/python3.5/site-packages/tensorflow/python/ops/math_ops.py", line 894, in binary_op_wrapper
    return func(x, y, name=name)
  File "/datadrive/xiaoyzhu/python3env/lib/python3.5/site-packages/tensorflow/python/ops/math_ops.py", line 1117, in _mul_dispatch
    return gen_math_ops._mul(x, y, name=name)
  File "/datadrive/xiaoyzhu/python3env/lib/python3.5/site-packages/tensorflow/python/ops/gen_math_ops.py", line 2726, in _mul
    "Mul", x=x, y=y, name=name)
  File "/datadrive/xiaoyzhu/python3env/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/datadrive/xiaoyzhu/python3env/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
    op_def=op_def)
  File "/datadrive/xiaoyzhu/python3env/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Incompatible shapes: [100,1152,10,1,1] vs. [50,1152,10,1,16]
     [[Node: replica_0/model_1/digitcaps/mul = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](replica_0/model_1/digitcaps/transpose_1, replica_0/model_1/digitcaps/scan/TensorArrayStack/TensorArrayGatherV3)]]
     [[Node: training/Adam/gradients/concatenate_2/concat_grad/Slice_1/_309 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:1", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_2229_training/Adam/gradients/concatenate_2/concat_grad/Slice_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:1"]()]]

Exception ignored in: <bound method BaseSession.__del__ of <tensorflow.python.client.session.Session object at 0x7f16e115a828>>
Traceback (most recent call last):
  File "/datadrive/xiaoyzhu/python3env/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 696, in __del__
  File "/datadrive/xiaoyzhu/python3env/lib/python3.5/site-packages/tensorflow/python/framework/c_api_util.py", line 30, in __init__
TypeError: 'NoneType' object is not callable
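One possible reading of the shapes in this log, offered as a guess rather than a confirmed diagnosis: with two GPUs, a batch of 100 is sliced into 50 samples per replica, while one operand of `c * inputs_hat` still carries the full batch size of 100. The pure-Python sketch below only checks the arithmetic. The helper names `split_batch` and `broadcastable` are hypothetical, the even split is an assumption about how the batch is divided, and the mapping of the two logged shapes to `c` and `inputs_hat` is inferred from the operand order in the traceback.

```python
def split_batch(batch_size, num_gpus):
    """Per-replica batch sizes, assuming an even split where the last
    replica also takes any remainder (hypothetical helper)."""
    step = batch_size // num_gpus
    return [step] * (num_gpus - 1) + [batch_size - step * (num_gpus - 1)]


def broadcastable(shape_a, shape_b):
    """NumPy/TensorFlow-style broadcasting check: compare dimensions from
    the right; each pair must be equal or contain a 1."""
    for a, b in zip(reversed(shape_a), reversed(shape_b)):
        if a != b and a != 1 and b != 1:
            return False
    return True


# Shapes copied from the error message; which tensor is which is an assumption.
c_shape = (100, 1152, 10, 1, 1)           # still sized for the full batch
inputs_hat_shape = (50, 1152, 10, 1, 16)  # sized for one replica's slice

print(split_batch(100, 2))                                  # [50, 50]
print(broadcastable(c_shape, inputs_hat_shape))             # False
print(broadcastable((50, 1152, 10, 1, 1), inputs_hat_shape))  # True
```

If this reading is right, any tensor inside the layer whose first dimension is fixed when the model is built, rather than derived from the incoming batch, would clash with the per-replica slice in the same way.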
@xiaoyongzhu Thanks for your feedback. I’ll test on multi-GPU later. A PR is welcome if you can solve this.
@xiaoyongzhu I have added multi-GPU support. Thanks for the feedback.