tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [1,35,63,36] vs. [300,116]

liningxiao commented 6 years ago

2017-12-04 15:09:43.989292: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Incompatible shapes: [1,35,63,36] vs. [300,116]
  [[Node: gradients/sub_1_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](gradients/sub_1_grad/Shape, gradients/sub_1_grad/Shape_1/_1069)]]
2017-12-04 15:09:43.997870: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Incompatible shapes: [1,35,63,36] vs. [300,116]
  [[Node: gradients/sub_1_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](gradients/sub_1_grad/Shape, gradients/sub_1_grad/Shape_1/_1069)]]
Traceback (most recent call last):
  File "./faster_rcnn/train_net.py", line 109, in <module>
    restore=bool(float(args.restore)))
  File "./faster_rcnn/../lib/fast_rcnn/train.py", line 400, in train_net
    sw.train_model(sess, max_iters, restore=restore)
  File "./faster_rcnn/../lib/fast_rcnn/train.py", line 255, in train_model
    cls_prob, bbox_pred, rois = sess.run(fetches=fetch_list, feed_dict=feed_dict)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 895, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1124, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1321, in _do_run
    options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1340, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [1,35,63,36] vs. [300,116]
  [[Node: gradients/sub_1_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](gradients/sub_1_grad/Shape, gradients/sub_1_grad/Shape_1/_1069)]]
  [[Node: Mean_2/_1101 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_2700_Mean_2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

Caused by op u'gradients/sub_1_grad/BroadcastGradientArgs', defined at:
  File "./faster_rcnn/train_net.py", line 109, in <module>
    restore=bool(float(args.restore)))
  File "./faster_rcnn/../lib/fast_rcnn/train.py", line 400, in train_net
    sw.train_model(sess, max_iters, restore=restore)
  File "./faster_rcnn/../lib/fast_rcnn/train.py", line 142, in train_model
    grads, norm = tf.clip_by_global_norm(tf.gradients(loss, tvars), 10.0)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py", line 542, in gradients
    grad_scope, op, func_call, lambda: grad_fn(op, out_grads))
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py", line 348, in _MaybeCompile
    return grad_fn() # Exit early
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py", line 542, in <lambda>
    grad_scope, op, func_call, lambda: grad_fn(op, out_grads))
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/math_grad.py", line 700, in _SubGrad
    rx, ry = gen_array_ops._broadcast_gradient_args(sx, sy)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 395, in _broadcast_gradient_args
    name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2628, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1204, in __init__
    self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

...which was originally created as op u'sub_1', defined at:
  File "./faster_rcnn/train_net.py", line 109, in <module>
    restore=bool(float(args.restore)))
[elided 0 identical lines from previous traceback]
  File "./faster_rcnn/../lib/fast_rcnn/train.py", line 400, in train_net
    sw.train_model(sess, max_iters, restore=restore)
  File "./faster_rcnn/../lib/fast_rcnn/train.py", line 112, in train_model
    self.net.build_loss(ohem=cfg.TRAIN.OHEM)
  File "./faster_rcnn/../lib/networks/network.py", line 667, in build_loss
    bbox_outside_weights * self.smooth_l1_dist(bbox_inside_weights * (bbox_pred - bbox_targets)), \
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/math_ops.py", line 865, in binary_op_wrapper
    return func(x, y, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_math_ops.py", line 2631, in _sub
    result = _op_def_lib.apply_op("Sub", x=x, y=y, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2628, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1204, in __init__
    self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Incompatible shapes: [1,35,63,36] vs. [300,116]
  [[Node: gradients/sub_1_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](gradients/sub_1_grad/Shape, gradients/sub_1_grad/Shape_1/_1069)]]
  [[Node: Mean_2/_1101 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_2700_Mean_2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

Please help me solve this problem. Thanks!

endernewton commented 6 years ago

300 should be the number of RoIs proposed.
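To see why the subtraction in the traceback fails, the broadcasting rule can be checked by hand. The following is a minimal pure-Python sketch of NumPy-style broadcasting (which TensorFlow's elementwise ops also follow); `broadcast_shape` is a hypothetical helper written for illustration, not part of TensorFlow or of this repository. Reading 36 as 9 anchors x 4 coordinates and 116 as 29 classes x 4 coordinates is an assumption based on the usual Faster R-CNN layout, which would suggest that an RPN-shaped tensor is being compared against per-RoI targets.

```python
def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or raise ValueError if incompatible."""
    a, b = list(shape_a), list(shape_b)
    # Left-pad the shorter shape with 1s so both have the same rank.
    rank = max(len(a), len(b))
    a = [1] * (rank - len(a)) + a
    b = [1] * (rank - len(b)) + b
    result = []
    for dim_a, dim_b in zip(a, b):
        # Two dimensions are compatible if they are equal or one of them is 1.
        if dim_a == dim_b or dim_a == 1 or dim_b == 1:
            result.append(max(dim_a, dim_b))
        else:
            raise ValueError(
                "Incompatible shapes: %s vs. %s" % (list(shape_a), list(shape_b)))
    return tuple(result)


pred_shape = [1, 35, 63, 36]   # first operand of sub_1 in the error message
target_shape = [300, 116]      # second operand: 300 RoIs per the comment above

try:
    broadcast_shape(pred_shape, target_shape)
except ValueError as err:
    print(err)

# Assumed interpretation of the last dimensions (4 box coordinates each):
print(36 // 4)    # anchors per location, if the tensor comes from the RPN
print(116 // 4)   # number of classes, if the targets are per-class boxes
```

If that reading is right, the prediction fed to the loss would need shape [300, 116] to match the targets.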

liningxiao commented 6 years ago

@endernewton Thanks, but now I have a new problem:
2017-12-08 13:39:25.144239: E tensorflow/stream_executor/cuda/cuda_blas.cc:366] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2017-12-08 13:39:25.144289: W tensorflow/stream_executor/stream.cc:1756] attempting to perform BLAS operation using StreamExecutor without BLAS support
Traceback (most recent call last):
tensorflow.python.framework.errors_impl.InternalError: Blas SGEMM launch failed : m=34860, n=64, k=64
  [[Node: res2a_branch2a/Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](pool1, res2a_branch2a/weights)]]
  [[Node: rpn_rois/_1061 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_2345_rpn_rois", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

Yc174 commented 6 years ago

Did OHEM work? I tried to reproduce it, but it didn't work. @liningxiao

endernewton commented 6 years ago

I think different implementations can give different results. I haven't tried it myself, though.

liningxiao commented 6 years ago

Yes, it works; I have solved the problem. @Yc174 My code for the bounding-box regression L1 loss had a shape problem. I changed it and now it works. @endernewton
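The exact change is not shown in this thread. As an illustration only, here is a pure-Python sketch of a weighted smooth L1 box loss that refuses to run when its inputs disagree in shape, mirroring the expression `bbox_outside_weights * smooth_l1_dist(bbox_inside_weights * (bbox_pred - bbox_targets))` from the traceback. The function names, the `sigma` parameter, and the reduction (sum per RoI, then mean over RoIs) are assumptions, not the repository's code.

```python
def smooth_l1(diff, sigma=1.0):
    """Smooth L1 of a single float: quadratic near zero, linear elsewhere."""
    sigma2 = sigma * sigma
    abs_diff = abs(diff)
    if abs_diff < 1.0 / sigma2:
        return 0.5 * sigma2 * diff * diff
    return abs_diff - 0.5 / sigma2


def shape_of(rows):
    """Shape of a rectangular 2-D nested list as (num_rows, num_cols)."""
    return (len(rows), len(rows[0]) if rows else 0)


def bbox_l1_loss(bbox_pred, bbox_targets, inside_w, outside_w, sigma=1.0):
    """Weighted smooth L1 loss over inputs shaped [num_rois, 4 * num_classes]."""
    tensors = (bbox_pred, bbox_targets, inside_w, outside_w)
    shapes = {shape_of(t) for t in tensors}
    # Fail early with a readable message instead of deep inside the gradient.
    if len(shapes) != 1:
        raise ValueError("shape mismatch: %s" % sorted(shapes))
    if not bbox_pred:
        raise ValueError("need at least one RoI")
    per_roi = []
    for p_row, t_row, i_row, o_row in zip(*tensors):
        per_roi.append(sum(
            o * smooth_l1(i * (p - t), sigma)
            for p, t, i, o in zip(p_row, t_row, i_row, o_row)))
    # Assumed reduction: sum within each RoI, then average over RoIs.
    return sum(per_roi) / len(per_roi)


ones = [[1.0, 1.0], [1.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
loss = bbox_l1_loss([[0.5, 2.0], [0.0, 0.0]], zeros, ones, ones)
print(loss)
```

With a check like this, passing a prediction whose shape differs from the targets raises immediately at the loss definition rather than at the first `sess.run`.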

liningxiao commented 6 years ago

@endernewton Have you tried training ResNeXt-101 with your code? I would like to know the precision you obtained. Thanks.

endernewton / tf-faster-rcnn

tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [1,35,63,36] vs. [300,116] #255