liningxiao opened 6 years ago
300 should be the number of rois proposed.
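This sketch reads the two shapes in the "Incompatible shapes: [1,35,63,36] vs. [300,116]" error reported in this issue. It rests on assumptions that nobody in this thread confirmed. The first tensor is taken to be an RPN box output laid out as (batch, H, W, 4 × anchors). The second is taken to be a second-stage tensor laid out as (RoIs, 4 × classes). The variable names and the `broadcastable` helper are hypothetical. The helper applies the standard NumPy/TensorFlow broadcasting rule, which shows why an element-wise subtraction cannot combine these two tensors:

```python
from itertools import zip_longest

# Shapes copied from the error message in this issue (layouts are assumed).
rpn_bbox_pred_shape = (1, 35, 63, 36)   # assumed (batch, H, W, 4 * num_anchors)
rcnn_bbox_targets_shape = (300, 116)    # assumed (num_rois, 4 * num_classes)

num_anchors = rpn_bbox_pred_shape[-1] // 4      # 36 / 4 = 9 anchors per location
num_rois = rcnn_bbox_targets_shape[0]           # 300 proposed RoIs
num_classes = rcnn_bbox_targets_shape[1] // 4   # 116 / 4 = 29 classes (assumed to include background)

def broadcastable(a, b):
    """Return True if shapes a and b broadcast under NumPy/TensorFlow rules."""
    # Compare dimensions right to left; missing leading dims count as 1.
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x != y and x != 1 and y != 1:
            return False
    return True

print(broadcastable(rpn_bbox_pred_shape, rcnn_bbox_targets_shape))  # False: 36 vs 116
```

If these assumptions hold, the first tensor comes from the RPN and the second from the detection head. The subtraction in the box loss would then be pairing tensors from different stages.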
@endernewton Thanks, but now I have a new problem:
2017-12-08 13:39:25.144239: E tensorflow/stream_executor/cuda/cuda_blas.cc:366] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2017-12-08 13:39:25.144289: W tensorflow/stream_executor/stream.cc:1756] attempting to perform BLAS operation using StreamExecutor without BLAS support
Traceback (most recent call last):
tensorflow.python.framework.errors_impl.InternalError: Blas SGEMM launch failed : m=34860, n=64, k=64
[[Node: res2a_branch2a/Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/gpu:0"](pool1, res2a_branch2a/weights)]]
[[Node: rpn_rois/_1061 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_2345_rpn_rois", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
Did OHEM work for you? I tried to reproduce it, but it didn't work. @liningxiao
I think different implementations can give different results. I haven't tried it myself, though.
@Yc174 Yes, it works; I have solved the problem. In my code, the bounding-box regression L1 loss had a shape problem. I changed it, and now it works. @endernewton
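The fix above is described only in words. This is a minimal NumPy sketch, not the code from this repository. It computes a per-RoI smooth L1 box loss with Fast R-CNN-style inside/outside weights. Before subtracting, it checks that all inputs share one shape. A mix-up like [1,35,63,36] vs. [300,116] then fails with a readable message instead of a broadcasting error inside the gradient graph. The names `smooth_l1_per_roi`, `hard_example_indices` and the `sigma` parameter are hypothetical. The second function shows only the general OHEM idea of keeping the highest-loss RoIs. It may differ from what this repository's `ohem` option does:

```python
import numpy as np

def smooth_l1_per_roi(bbox_pred, bbox_targets, inside_w, outside_w, sigma=1.0):
    """Per-RoI smooth L1 box loss with inside/outside weights (hypothetical helper).

    All four arrays must have the same shape, e.g. (num_rois, 4 * num_classes).
    """
    shapes = {np.shape(a) for a in (bbox_pred, bbox_targets, inside_w, outside_w)}
    if len(shapes) != 1:
        # Fail early with a readable message instead of a broadcasting error.
        raise ValueError(f"bbox loss inputs disagree in shape: {sorted(shapes)}")
    sigma2 = sigma ** 2
    diff = inside_w * (bbox_pred - bbox_targets)
    abs_diff = np.abs(diff)
    # Quadratic for |x| < 1/sigma^2, linear beyond it.
    loss = np.where(abs_diff < 1.0 / sigma2,
                    0.5 * sigma2 * diff ** 2,
                    abs_diff - 0.5 / sigma2)
    # Sum over the box coordinates: one loss value per RoI.
    return np.sum(outside_w * loss, axis=1)

def hard_example_indices(per_roi_loss, k):
    """OHEM-style selection: indices of the k RoIs with the largest loss."""
    order = np.argsort(-np.asarray(per_roi_loss), kind="stable")
    return order[:k]

pred = np.zeros((2, 4))
targets = np.array([[0.5, 0.0, 0.0, 0.0], [2.0, 0.0, 0.0, 0.0]])
weights = np.ones((2, 4))
print(smooth_l1_per_roi(pred, targets, weights, weights))  # [0.125 1.5  ]
```

Returning one value per RoI, instead of a single scalar, leaves room for ranking RoIs before averaging. That ranking step is the one OHEM needs.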
@endernewton Have you tried training ResNeXt-101 with your code? I would like to know what precision your training reaches. Thanks.
2017-12-04 15:09:43.989292: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Incompatible shapes: [1,35,63,36] vs. [300,116]
[[Node: gradients/sub_1_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](gradients/sub_1_grad/Shape, gradients/sub_1_grad/Shape_1/_1069)]]
2017-12-04 15:09:43.997870: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Incompatible shapes: [1,35,63,36] vs. [300,116]
[[Node: gradients/sub_1_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](gradients/sub_1_grad/Shape, gradients/sub_1_grad/Shape_1/_1069)]]
Traceback (most recent call last):
File "./faster_rcnn/train_net.py", line 109, in
restore=bool(float(args.restore)))
File "./faster_rcnn/../lib/fast_rcnn/train.py", line 400, in train_net
sw.train_model(sess, max_iters, restore=restore)
File "./faster_rcnn/../lib/fast_rcnn/train.py", line 255, in train_model
cls_prob, bbox_pred, rois = sess.run(fetches=fetch_list, feed_dict=feed_dict)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 895, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1124, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1321, in _do_run
options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1340, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [1,35,63,36] vs. [300,116]
[[Node: gradients/sub_1_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](gradients/sub_1_grad/Shape, gradients/sub_1_grad/Shape_1/_1069)]]
[[Node: Mean_2/_1101 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_2700_Mean_2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
Caused by op u'gradients/sub_1_grad/BroadcastGradientArgs', defined at:
File "./faster_rcnn/train_net.py", line 109, in
restore=bool(float(args.restore)))
File "./faster_rcnn/../lib/fast_rcnn/train.py", line 400, in train_net
sw.train_model(sess, max_iters, restore=restore)
File "./faster_rcnn/../lib/fast_rcnn/train.py", line 142, in train_model
grads, norm = tf.clip_by_global_norm(tf.gradients(loss, tvars), 10.0)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py", line 542, in gradients
grad_scope, op, func_call, lambda: grad_fn(op, out_grads))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py", line 348, in _MaybeCompile
return grad_fn() # Exit early
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py", line 542, in
grad_scope, op, func_call, lambda: grad_fn(op, out_grads))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/math_grad.py", line 700, in _SubGrad
rx, ry = gen_array_ops._broadcast_gradient_args(sx, sy)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 395, in _broadcast_gradient_args
name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2628, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1204, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
...which was originally created as op u'sub_1', defined at:
File "./faster_rcnn/train_net.py", line 109, in
restore=bool(float(args.restore)))
[elided 0 identical lines from previous traceback]
File "./faster_rcnn/../lib/fast_rcnn/train.py", line 400, in train_net
sw.train_model(sess, max_iters, restore=restore)
File "./faster_rcnn/../lib/fast_rcnn/train.py", line 112, in train_model
self.net.build_loss(ohem=cfg.TRAIN.OHEM)
File "./faster_rcnn/../lib/networks/network.py", line 667, in build_loss
bbox_outside_weights * self.smooth_l1_dist(bbox_inside_weights * (bbox_pred - bbox_targets)), \
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/math_ops.py", line 865, in binary_op_wrapper
return func(x, y, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_math_ops.py", line 2631, in _sub
result = _op_def_lib.apply_op("Sub", x=x, y=y, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2628, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1204, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
InvalidArgumentError (see above for traceback): Incompatible shapes: [1,35,63,36] vs. [300,116]
[[Node: gradients/sub_1_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](gradients/sub_1_grad/Shape, gradients/sub_1_grad/Shape_1/_1069)]]
[[Node: Mean_2/_1101 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_2700_Mean_2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
Please help me solve this problem! Thanks!