CharlesShang / TFFRCNN

FastER RCNN built on tensorflow
MIT License

'No OpKernel was registered to support Op 'RoiPoolGrad' with these attrs' in Training #58

jia2lin3yuan1 closed this issue 7 years ago

jia2lin3yuan1 commented 7 years ago

My environment is TensorFlow 1.0.1, Python 2.7, and CUDA 8.0. I can run demo.py successfully by following the guide. But when I move on to training and run:

> python ./faster_rcnn/train_net.py --gpu 0 --weights ./data/pretrain_model/VGG_imagenet.npy --imdb voc_2007_trainval --iters 70000 --cfg  ./experiments/cfgs/faster_rcnn_end2end.yml --network VGGnet_train --set EXP_DIR exp_dir

I get this error:

  tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'RoiPoolGrad' with these attrs.  Registered devices: [CPU], Registered kernels:
  device='GPU'; T in [DT_FLOAT]

It may be that something is wrong with my setup, but so far I have only followed the guide to try to replicate the result on my computer, and I can't tell what the problem is. Does anyone have an idea? Thank you!

The detailed error message follows.

/home/yuanjial/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
Traceback (most recent call last):
  File "./faster_rcnn/train_net.py", line 108, in <module>
    restore=bool(int(args.restore)))
  File "./faster_rcnn/../lib/fast_rcnn/train.py", line 400, in train_net
    sw.train_model(sess, max_iters, restore=restore)
  File "./faster_rcnn/../lib/fast_rcnn/train.py", line 148, in train_model
    sess.run(tf.global_variables_initializer())
  File "/home/yuanjial/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 778, in run
    run_metadata_ptr)
  File "/home/yuanjial/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 982, in _run
    feed_dict_string, options, run_metadata)
  File "/home/yuanjial/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1032, in _do_run
    target_list, options, run_metadata)
  File "/home/yuanjial/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1052, in _do_call
    raise type(e)(node_def, op, message)

tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'RoiPoolGrad' with these attrs. Registered devices: [CPU], Registered kernels:
  device='GPU'; T in [DT_FLOAT]
  [[Node: gradients/pool_5_grad/RoiPoolGrad = RoiPoolGrad[T=DT_FLOAT, pooled_height=7, pooled_width=7, spatial_scale=0.0625](conv5_3/Relu, roi-data/rois, pool_5:1, gradients/fc6/transpose_grad/transpose)]]

Caused by op u'gradients/pool_5_grad/RoiPoolGrad', defined at:
  File "./faster_rcnn/train_net.py", line 108, in <module>
    restore=bool(int(args.restore)))
  File "./faster_rcnn/../lib/fast_rcnn/train.py", line 400, in train_net
    sw.train_model(sess, max_iters, restore=restore)
  File "./faster_rcnn/../lib/fast_rcnn/train.py", line 142, in train_model
    grads, norm = tf.clip_by_global_norm(tf.gradients(loss, tvars), 10.0)
  File "/home/yuanjial/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py", line 560, in gradients
    grad_scope, op, func_call, lambda: grad_fn(op, out_grads))
  File "/home/yuanjial/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py", line 368, in _MaybeCompile
    return grad_fn()  # Exit early
  File "/home/yuanjial/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py", line 560, in <lambda>
    grad_scope, op, func_call, lambda: grad_fn(op, out_grads))
  File "./faster_rcnn/../lib/roi_pooling_layer/roi_pooling_op_grad.py", line 23, in _roi_pool_grad
    data_grad = roi_pooling_op.roi_pool_grad(data, rois, argmax, grad, pooled_height, pooled_width, spatial_scale)
  File "<string>", line 74, in roi_pool_grad
  File "/home/yuanjial/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
    op_def=op_def)
  File "/home/yuanjial/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2336, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/yuanjial/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1228, in __init__
    self._traceback = _extract_stack()

...which was originally created as op u'pool_5', defined at:
  File "./faster_rcnn/train_net.py", line 100, in <module>
    network = get_network(args.network_name)
  File "./faster_rcnn/../lib/networks/factory.py", line 29, in get_network
    return VGGnet_train()
  File "./faster_rcnn/../lib/networks/VGGnet_train.py", line 17, in __init__
    self.setup()
  File "./faster_rcnn/../lib/networks/VGGnet_train.py", line 84, in setup
    .roi_pool(7, 7, 1.0/16, name='pool_5')
  File "./faster_rcnn/../lib/networks/network.py", line 36, in layer_decorated
    layer_output = op(self, layer_input, *args, **kwargs)
  File "./faster_rcnn/../lib/networks/network.py", line 235, in roi_pool
    name=name)[0]
  File "<string>", line 45, in roi_pool
  File "/home/yuanjial/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
    op_def=op_def)
  File "/home/yuanjial/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2336, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/yuanjial/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1228, in __init__
    self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): No OpKernel was registered to support Op 'RoiPoolGrad' with these attrs. Registered devices: [CPU], Registered kernels: device='GPU'; T in [DT_FLOAT]

[[Node: gradients/pool_5_grad/RoiPoolGrad = RoiPoolGrad[T=DT_FLOAT, pooled_height=7, pooled_width=7, spatial_scale=0.0625](conv5_3/Relu, roi-data/rois, pool_5:1, gradients/fc6/transpose_grad/transpose)]]

liuwei16 commented 7 years ago

Hello, I have run into the same problem. Did you manage to solve it?

jia2lin3yuan1 commented 7 years ago

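My problem was that I had installed tensorflow instead of tensorflow-gpu. That matches the error message: the only registered device was the CPU, while RoiPoolGrad has only a GPU kernel. Once I switched to tensorflow-gpu, the whole project worked as expected.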
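
For anyone hitting the same message, here is a rough sketch of how to read it programmatically. It compares the device types under "Registered devices" with the device types the op's kernels were built for. The helper names `missing_kernel_devices` and `visible_device_types` are made up for this illustration and are not part of TFFRCNN or TensorFlow. The second helper assumes `tensorflow.python.client.device_lib.list_local_devices()` is available in the installed TensorFlow and returns `None` if TensorFlow cannot be imported.

```python
import re


def missing_kernel_devices(error_message):
    """Return kernel device types named in the error that the runtime did not register.

    Hypothetical helper: it only parses the text of a
    "No OpKernel was registered" message like the one in this issue.
    """
    registered = set()
    devices_match = re.search(r"Registered devices:\s*\[([^\]]*)\]", error_message)
    if devices_match:
        # "[CPU]" or "[CPU,GPU]" becomes {"CPU"} or {"CPU", "GPU"}
        registered = {d.strip() for d in devices_match.group(1).split(",") if d.strip()}
    # Each registered kernel is listed as device='GPU' or device='CPU'
    kernel_devices = set(re.findall(r"device='(\w+)'", error_message))
    return sorted(kernel_devices - registered)


def visible_device_types():
    """Return the device types TensorFlow reports, or None if it is not importable.

    Assumption: device_lib.list_local_devices() exists in the installed build.
    """
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        return None
    return sorted({d.device_type for d in device_lib.list_local_devices()})


if __name__ == "__main__":
    message = (
        "No OpKernel was registered to support Op 'RoiPoolGrad' with these attrs. "
        "Registered devices: [CPU], Registered kernels: device='GPU'; T in [DT_FLOAT]"
    )
    # A non-empty result means the op needs a device this build does not provide.
    print(missing_kernel_devices(message))
```

If the first helper reports a missing GPU and `visible_device_types()` lists no GPU either, the installed TensorFlow is probably a CPU-only build (or cannot see the GPU), which is the situation described above.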