drethage / fully-convolutional-point-network

Fully-Convolutional Point Networks for Large-Scale Point Clouds
MIT License

InvalidArgumentError: No OpKernel was registered to support Op 'QueryBallPoint' with these attrs #4

Closed by hanxuel 5 years ago

hanxuel commented 5 years ago

Extracted 248 samples from 12 items in train set
Extracted 837 samples from 21 items in val set

2019-03-06 16:24:42.091695: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA

Parameters in model: 10076067

Layers in model: (Name - Shape - # weights)
abstraction/points_to_15cm/simplified_pointnet/1x1_conv_1/weights:0 - (1, 1, 4, 64) - 256
abstraction/points_to_15cm/simplified_pointnet/1x1_conv_2/weights:0 - (1, 1, 64, 96) - 6144
abstraction/points_to_15cm/skip_15cm/1x1x1_conv/weights:0 - (1, 1, 1, 96, 256) - 24576
abstraction/points_to_15cm/skip_45cm/3x3x3_conv/weights:0 - (3, 3, 3, 96, 256) - 663552
abstraction/15cm_to_30cm/3d_convolution/2x2x2_conv/weights:0 - (2, 2, 2, 96, 128) - 98304
abstraction/15cm_to_30cm/3d_convolution/1x1_conv_1/weights:0 - (1, 1, 1, 128, 192) - 24576
abstraction/15cm_to_30cm/skip_30cm/1x1x1_conv/weights:0 - (1, 1, 1, 192, 256) - 49152
abstraction/15cm_to_30cm/skip_90cm/3x3x3_conv/weights:0 - (3, 3, 3, 192, 256) - 1327104
abstraction/30cm_to_60cm/3d_convolution/2x2x2_conv/weights:0 - (2, 2, 2, 192, 256) - 393216
abstraction/30cm_to_60cm/3d_convolution/1x1x1_conv_1/weights:0 - (1, 1, 1, 256, 512) - 131072
abstraction/30cm_to_60cm/skip_60cm/1x1x1_conv_1/weights:0 - (1, 1, 1, 512, 256) - 131072
abstraction/30cm_to_60cm/skip_180cm/3x3x3_conv/weights:0 - (3, 3, 3, 512, 256) - 3538944
spatial_pool/skip_spatial_pool/1x1x1_conv/weights:0 - (1, 1, 1, 512, 256) - 131072
upsampling/60cm_to_30cm/2x2x2_deconv/weights:0 - (2, 2, 2, 256, 768) - 1572864
upsampling/60cm_to_30cm/1x1x1_conv_1/weights:0 - (1, 1, 1, 256, 256) - 65536
upsampling/60cm_to_30cm/1x1x1_conv_2/weights:0 - (1, 1, 1, 256, 192) - 49152
upsampling/30cm_to_15cm/2x2x2_deconv/weights:0 - (2, 2, 2, 192, 704) - 1081344
upsampling/30cm_to_15cm/1x1x1_conv_1/weights:0 - (1, 1, 1, 192, 128) - 24576
upsampling/30cm_to_15cm/1x1x1_conv_3/weights:0 - (1, 1, 1, 384, 128) - 49152
upsampling/15cm_to_5cm/final_deconv/weights:0 - (3, 3, 3, 64, 384) - 663552
upsampling/15cm_to_5cm/final_conv/weights:0 - (3, 3, 3, 64, 22) - 38016

Traceback (most recent call last):
  File "/cluster/home/haliang/.local/lib64/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/cluster/home/haliang/.local/lib64/python3.6/site-packages/tensorflow/python/client/session.py", line 1317, in _run_fn
    self._extend_graph()
  File "/cluster/home/haliang/.local/lib64/python3.6/site-packages/tensorflow/python/client/session.py", line 1352, in _extend_graph
    tf_session.ExtendSession(self._session)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'QueryBallPoint' with these attrs. Registered devices: [CPU,XLA_CPU], Registered kernels: device='GPU'

 [[{{node abstraction/points_to_15cm/simplified_pointnet/QueryBallPoint}} = QueryBallPoint[nsample=64, radius=0.129903808, _device="/device:GPU:0"](fifo_queue_DequeueMany, Reshape_1)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 31, in <module>
    main()
  File "main.py", line 26, in main
    training.train(cla.config)
  File "/cluster/home/haliang/fully-convolutional-point-network/training.py", line 245, in train
    sess.run([init_g, init_l], {is_training_pl: True})
  File "/cluster/home/haliang/.local/lib64/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/cluster/home/haliang/.local/lib64/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/cluster/home/haliang/.local/lib64/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/cluster/home/haliang/.local/lib64/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'QueryBallPoint' with these attrs. Registered devices: [CPU,XLA_CPU], Registered kernels: device='GPU'

 [[node abstraction/points_to_15cm/simplified_pointnet/QueryBallPoint (defined at <string>:55)  = QueryBallPoint[nsample=64, radius=0.129903808, _device="/device:GPU:0"](fifo_queue_DequeueMany, Reshape_1)]]

Caused by op 'abstraction/points_to_15cm/simplified_pointnet/QueryBallPoint', defined at:
  File "main.py", line 31, in <module>
    main()
  File "main.py", line 26, in main
    training.train(cla.config)
  File "/cluster/home/haliang/fully-convolutional-point-network/training.py", line 209, in train
    queue_batch_placeholders['input_features_pl'], is_training_pl, dataset.get_num_learnable_classes(), batch_normalization_decay)
  File "/cluster/home/haliang/fully-convolutional-point-network/fcpn.py", line 217, in build_model
    grouped_points_xyz_and_features = self.radius_search_and_group(pointnet_locations, self.get_pointnet_radius(self._config['model']['pointnet']['spacing']), self._config['model']['pointnet']['neighbors'], points_xyz, points_features)
  File "/cluster/home/haliang/fully-convolutional-point-network/fcpn.py", line 174, in radius_search_and_group
    pointindices, = tf_grouping.query_ball_point(radius, num_neighbors, points_xyz, centroids_xyz)
  File "tf_grouping/tf_grouping.py", line 22, in query_ball_point
    return grouping_module.query_ball_point(xyz1, xyz2, radius, nsample)
  File "<string>", line 55, in query_ball_point
  File "/cluster/home/haliang/.local/lib64/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/cluster/home/haliang/.local/lib64/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/cluster/home/haliang/.local/lib64/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
    op_def=op_def)
  File "/cluster/home/haliang/.local/lib64/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): No OpKernel was registered to support Op 'QueryBallPoint' with these attrs. Registered devices: [CPU,XLA_CPU], Registered kernels: device='GPU'

 [[node abstraction/points_to_15cm/simplified_pointnet/QueryBallPoint (defined at <string>:55)  = QueryBallPoint[nsample=64, radius=0.129903808, _device="/device:GPU:0"](fifo_queue_DequeueMany, Reshape_1)]]
hanxuel commented 5 years ago

I am working on a GPU cluster with these modules loaded: cuda/9.0.176 cudnn/7.0 jpeg/9b libpng/1.6.27 python_gpu/3.6.4. Any idea what is going wrong?

drethage commented 5 years ago

Hi hanxuel, it's unclear exactly why your configuration is crashing, but it is likely because you're using different versions of CUDA/cuDNN (and maybe TensorFlow?) than the tested configuration. The compilation instructions in tf_grouping_compile.sh are only guaranteed to work with the configuration stated in the README: TensorFlow 1.12, CUDA 9.0, cuDNN 7.4.1 on Ubuntu 16.04 LTS.

hanxuel commented 5 years ago

Hi drethage, I managed to solve the problem. I think it's because tensorflow-gpu must be run on a GPU, while I happened to run it on a CPU device.
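
For anyone hitting the same message: the error text itself already hints at the cause, since it lists the devices TensorFlow found (`[CPU,XLA_CPU]`) next to the devices the op has kernels for (`device='GPU'`). A minimal sketch of reading that out of the message with the standard library follows. The helper name `diagnose_opkernel_error` is hypothetical and not part of this repository or of TensorFlow, and the regular expressions assume the message layout shown in the log above.

```python
import re


def diagnose_opkernel_error(message):
    """Compare the devices TensorFlow found with the devices the op has kernels for.

    Hypothetical helper: it only parses the text of a "No OpKernel was
    registered" error and does not call TensorFlow.
    """
    # Devices the running process can see, e.g. "[CPU,XLA_CPU]".
    devices = set()
    devices_match = re.search(r"Registered devices: \[([^\]]*)\]", message)
    if devices_match:
        devices = {d.strip() for d in devices_match.group(1).split(",") if d.strip()}

    # Devices the op was compiled for, e.g. "device='GPU'".
    kernels = set(re.findall(r"device='([^']+)'", message))

    return {
        "devices": sorted(devices),
        "kernels": sorted(kernels),
        "usable": sorted(devices & kernels),
    }


message = (
    "No OpKernel was registered to support Op 'QueryBallPoint' with these attrs. "
    "Registered devices: [CPU,XLA_CPU], Registered kernels: device='GPU'"
)
report = diagnose_opkernel_error(message)
if not report["usable"]:
    print("Op has kernels only for", report["kernels"],
          "but TensorFlow sees", report["devices"])
```

An empty `usable` list means the process has no device the op can run on, which matches the resolution above: the job was started where no GPU was visible. To see what TensorFlow itself detects, `device_lib.list_local_devices()` from `tensorflow.python.client` lists the local devices.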