EnyaHermite / SPH3D-GCN

Spherical Kernel for Efficient Graph Convolution on 3D Point Clouds
MIT License

Error: CUB segmented reduce errorinvalid device function #14

Open · GaHooooo opened this issue 3 years ago

GaHooooo commented 3 years ago

Thanks for your great work! I have a question: when I train modelnet40_cls, the following error occurs:

2020-11-15 20:56:49.388664
Traceback (most recent call last):
  File "/home/gahho/anaconda3/envs/sph3dgcn/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/home/gahho/anaconda3/envs/sph3dgcn/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/gahho/anaconda3/envs/sph3dgcn/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InternalError: CUB segmented reduce errorinvalid device function
   [[{{node Max}} = Max[T=DT_FLOAT, Tidx=DT_INT32, keep_dims=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Sum, ArgMax/dimension)]]
   [[{{node GroupCrossDeviceControlEdges_0/Adam/value/_82}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_7776_GroupCrossDeviceControlEdges_0/Adam/value", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/gahho/SPH3D-GCN/modelnet40_cls/train_modelnet.py", line 376, in <module>
    train()
  File "/home/gahho/SPH3D-GCN/modelnet40_cls/train_modelnet.py", line 247, in train
    train_one_epoch(sess, ops, next_train_element, train_writer)
  File "/home/gahho/SPH3D-GCN/modelnet40_cls/train_modelnet.py", line 292, in train_one_epoch
    ops['train_op'], ops['loss'], ops['pred']], feed_dict=feed_dict)
  File "/home/gahho/anaconda3/envs/sph3dgcn/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/home/gahho/anaconda3/envs/sph3dgcn/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/gahho/anaconda3/envs/sph3dgcn/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/home/gahho/anaconda3/envs/sph3dgcn/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: CUB segmented reduce errorinvalid device function
   [[node Max (defined at /home/gahho/SPH3D-GCN/models/SPH3D_modelnet.py:13) = Max[T=DT_FLOAT, Tidx=DT_INT32, keep_dims=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Sum, ArgMax/dimension)]]
   [[{{node GroupCrossDeviceControlEdges_0/Adam/value/_82}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_7776_GroupCrossDeviceControlEdges_0/Adam/value", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Caused by op 'Max', defined at:
  File "/home/gahho/SPH3D-GCN/modelnet40_cls/train_modelnet.py", line 376, in <module>
    train()
  File "/home/gahho/SPH3D-GCN/modelnet40_cls/train_modelnet.py", line 161, in train
    pred, end_points = MODEL.get_model(xyz_pl, training_pl, config=net_config)
  File "/home/gahho/SPH3D-GCN/models/SPH3D_modelnet.py", line 42, in get_model
    points = normalize_xyz(points)
  File "/home/gahho/SPH3D-GCN/models/SPH3D_modelnet.py", line 13, in normalize_xyz
    scale = tf.reduce_max(tf.reduce_sum(tf.square(points),axis=-1,keepdims=True),axis=1,keepdims=True)
  File "/home/gahho/anaconda3/envs/sph3dgcn/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/home/gahho/anaconda3/envs/sph3dgcn/lib/python3.5/site-packages/tensorflow/python/ops/math_ops.py", line 1643, in reduce_max
    name=name))
  File "/home/gahho/anaconda3/envs/sph3dgcn/lib/python3.5/site-packages/tensorflow/python/ops/gen_math_ops.py", line 4641, in _max
    name=name)
  File "/home/gahho/anaconda3/envs/sph3dgcn/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/gahho/anaconda3/envs/sph3dgcn/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/home/gahho/anaconda3/envs/sph3dgcn/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
    op_def=op_def)
  File "/home/gahho/anaconda3/envs/sph3dgcn/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

InternalError (see above for traceback): CUB segmented reduce errorinvalid device function
   [[node Max (defined at /home/gahho/SPH3D-GCN/models/SPH3D_modelnet.py:13) = Max[T=DT_FLOAT, Tidx=DT_INT32, keep_dims=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Sum, ArgMax/dimension)]]
   [[{{node GroupCrossDeviceControlEdges_0/Adam/value/_82}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_7776_GroupCrossDeviceControlEdges_0/Adam/value", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
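The failing Max node is the tf.reduce_max call at line 13 of SPH3D_modelnet.py, inside normalize_xyz: for each point cloud in the batch it takes the largest squared distance of any point from the origin, i.e. one max-reduction per batch segment. The NumPy sketch below mirrors only that one line, since that is all the traceback shows, so the expected shapes can be checked outside TensorFlow. The helper name max_squared_radius is hypothetical, and the rest of normalize_xyz (how scale is applied to the points) is not reproduced.

```python
import numpy as np


def max_squared_radius(points: np.ndarray) -> np.ndarray:
    """NumPy mirror of the `scale` line in normalize_xyz (hypothetical helper name).

    points: float array of shape (batch, num_points, 3).
    Returns an array of shape (batch, 1, 1) holding, for each cloud,
    the largest squared distance of any point from the origin.
    """
    # tf.reduce_sum(tf.square(points), axis=-1, keepdims=True) -> (B, N, 1)
    sq_norm = np.sum(np.square(points), axis=-1, keepdims=True)
    # tf.reduce_max(..., axis=1, keepdims=True) -> (B, 1, 1); in the TF graph
    # this is the 'Max' node that raised the CUB error on GPU:0.
    return np.max(sq_norm, axis=1, keepdims=True)


# Two clouds with two points each.
points = np.array(
    [[[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]],
     [[0.0, 0.0, 3.0], [1.0, 1.0, 1.0]]],
    dtype=np.float32,
)
scale = max_squared_radius(points)
print(scale.shape)     # (2, 1, 1)
print(scale[:, 0, 0])  # [4. 9.]
```

For this toy batch the squared norms are (1, 4) and (9, 3), so each cloud contributes one maximum: 4 and 9.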

When I set batch_size=1, the error does not occur, but the results are very poor. I don't know how to fix this. Thanks for your reply!

EnyaHermite commented 3 years ago

The problem might be caused by the TensorFlow version you are using. We tested the code with TensorFlow 1.12.
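Since the code was tested with TensorFlow 1.12, a quick first check is which version the sph3dgcn environment actually imports. This is a minimal sketch; is_tested_tf_version is a hypothetical helper that only compares the major and minor version numbers and says nothing about the CUDA build.

```python
def is_tested_tf_version(version: str, tested: str = "1.12") -> bool:
    """Return True if `version` (e.g. '1.12.0') is in the tested major.minor series."""
    return version.split(".")[:2] == tested.split(".")[:2]


if __name__ == "__main__":
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not installed in this environment")
    else:
        # tf.__version__ is the installed version string, e.g. '1.12.0'
        print("TensorFlow", tf.__version__,
              "| tested series 1.12:", is_tested_tf_version(tf.__version__))
```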