AllenXiangX / SnowflakeNet

(TPAMI 2023) Snowflake Point Deconvolution for Point Cloud Completion and Generation with Skip-Transformer
MIT License

Problem with pointnet2_ops #19

Open matthiasjaeger95 opened 1 year ago

matthiasjaeger95 commented 1 year ago

Hello,

I want to train your network on my own dataset and ran into this error:

Loaded compiled 3D CUDA chamfer distance
  0%|          | 0/87 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 165, in <module>
    train(config)
  File "train.py", line 104, in train
    pcds_pred = model(partial)
  File "/home/matthias/anaconda3/envs/spd/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "../models/model_completion.py", line 137, in forward
    feat = self.feat_extractor(point_cloud)
  File "/home/matthias/anaconda3/envs/spd/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "../models/model_completion.py", line 32, in forward
    l1_xyz, l1_points, idx1 = self.sa_module_1(l0_xyz, l0_points)  # (B, 3, 512), (B, 128, 512)
  File "/home/matthias/anaconda3/envs/spd/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "../models/utils.py", line 375, in forward
    new_xyz, new_points, idx, grouped_xyz = sample_and_group_knn(xyz, points, self.npoint, self.nsample, self.use_xyz, idx=idx)
  File "../models/utils.py", line 316, in sample_and_group_knn
    new_xyz = gather_operation(xyz, furthest_point_sample(xyz_flipped, npoint)) # (B, 3, npoint)
  File "/home/matthias/anaconda3/envs/spd/lib/python3.7/site-packages/pointnet2_ops-3.0.0-py3.7-linux-x86_64.egg/pointnet2_ops/pointnet2_utils.py", line 54, in forward
    out = _ext.furthest_point_sampling(xyz, npoint)
RuntimeError: false INTERNAL ASSERT FAILED at "pointnet2_ops/_ext-src/src/sampling.cpp":83, please report a bug to PyTorch. CPU not supported

It seems there is a problem with the pointnet2_ops extension. I created the environment following your repo. My CUDA version is 11.4. Could you help me with that?

Thank you in advance.

AllenXiangX commented 1 year ago

Hi, it seems that you have compiled the pointnet2_ops extension with a CPU version of PyTorch:

RuntimeError: false INTERNAL ASSERT FAILED at "pointnet2_ops/_ext-src/src/sampling.cpp":83, please report a bug to PyTorch. CPU not supported

Please update your PyTorch to a CUDA version and try compiling again.
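One way to check which kind of PyTorch build is installed is sketched below. It relies on `torch.version.cuda` being `None` for CPU-only builds and on `torch.cuda.is_available()` reporting whether a GPU is usable at run time. The helper name `summarize_torch_build` is hypothetical, and the import is guarded so the snippet also runs where PyTorch is absent.

```python
from typing import Optional


def summarize_torch_build(version: str, cuda_version: Optional[str], cuda_available: bool) -> str:
    """Classify a PyTorch install from three values it reports about itself."""
    if cuda_version is None:
        # CPU-only wheels report no CUDA toolkit version at all.
        return f"torch {version}: CPU-only build"
    if not cuda_available:
        # A CUDA wheel can still fail to see a GPU (driver, visibility settings).
        return f"torch {version}: built for CUDA {cuda_version}, but no GPU is usable at runtime"
    return f"torch {version}: built for CUDA {cuda_version}, GPU usable"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        print(summarize_torch_build(torch.__version__, torch.version.cuda, torch.cuda.is_available()))
```

Running this inside the same conda environment used to build pointnet2_ops should show whether the extension was compiled against a CUDA-enabled PyTorch.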

huyanbi commented 1 year ago

@AllenXiangX Hello, I have also encountered this issue. I can confirm that I am using the CUDA version of torch. I encountered it while training on point cloud completion.

(spd) root@I1230eab0b100201c25:/hy-tmp/SnowflakeNet/completion# python train.py --config ./configs/pcn_cd1.yaml
Loaded compiled 3D CUDA chamfer distance
  0%|          | 0/906 [00:03<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 166, in <module>
    train(config)
  File "train.py", line 105, in train
    pcds_pred = model(partial)
  File "/usr/local/miniconda3/envs/spd/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "../models/model_completion.py", line 137, in forward
    feat = self.feat_extractor(point_cloud)
  File "/usr/local/miniconda3/envs/spd/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "../models/model_completion.py", line 32, in forward
    l1_xyz, l1_points, idx1 = self.sa_module_1(l0_xyz, l0_points)  # (B, 3, 512), (B, 128, 512)
  File "/usr/local/miniconda3/envs/spd/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "../models/utils.py", line 375, in forward
    new_xyz, new_points, idx, grouped_xyz = sample_and_group_knn(xyz, points, self.npoint, self.nsample, self.use_xyz, idx=idx)
  File "../models/utils.py", line 316, in sample_and_group_knn
    new_xyz = gather_operation(xyz, furthest_point_sample(xyz_flipped, npoint)) # (B, 3, npoint)
  File "/usr/local/miniconda3/envs/spd/lib/python3.7/site-packages/pointnet2_ops-3.0.0-py3.7-linux-x86_64.egg/pointnet2_ops/pointnet2_utils.py", line 54, in forward
    out = _ext.furthest_point_sampling(xyz, npoint)
RuntimeError: false INTERNAL ASSERT FAILED at "pointnet2_ops/_ext-src/src/sampling.cpp":83, please report a bug to PyTorch. CPU not supported

huyanbi commented 1 year ago

@AllenXiangX I compiled again, and the output is as follows:

(spd) root@I1230eab0b100201c25:/hy-tmp/SnowflakeNet/models/pointnet2_ops_lib# python setup.py install
running install
/usr/local/miniconda3/envs/spd/lib/python3.7/site-packages/setuptools/command/install.py:37: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
  setuptools.SetuptoolsDeprecationWarning,
/usr/local/miniconda3/envs/spd/lib/python3.7/site-packages/setuptools/command/easy_install.py:147: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
  EasyInstallDeprecationWarning,
running bdist_egg
running egg_info
writing pointnet2_ops.egg-info/PKG-INFO
writing dependency_links to pointnet2_ops.egg-info/dependency_links.txt
writing requirements to pointnet2_ops.egg-info/requires.txt
writing top-level names to pointnet2_ops.egg-info/top_level.txt
/usr/local/miniconda3/envs/spd/lib/python3.7/site-packages/torch/utils/cpp_extension.py:352: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
  warnings.warn(msg.format('we could not find ninja.'))
reading manifest file 'pointnet2_ops.egg-info/SOURCES.txt'
writing manifest file 'pointnet2_ops.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
running build_ext
creating build/bdist.linux-x86_64/egg
creating build/bdist.linux-x86_64/egg/pointnet2_ops
copying build/lib.linux-x86_64-cpython-37/pointnet2_ops/__init__.py -> build/bdist.linux-x86_64/egg/pointnet2_ops
copying build/lib.linux-x86_64-cpython-37/pointnet2_ops/_version.py -> build/bdist.linux-x86_64/egg/pointnet2_ops
copying build/lib.linux-x86_64-cpython-37/pointnet2_ops/pointnet2_modules.py -> build/bdist.linux-x86_64/egg/pointnet2_ops
copying build/lib.linux-x86_64-cpython-37/pointnet2_ops/pointnet2_utils.py -> build/bdist.linux-x86_64/egg/pointnet2_ops
creating build/bdist.linux-x86_64/egg/pointnet2_ops/_ext-src
creating build/bdist.linux-x86_64/egg/pointnet2_ops/_ext-src/src
copying build/lib.linux-x86_64-cpython-37/pointnet2_ops/_ext-src/src/ball_query.cpp -> build/bdist.linux-x86_64/egg/pointnet2_ops/_ext-src/src
copying build/lib.linux-x86_64-cpython-37/pointnet2_ops/_ext-src/src/ball_query_gpu.cu -> build/bdist.linux-x86_64/egg/pointnet2_ops/_ext-src/src
copying build/lib.linux-x86_64-cpython-37/pointnet2_ops/_ext-src/src/bindings.cpp -> build/bdist.linux-x86_64/egg/pointnet2_ops/_ext-src/src
copying build/lib.linux-x86_64-cpython-37/pointnet2_ops/_ext-src/src/group_points.cpp -> build/bdist.linux-x86_64/egg/pointnet2_ops/_ext-src/src
copying build/lib.linux-x86_64-cpython-37/pointnet2_ops/_ext-src/src/group_points_gpu.cu -> build/bdist.linux-x86_64/egg/pointnet2_ops/_ext-src/src
copying build/lib.linux-x86_64-cpython-37/pointnet2_ops/_ext-src/src/interpolate.cpp -> build/bdist.linux-x86_64/egg/pointnet2_ops/_ext-src/src
copying build/lib.linux-x86_64-cpython-37/pointnet2_ops/_ext-src/src/interpolate_gpu.cu -> build/bdist.linux-x86_64/egg/pointnet2_ops/_ext-src/src
copying build/lib.linux-x86_64-cpython-37/pointnet2_ops/_ext-src/src/sampling.cpp -> build/bdist.linux-x86_64/egg/pointnet2_ops/_ext-src/src
copying build/lib.linux-x86_64-cpython-37/pointnet2_ops/_ext-src/src/sampling_gpu.cu -> build/bdist.linux-x86_64/egg/pointnet2_ops/_ext-src/src
copying build/lib.linux-x86_64-cpython-37/pointnet2_ops/_ext.cpython-37m-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/egg/pointnet2_ops
byte-compiling build/bdist.linux-x86_64/egg/pointnet2_ops/__init__.py to __init__.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/pointnet2_ops/_version.py to _version.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/pointnet2_ops/pointnet2_modules.py to pointnet2_modules.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/pointnet2_ops/pointnet2_utils.py to pointnet2_utils.cpython-37.pyc
creating stub loader for pointnet2_ops/_ext.cpython-37m-x86_64-linux-gnu.so
byte-compiling build/bdist.linux-x86_64/egg/pointnet2_ops/_ext.py to _ext.cpython-37.pyc
creating build/bdist.linux-x86_64/egg/EGG-INFO
copying pointnet2_ops.egg-info/PKG-INFO -> build/bdist.linux-x86_64/egg/EGG-INFO
copying pointnet2_ops.egg-info/SOURCES.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying pointnet2_ops.egg-info/dependency_links.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying pointnet2_ops.egg-info/requires.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying pointnet2_ops.egg-info/top_level.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
writing build/bdist.linux-x86_64/egg/EGG-INFO/native_libs.txt
zip_safe flag not set; analyzing archive contents...
pointnet2_ops.__pycache__._ext.cpython-37: module references __file__
pointnet2_ops.__pycache__.pointnet2_utils.cpython-37: module references __file__
creating 'dist/pointnet2_ops-3.0.0-py3.7-linux-x86_64.egg' and adding 'build/bdist.linux-x86_64/egg' to it
removing 'build/bdist.linux-x86_64/egg' (and everything under it)
Processing pointnet2_ops-3.0.0-py3.7-linux-x86_64.egg
removing '/usr/local/miniconda3/envs/spd/lib/python3.7/site-packages/pointnet2_ops-3.0.0-py3.7-linux-x86_64.egg' (and everything under it)
creating /usr/local/miniconda3/envs/spd/lib/python3.7/site-packages/pointnet2_ops-3.0.0-py3.7-linux-x86_64.egg
Extracting pointnet2_ops-3.0.0-py3.7-linux-x86_64.egg to /usr/local/miniconda3/envs/spd/lib/python3.7/site-packages
pointnet2-ops 3.0.0 is already the active version in easy-install.pth

Installed /usr/local/miniconda3/envs/spd/lib/python3.7/site-packages/pointnet2_ops-3.0.0-py3.7-linux-x86_64.egg
Processing dependencies for pointnet2-ops==3.0.0
Searching for torch==1.7.1+cu110
Best match: torch 1.7.1+cu110
Adding torch 1.7.1+cu110 to easy-install.pth file
Installing convert-caffe2-to-onnx script to /usr/local/miniconda3/envs/spd/bin
Installing convert-onnx-to-caffe2 script to /usr/local/miniconda3/envs/spd/bin

Using /usr/local/miniconda3/envs/spd/lib/python3.7/site-packages
Searching for numpy==1.21.6
Best match: numpy 1.21.6
Adding numpy 1.21.6 to easy-install.pth file
Installing f2py script to /usr/local/miniconda3/envs/spd/bin
Installing f2py3 script to /usr/local/miniconda3/envs/spd/bin
Installing f2py3.7 script to /usr/local/miniconda3/envs/spd/bin

Using /usr/local/miniconda3/envs/spd/lib/python3.7/site-packages
Searching for typing-extensions==4.5.0
Best match: typing-extensions 4.5.0
Adding typing-extensions 4.5.0 to easy-install.pth file

Using /usr/local/miniconda3/envs/spd/lib/python3.7/site-packages
Finished processing dependencies for pointnet2-ops==3.0.0

As you can see from the output above, the CUDA version was compiled successfully.

AllenXiangX commented 1 year ago

Perhaps some of the tensors or operations are on the CPU. Please try manually specifying the GPUs before running the training script, as follows:

export CUDA_VISIBLE_DEVICES='2'
python train.py --config ./configs/pcn_cd1.yaml
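To test the suspicion that an input tensor is still on the CPU, a small guard can be called just before the model. This is only a sketch: `require_cuda` is a hypothetical helper, and it relies solely on the `is_cuda` and `device` attributes that `torch.Tensor` exposes, so it does not import PyTorch itself.

```python
def require_cuda(name: str, tensor) -> None:
    """Raise a readable error if `tensor` is not a CUDA tensor.

    Only reads the `is_cuda` and `device` attributes, which torch.Tensor provides.
    """
    if not getattr(tensor, "is_cuda", False):
        device = getattr(tensor, "device", "unknown device")
        raise RuntimeError(
            f"{name} is on {device}; the pointnet2_ops kernels need a CUDA tensor"
        )
```

For example, calling `require_cuda("partial", partial)` right before `pcds_pred = model(partial)` in train.py would report the offending device instead of the internal assert. Where exactly to place the call is an assumption, since the surrounding training code is not shown in this thread.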

huyanbi commented 1 year ago

@AllenXiangX Okay, I'll try

huyanbi commented 1 year ago

@AllenXiangX Hello, after trying this, I found that it didn't solve the problem.

sjYoondeltar commented 10 months ago

I saw the same error message. In my case, modifying the config YAML file fixed it: I changed train:gpus:[2] to train:gpus:[0], and the error went away.
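A check along these lines can catch a configured GPU index that does not exist on the machine before training starts. It is a minimal sketch that assumes `train.gpus` lists CUDA device indices; how the training script consumes that list is not shown in this thread. `invalid_gpu_ids` is a hypothetical helper, and the value `[2]` stands in for whatever the YAML file contains.

```python
from typing import List, Sequence


def invalid_gpu_ids(requested: Sequence[int], device_count: int) -> List[int]:
    """Return the requested GPU indices that do not exist among `device_count` devices."""
    return [gpu for gpu in requested if not 0 <= gpu < device_count]


if __name__ == "__main__":
    configured = [2]  # hypothetical value read from train.gpus in the YAML config
    try:
        import torch
        count = torch.cuda.device_count()
    except ImportError:
        count = 0  # PyTorch missing: treat as no visible GPUs
    bad = invalid_gpu_ids(configured, count)
    if bad:
        print(f"GPU ids {bad} are not available; {count} device(s) visible")
```

On a machine with a single GPU, `invalid_gpu_ids([2], 1)` returns `[2]`, while `invalid_gpu_ids([0], 1)` returns `[]`.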