Hi, I am using MinkowskiEngine 0.5.4 to build my network. When I use a larger batch size, e.g. 32, 64, or 128, a "CUDA error: misaligned address" occurs. The details of the error are shown below:
File "training/train.py", line 56, in <module>
do_train(dataloaders, train_sampler, params, debug=args.debug)
File "/mnt/lustre/zhoumengjie/Image-to-2-5DMap/training/trainer_backup.py", line 230, in do_train
loss.backward()
File "/mnt/lustre/zhoumengjie/.conda/envs/zmj-mink/lib/python3.8/site-packages/torch/tensor.py", line 221, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/mnt/lustre/zhoumengjie/.conda/envs/zmj-mink/lib/python3.8/site-packages/torch/autograd/__init__.py", line 130, in backward
Variable._execution_engine.run_backward(
File "/mnt/lustre/zhoumengjie/.conda/envs/zmj-mink/lib/python3.8/site-packages/torch/autograd/function.py", line 89, in apply
return self._forward_cls.backward(self, *args) # type: ignore
File "/mnt/lustre/zhoumengjie/.conda/envs/zmj-mink/lib/python3.8/site-packages/MinkowskiEngine-0.5.4-py3.8-linux-x86_64.egg/MinkowskiEngine/MinkowskiBroadcast.py", line 87, in backward
grad_in_feat, grad_in_feat_glob = bw_fn(
RuntimeError: misaligned address at /mnt/lustre/zhoumengjie/Image-to-2-5DMap/MinkowskiEngine/src/broadcast_kernel.cu:402
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: misaligned address
It looks like the error happens during the backward phase. With batch size 32, the error occurs at one specific batch of a specific epoch. I checked the data and found that this batch contains an unusually large number of points (600,000+), whereas the other batches contain around 300,000+ points. After I downsampled the point cloud further, training works with batch size 32. However, larger batch sizes still hit a limit. Is it possible to make MinkowskiEngine process a larger batch of data without downsampling? I'm looking forward to a more effective method.
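To make the "too many points in one batch" observation concrete, here is a minimal sketch of one possible workaround: grouping samples by a total point budget instead of a fixed batch size. It does not address the kernel error itself. The function name group_by_point_budget and the 300,000 budget are hypothetical, chosen only because batches of roughly that size ran without error above; the per-sample point counts are made-up illustration values.

```python
from typing import List, Sequence


def group_by_point_budget(point_counts: Sequence[int], max_points: int) -> List[List[int]]:
    """Greedily pack sample indices into batches whose total point count
    stays at or below max_points (hypothetical helper, not a MinkowskiEngine API).

    A single sample that exceeds the budget on its own is placed in a batch
    by itself rather than dropped.
    """
    batches: List[List[int]] = []
    current: List[int] = []
    current_points = 0
    for idx, n in enumerate(point_counts):
        # Close the current batch if adding this sample would exceed the budget.
        if current and current_points + n > max_points:
            batches.append(current)
            current, current_points = [], 0
        current.append(idx)
        current_points += n
    if current:
        batches.append(current)
    return batches


# Illustration values only: points per sample after voxelization.
counts = [120_000, 90_000, 200_000, 80_000, 150_000]
print(group_by_point_budget(counts, max_points=300_000))
```

Assuming the point count of each sample is known ahead of time, the resulting index lists could be passed as the batch_sampler argument of torch.utils.data.DataLoader, so that the number of samples per batch varies while the number of points stays bounded.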
Output of python -c "import MinkowskiEngine as ME; ME.print_diagnostics()":
/mnt/lustre/zhoumengjie/.conda/envs/zmj-mink/lib/python3.8/site-packages/MinkowskiEngine-0.5.4-py3.8-linux-x86_64.egg/MinkowskiEngine/__init__.py:36: UserWarning: The environment variable OMP_NUM_THREADS not set. MinkowskiEngine will automatically set OMP_NUM_THREADS=16. If you want to set OMP_NUM_THREADS manually, please export it on the command line before running a python script. e.g. export OMP_NUM_THREADS=12; python your_program.py. It is recommended to set it below 24.
warnings.warn(
==========System==========
Linux-3.10.0-957.el7.x86_64-x86_64-with-glibc2.17
3.8.13 (default, Mar 28 2022, 11:38:47)
[GCC 7.5.0]
==========Pytorch==========
1.7.1
torch.cuda.is_available(): True
==========NVIDIA-SMI==========
Driver Version 460.32.03
CUDA Version 11.2
VBIOS Version 92.00.19.00.10
Image Version G506.0200.00.04
==========NVCC==========
==========CC==========
==========MinkowskiEngine==========
0.5.4
MinkowskiEngine compiled with CUDA Support: True
NVCC version MinkowskiEngine is compiled: 11000
CUDART version MinkowskiEngine is compiled: 11000