CUDA error: misaligned address

Hi, I am using MinkowskiEngine 0.5.4 to build my network. When I use a larger batch size, e.g. 32, 64, or 128, a "CUDA error: misaligned address" occurs. The details of the error are shown below:

  File "training/train.py", line 56, in <module>
    do_train(dataloaders, train_sampler, params, debug=args.debug)
  File "/mnt/lustre/zhoumengjie/Image-to-2-5DMap/training/trainer_backup.py", line 230, in do_train
    loss.backward()
  File "/mnt/lustre/zhoumengjie/.conda/envs/zmj-mink/lib/python3.8/site-packages/torch/tensor.py", line 221, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/mnt/lustre/zhoumengjie/.conda/envs/zmj-mink/lib/python3.8/site-packages/torch/autograd/__init__.py", line 130, in backward
    Variable._execution_engine.run_backward(
  File "/mnt/lustre/zhoumengjie/.conda/envs/zmj-mink/lib/python3.8/site-packages/torch/autograd/function.py", line 89, in apply
    return self._forward_cls.backward(self, *args)  # type: ignore
  File "/mnt/lustre/zhoumengjie/.conda/envs/zmj-mink/lib/python3.8/site-packages/MinkowskiEngine-0.5.4-py3.8-linux-x86_64.egg/MinkowskiEngine/MinkowskiBroadcast.py", line 87, in backward
    grad_in_feat, grad_in_feat_glob = bw_fn(
RuntimeError: misaligned address at /mnt/lustre/zhoumengjie/Image-to-2-5DMap/MinkowskiEngine/src/broadcast_kernel.cu:402
terminate called after throwing an instance of 'c10::Error'
  what():  CUDA error: misaligned address

It looks like the error happens during the backward pass. With batch size 32, the error occurs at a specific epoch and batch. I checked the data and found that this batch contains far more points (600,000+) than the other batches, which have around 300,000+ points. After I downsampled the point cloud further, training worked with batch size 32. However, larger batch sizes still hit a limit. Is it possible to make MinkowskiEngine process a larger batch of data without downsampling? I'm looking forward to a more effective method.
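To make the per-batch point count concrete, here is a minimal sketch of one possible way to keep the total number of points in a batch under a budget without downsampling any sample: group sample indices so that their summed point counts stay below a limit. This is not what the training code above does, and it is not a confirmed fix for the misaligned address error. The function name `batches_under_point_budget`, the example counts, and the budget values are hypothetical.

```python
from typing import Iterable, Iterator, List


def batches_under_point_budget(
    point_counts: Iterable[int], max_points: int, max_samples: int
) -> Iterator[List[int]]:
    """Yield lists of sample indices whose summed point count stays within max_points.

    A single sample larger than max_points still gets a batch of its own.
    """
    batch: List[int] = []
    total = 0
    for idx, n in enumerate(point_counts):
        # Close the current batch if adding this sample would exceed either limit.
        if batch and (total + n > max_points or len(batch) >= max_samples):
            yield batch
            batch, total = [], 0
        batch.append(idx)
        total += n
    if batch:
        yield batch


# Hypothetical per-sample point counts, for illustration only.
counts = [12000, 9000, 30000, 8000, 15000, 7000]
batches = list(batches_under_point_budget(counts, max_points=40000, max_samples=4))
print(batches)
```

The resulting index lists could be passed as the `batch_sampler` argument of `torch.utils.data.DataLoader`, so the number of samples per batch varies while the number of points per batch stays bounded. The budget itself would have to be tuned for the GPU and network in use.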

Output of the following command (if you installed the latest MinkowskiEngine, paste the output of python -c "import MinkowskiEngine as ME; ME.print_diagnostics()"):
/mnt/lustre/zhoumengjie/.conda/envs/zmj-mink/lib/python3.8/site-packages/MinkowskiEngine-0.5.4-py3.8-linux-x86_64.egg/MinkowskiEngine/__init__.py:36: UserWarning: The environment variable OMP_NUM_THREADS not set. MinkowskiEngine will automatically set OMP_NUM_THREADS=16. If you want to set OMP_NUM_THREADS manually, please export it on the command line before running a python script. e.g. export OMP_NUM_THREADS=12; python your_program.py. It is recommended to set it below 24.
  warnings.warn(
==========System==========
Linux-3.10.0-957.el7.x86_64-x86_64-with-glibc2.17
3.8.13 (default, Mar 28 2022, 11:38:47)
[GCC 7.5.0]
==========Pytorch==========
1.7.1
torch.cuda.is_available(): True
==========NVIDIA-SMI==========
Driver Version 460.32.03
CUDA Version 11.2
VBIOS Version 92.00.19.00.10
Image Version G506.0200.00.04
==========NVCC==========
==========CC==========
==========MinkowskiEngine==========
0.5.4
MinkowskiEngine compiled with CUDA Support: True
NVCC version MinkowskiEngine is compiled: 11000
CUDART version MinkowskiEngine is compiled: 11000

NVIDIA / MinkowskiEngine

CUDA error: misaligned address #488