NVIDIA / MinkowskiEngine

Minkowski Engine is an auto-diff neural network library for high-dimensional sparse tensors
https://nvidia.github.io/MinkowskiEngine

Error in GenTransConv + GenTransConv? #403

Open LifeBeyondExpectations opened 3 years ago

LifeBeyondExpectations commented 3 years ago

Describe the bug

When I run the network from `examples/completion.py` twice in sequence, feeding the first prediction back into the input, the second pass raises errors:

# declare the network
net = CompletionNet(...)

# inference with the 1st auto-encoder
preds, *_ = net(inputs)

# update the input using the prediction from the previous pass
updated_inputs = inputs + preds

# inference with the 2nd auto-encoder
updated_preds, *_ = net(updated_inputs)  # <- raises errors

However, if I replace the generative transposed convolution in the second network with the ordinary transposed convolution, the error does not occur. Also, the forward pass sometimes succeeds, but the error is then still raised while computing gradients in the backward pass.
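CUDA kernels are launched asynchronously, so a device-side assertion can surface at a later call than the one that triggered it. That may be why the failure shows up sometimes in the forward pass and sometimes only in the backward pass. A minimal sketch for getting a stack trace that points at the failing call, assuming the variable is set before CUDA is initialized (the placement at the top of the script is my assumption about how the example is launched):

```python
import os

# Force synchronous kernel launches so that a device-side assert is reported
# at the call that caused it. This must be set before the first CUDA call;
# the safest place is before `import torch`.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch            # import the framework only after setting the variable
# import MinkowskiEngine  # then run the example as usual
```

Equivalently, the variable can be set on the command line when launching the script. Synchronous launches are slow, so this is only meant for debugging.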

The errors are easy to reproduce.

I am not sure whether this is a mistake on my side. Thank you.


To Reproduce

Add the two marked lines at https://github.com/NVIDIA/MinkowskiEngine/blob/b71caef9b45ab32c02d2c1b5f48f13c1eb5df526/examples/completion.py#L611

...
out_cls, targets, sout = net(sin, target_key)

# ADD two lines
sin = sin + out_cls[-1]
out_cls, targets, sout = net(sin, target_key)

# original code
num_layers, loss = len(out_cls), 0
...
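One thing that may be worth checking is the coordinate set of the updated input. As far as I understand, adding two sparse tensors that live on different coordinates gives a result on the union of both coordinate sets, so `sin + out_cls[-1]` can have more points than the original `sin`. The toy below only illustrates that idea with plain dicts; `sparse_add` is a hypothetical helper, not a MinkowskiEngine function, and I have not confirmed that this is what causes the error.

```python
def sparse_add(a, b):
    """Add two toy sparse tensors stored as {coordinate: feature} dicts.

    Coordinates present in only one operand are kept, so the result lives
    on the union of both coordinate sets.
    """
    out = dict(a)
    for coord, feat in b.items():
        out[coord] = out.get(coord, 0.0) + feat
    return out


# Toy stand-ins for the input and the prediction (values are made up).
sin = {(0, 0, 0): 1.0, (0, 0, 1): 1.0}
pred = {(0, 0, 1): 0.5, (0, 1, 1): 0.25}

updated = sparse_add(sin, pred)
print(len(sin), len(pred), len(updated))  # the sum has more points than `sin`
```

Comparing the number of points in `sin` before and after the addition in the real script would show whether the second pass sees a different coordinate set than the first.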

Expected behavior

I expected to be able to chain two auto-encoder networks that contain generative transposed convolution layers, but I got errors.


Desktop (please complete the following information):

==========System==========
Linux-5.11.0-36-generic-x86_64-with-debian-bullseye-sid
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.2 LTS"
3.7.10 (default, Feb 26 2021, 18:47:35)
[GCC 7.3.0]
==========Pytorch==========
1.9.0+cu102
torch.cuda.is_available(): True
==========NVIDIA-SMI==========
/usr/bin/nvidia-smi
Driver Version 460.91.03
CUDA Version 11.2
VBIOS Version 86.04.17.00.01
Image Version G001.0000.01.03
==========NVCC==========
/usr/local/cuda/bin/nvcc
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89
==========CC==========
CC=g++-7
/usr/bin/g++-7
g++-7 (Ubuntu 7.5.0-6ubuntu2) 7.5.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
==========MinkowskiEngine==========
0.5.4
MinkowskiEngine compiled with CUDA Support: True
NVCC version MinkowskiEngine is compiled: 10020
CUDART version MinkowskiEngine is compiled: 10020


Additional context

Here are the error messages:

/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:97: operator(): block: [78,0,0], thread: [32,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:97: operator(): block: [78,0,0], thread: [33,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:97: operator(): block: [78,0,0], thread: [34,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:97: operator(): block: [78,0,0], thread: [35,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:97: operator(): block: [78,0,0], thread: [36,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:97: operator(): block: [78,0,0], thread: [37,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:97: operator(): block: [78,0,0], thread: [38,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:97: operator(): block: [78,0,0], thread: [39,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
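The assertion text is the bounds check applied when a tensor is indexed with an index tensor on the GPU: every index must satisfy `-size <= index < size` for the dimension being indexed. A minimal sketch mirroring that condition in plain Python (`index_in_bounds` is a hypothetical helper written for illustration, not a PyTorch function):

```python
def index_in_bounds(index: int, size: int) -> bool:
    """Mirror the condition `index >= -sizes[i] && index < sizes[i]`."""
    return -size <= index < size


size = 4  # toy dimension length
for idx in (-4, 0, 3, 4, -5):
    status = "ok" if index_in_bounds(idx, size) else "index out of bounds"
    print(idx, status)
```

So the messages above mean that some index tensor holds values that are too large (or too negative) for the tensor it indexes, for example row indices that refer to more points than the feature matrix actually has.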