Describe the bug
When I attach a second copy of the network proposed in examples/completion.py and feed it the updated input, the second forward pass raises an error:
# declare the network
net = CompletionNet(...)
# inference through the 1st auto-encoder
preds, *rest = net(inputs)
# update the input using the prediction from the previous network
updated_inputs = inputs + preds
# inference through the 2nd auto-encoder
updated_preds, *rest = net(updated_inputs)  # <- raises an error
Meanwhile, if I replace the generative transposed convolution in the second network with the original transposed convolution, the error does not occur. Moreover, the forward pass sometimes succeeds, but it still raises errors while computing backward gradients.
...
out_cls, targets, sout = net(sin, target_key)
# ADDED: two lines that update the input and run a second pass
sin = sin + out_cls[-1]
out_cls, targets, sout = net(sin, target_key)
# original code
num_layers, loss = len(out_cls), 0
...
Expected behavior
I expect to be able to sequentially attach two auto-encoder networks that contain generative transposed convolution layers, but I get errors instead.
Desktop (please complete the following information):
OS: [Ubuntu 20.04]
Python version: [3.7.10]
Pytorch version: [1.9.0+cu102]
CUDA version: [10.2]
NVIDIA Driver version: [460.91.03]
MinkowskiEngine version: [0.5.4]
Output of python -c "import MinkowskiEngine as ME; ME.print_diagnostics()":
=========System==========
Linux-5.11.0-36-generic-x86_64-with-debian-bullseye-sid
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.2 LTS"
3.7.10 (default, Feb 26 2021, 18:47:35)
[GCC 7.3.0]
==========Pytorch==========
1.9.0+cu102
torch.cuda.is_available(): True
==========NVIDIA-SMI==========
/usr/bin/nvidia-smi
Driver Version 460.91.03
CUDA Version 11.2
VBIOS Version 86.04.17.00.01
Image Version G001.0000.01.03
==========NVCC==========
/usr/local/cuda/bin/nvcc
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89
==========CC==========
CC=g++-7
/usr/bin/g++-7
g++-7 (Ubuntu 7.5.0-6ubuntu2) 7.5.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
==========MinkowskiEngine==========
0.5.4
MinkowskiEngine compiled with CUDA Support: True
NVCC version MinkowskiEngine is compiled: 10020
CUDART version MinkowskiEngine is compiled: 10020
Additional context
Here are the error messages:
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:97: operator(): block: [78,0,0], thread: [32,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:97: operator(): block: [78,0,0], thread: [33,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:97: operator(): block: [78,0,0], thread: [34,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:97: operator(): block: [78,0,0], thread: [35,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:97: operator(): block: [78,0,0], thread: [36,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:97: operator(): block: [78,0,0], thread: [37,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:97: operator(): block: [78,0,0], thread: [38,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:97: operator(): block: [78,0,0], thread: [39,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
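For context, the assertion in the log above comes from ATen's GPU index kernel and fires when an index tensor references a row outside the tensor being gathered from. The following is a toy, pure-Python sketch of that failure mode; the stale-index-map scenario is my assumption about the cause (a generative transposed convolution producing kernel maps for a coordinate set that no longer matches the reused feature tensor), not something taken from MinkowskiEngine internals:

```python
# Only rows 0..3 exist in the feature "tensor".
features = [[0.0, 0.0, 0.0] for _ in range(4)]
# Hypothetical stale kernel map: index 5 assumes at least 6 rows.
stale_map = [0, 2, 5]

def gather(feats, index_map):
    # Mirrors the gather that ATen's IndexKernel performs on the GPU;
    # on the host an out-of-range entry raises a catchable IndexError
    # instead of the device-side assert shown in the log.
    return [feats[i] for i in index_map]

try:
    gather(features, stale_map)
    print("gather succeeded")
except IndexError:
    print("out-of-bounds index caught")
```

On the GPU the same condition cannot be caught per-element, so it surfaces as the repeated `Assertion ... failed` lines and a subsequent CUDA error.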
To Reproduce
https://github.com/NVIDIA/MinkowskiEngine/blob/b71caef9b45ab32c02d2c1b5f48f13c1eb5df526/examples/completion.py#L611
You can reproduce the errors easily with the modification shown above. I am not sure whether this is a problem on my side. Thank you.