After some training steps the model returns NaN values.
Replacing
ME.SparseTensorQuantizationMode.UNWEIGHTED_AVERAGE
with
ME.SparseTensorQuantizationMode.RANDOM_SUBSAMPLE
as in issue #273 resolves the problem. Nevertheless, I would like to know whether the quantization mode is really the cause, and why the NaNs do not appear from the first training step on. Or am I misunderstanding something else?
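For reference, here is a conceptual sketch of what the two modes do per voxel. It is a plain-Python model written for illustration. It assumes the behaviour implied by the mode names: average all points that fall into a voxel, or keep one randomly chosen point. It is not MinkowskiEngine's implementation, and the helpers `group_by_voxel`, `unweighted_average` and `random_subsample` are hypothetical. In MinkowskiEngine itself the mode is selected through the `quantization_mode` argument of `ME.SparseTensor`.

```python
import math
import random
from collections import defaultdict


def group_by_voxel(coords, voxel_size):
    """Map each integer voxel key to the indices of the points inside it (hypothetical helper)."""
    voxels = defaultdict(list)
    for i, point in enumerate(coords):
        key = tuple(math.floor(c / voxel_size) for c in point)
        voxels[key].append(i)
    return voxels


def unweighted_average(coords, feats, voxel_size):
    """Model of UNWEIGHTED_AVERAGE: one feature vector per voxel, the plain mean of its points."""
    out = {}
    for key, idx in group_by_voxel(coords, voxel_size).items():
        dim = len(feats[idx[0]])
        out[key] = [sum(feats[i][d] for i in idx) / len(idx) for d in range(dim)]
    return out


def random_subsample(coords, feats, voxel_size, rng):
    """Model of RANDOM_SUBSAMPLE: one feature vector per voxel, taken from one random point."""
    return {
        key: list(feats[rng.choice(idx)])
        for key, idx in group_by_voxel(coords, voxel_size).items()
    }


# Two points share voxel (0, 0, 0); one of them carries a NaN feature.
coords = [(0.1, 0.1, 0.1), (0.4, 0.2, 0.3), (1.5, 0.0, 0.0)]
feats = [[1.0], [float("nan")], [3.0]]
print(unweighted_average(coords, feats, 1.0))  # voxel (0, 0, 0) becomes NaN
print(random_subsample(coords, feats, 1.0, random.Random(0)))
```

The example shows one property that may be relevant, although it is not confirmed as the cause here: with averaging, a single non-finite feature inside a voxel makes that voxel's feature non-finite, whereas subsampling only propagates it when that particular point happens to be picked.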
Expected behavior
Both quantization modes should return finite values.
Desktop:
OS: Docker environment
Python version: 3.8.10
Pytorch version: 1.8.2
CUDA version: 11.6 (driver); the CUDA toolkit (nvcc) and the MinkowskiEngine build use 11.0
NVIDIA Driver version: 510.47.03
MinkowskiEngine version: 0.5.4
ME.print_diagnostics():
==========System==========
Linux-5.10.0-1057-oem-x86_64-with-glibc2.29
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.3 LTS"
3.8.10 (default, Nov 26 2021, 20:14:08)
[GCC 9.3.0]
==========Pytorch==========
1.8.2
torch.cuda.is_available(): True
==========NVIDIA-SMI==========
/usr/bin/nvidia-smi
Driver Version 510.47.03
CUDA Version 11.6
VBIOS Version 94.02.42.00.2F
Image Version G001.0000.03.03
GSP Firmware Version N/A
==========NVCC==========
/usr/local/cuda/bin/nvcc
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Wed_Jul_22_19:09:09_PDT_2020
Cuda compilation tools, release 11.0, V11.0.221
Build cuda_11.0_bu.TC445_37.28845127_0
==========CC==========
/usr/bin/c++
c++ (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
==========MinkowskiEngine==========
0.5.4
MinkowskiEngine compiled with CUDA Support: True
NVCC version MinkowskiEngine is compiled: 11000
CUDART version MinkowskiEngine is compiled: 11000
Additional context
Thanks for sharing and supporting the code with the community. This project is really good.
Describe the bug
NaNs appear after a number of training steps.
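To pin down the first step at which NaNs show up, and in which tensor, a small guard can be called on the loss and on the network outputs in every iteration. The sketch below is plain Python on lists of floats, and `check_finite` / `NonFiniteError` are hypothetical names. In PyTorch the equivalent test on a tensor is `torch.isfinite(t).all()`, and `torch.autograd.set_detect_anomaly(True)` additionally reports the backward operation that produced a NaN.

```python
import math


class NonFiniteError(RuntimeError):
    """Raised when a monitored value contains NaN or +/-inf (hypothetical)."""


def check_finite(name, values, step):
    """Raise NonFiniteError naming the step and tensor if any value is NaN or infinite."""
    bad = [i for i, v in enumerate(values) if not math.isfinite(v)]
    if bad:
        raise NonFiniteError(
            f"step {step}: {name} has {len(bad)} non-finite value(s), first at index {bad[0]}"
        )


# Hypothetical usage inside a training loop, with plain floats standing in for tensors.
for step, (loss, outputs) in enumerate([(0.9, [0.1, 0.2]), (0.7, [0.3, float("nan")])]):
    try:
        check_finite("loss", [loss], step)
        check_finite("outputs", outputs, step)
    except NonFiniteError as err:
        print(err)  # step 1: outputs has 1 non-finite value(s), first at index 1
        break
```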
To Reproduce
I'm using the following to generate the sparse tensor from a point cloud:
Afterwards, I use the model for semantic predictions: