Floating point exception during model.update()

cvignac commented 3 years ago

Bug

Hello, I get a floating point exception when trying to update a CompressionModel. There is no stack for the error message, so I guess it comes from an internal C module.

Using prints, I could trace that the problem comes from:

model.update() -> self._pmf_to_cdf(pmf, tail_mass, pmf_length, max_length) -> _cdf = pmf_to_quantized_cdf(prob, self.entropy_coder_precision)

and that it was raised when calling the function on a Tensor p where all entries but one are zero

To Reproduce

Steps to reproduce the behavior:

Call

prob = torch.cat((p[: pmf_length[i]], tail_mass[i]), dim=0)
 _cdf = pmf_to_quantized_cdf(prob, self.entropy_coder_precision)

on the following tensor:

tensor([0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 1.6690e-33, 6.5444e-11, 1.0000e+00, 1.2011e-13, 1.0696e-36, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00], device='cuda:0', grad_fn=)

Expected behavior

I don't know what the returned value should be, but it seems that my problem is a corner case incorrectly handled

Environment

PyTorch version: 1.7.0 Is debug build: True CUDA used to build PyTorch: 10.2 ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.4 LTS (x86_64) GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0 Clang version: Could not collect CMake version: version 3.10.2

Python version: 3.7 (64-bit runtime) Is CUDA available: True CUDA runtime version: 9.1.85 GPU models and configuration: GPU 0: Tesla V100-SXM2-32GB GPU 1: Tesla V100-SXM2-32GB GPU 2: Tesla V100-SXM2-32GB GPU 3: Tesla V100-SXM2-32GB

Nvidia driver version: 440.33.01 cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5 HIP runtime version: N/A MIOpen runtime version: N/A

Versions of relevant libraries: [pip3] numpy==1.19.4 [pip3] pytorch-msssim==0.2.0 [pip3] torch==1.7.0 [pip3] torch-cluster==1.5.8 [pip3] torch-geometric==1.6.3 [pip3] torch-scatter==2.0.5 [pip3] torch-sparse==0.6.8 [pip3] torch-spline-conv==1.2.0 [pip3] torchvision==0.8.1 [conda] numpy 1.19.4 pypi_0 pypi [conda] pytorch-msssim 0.2.0 pypi_0 pypi [conda] torch 1.7.0 pypi_0 pypi [conda] torch-cluster 1.5.8 pypi_0 pypi [conda] torch-geometric 1.6.3 pypi_0 pypi [conda] torch-scatter 2.0.5 pypi_0 pypi [conda] torch-sparse 0.6.8 pypi_0 pypi [conda] torch-spline-conv 1.2.0 pypi_0 pypi

- PyTorch / CompressAI Version (e.g., 1.0 / 0.4.0): torch   1.7.0, compressai  1.1.5
- OS (e.g., Linux): Ubuntu 18.04.4 LTS (Bionic Beaver)
- How you installed PyTorch / CompressAI (`pip`, source): pip 
- Build command you used (if compiling from source):
- Python version: 1.7.0
- CUDA/cuDNN version: 10.2
- GPU models and configuration:
- Any other relevant information: Problem appears both on cpu and gpu