Lightning-AI / torchmetrics

Machine learning metrics for distributed, scalable PyTorch applications.
https://lightning.ai/docs/torchmetrics/
Apache License 2.0

The `divided by zero` problem in low precision when using `uqi`, `sdi`, `ssim` and `ms-ssim` #2281

Closed michael080808 closed 8 months ago

michael080808 commented 10 months ago

πŸ› Bug

Hi, guys! Thanks for your fast response and fix, I really appreciate it. When I tried to compute the `uqi` function, I found there are still situations where the quotients become zero. According to https://en.wikipedia.org/wiki/Variance, the variance can be written as Var(X) = E[X²] − (E[X])². While debugging to understand why I got NaN values, I noticed that https://github.com/Lightning-AI/torchmetrics/blob/62adb4022b30e0fe5c47589d9ecd82d3ff041117/src/torchmetrics/functional/image/uqi.py#L105-L106 produces very small negative values, around 5e-7. In other words, the code yields "negative variance" values due to floating-point cancellation, which is mathematically impossible. Although there is an `eps` to avoid division by zero, the absolute value of the "negative variance" is much greater than `eps`, so it still leads to the NaN / "divided by zero" problem. `sdi` uses the `uqi` results, so it produces the same NaN. Besides, `ssim` and `ms-ssim` use C1 and C2 to guard against this, but in practice they suffer from the same situation.
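The cancellation described above is easy to reproduce in isolation. The sketch below (plain numpy, not the torchmetrics code itself; function names are made up for illustration) computes variance with the one-pass formula E[X²] − (E[X])² on float32 data with a large mean, where the two terms agree to roughly the precision limit and their difference becomes rounding noise:

```python
import numpy as np

def naive_var(x):
    # One-pass formula Var(X) = E[X^2] - (E[X])^2, the same shape of
    # computation as the windowed means in the metric code; catastrophic
    # cancellation can make the result slightly negative.
    return (x ** 2).mean() - x.mean() ** 2

def safe_var(x):
    # Guarded version: clamp rounding residue before it reaches a divisor.
    return np.maximum(naive_var(x), 0.0)

# float32 data with a large mean and tiny spread: E[X^2] and (E[X])^2 agree
# to ~7 significant digits, so their difference is dominated by rounding
# noise and may dip below zero.
rng = np.random.default_rng(0)
x = (1000.0 + 1e-4 * rng.standard_normal(4096)).astype(np.float32)
print(naive_var(x))  # may be a tiny negative value in float32
print(safe_var(x))   # never negative
```

A two-pass computation (`((x - x.mean()) ** 2).mean()`) avoids the cancellation but, as noted below for `torch.unfold`, requires materializing centered windows.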

To Reproduce

I'm sorry that I can't attach the original data due to upload size limits; each file is 1.15 GB, which makes the problem tricky to reproduce. If there is some other way to upload the data, I'm glad to provide it. 😊

Expected behavior

I'm not sure how best to solve the problem. `torch.unfold` could compute the variance directly from its definition, but it would cost a huge amount of memory, which is not acceptable. I think it's better to add a check on the variance that prevents negative values from entering the quotients. However, I don't know which is better: taking the absolute value of the negative results, or just clamping them to zero.
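The two options mentioned above behave differently on the ~5e-7 cancellation error seen in the issue. A minimal scalar comparison (numpy here for simplicity; in torch the clamp would be `torch.clamp(..., min=0)`):

```python
import numpy as np

def var_zeroed(ex2, ex_sq):
    # Option A: clamp the cancellation error to zero
    # (torch equivalent: torch.clamp(ex2 - ex_sq, min=0)).
    return np.maximum(ex2 - ex_sq, 0.0)

def var_abs(ex2, ex_sq):
    # Option B: keep the magnitude of the error as if it were real variance.
    return np.abs(ex2 - ex_sq)

# E[X^2] and (E[X])^2 for a window whose true variance is zero, disagreeing
# by ~5e-7 as observed in the issue.
ex2, ex_sq = 1.0, 1.0 + 5e-7
print(var_zeroed(ex2, ex_sq))  # 0.0
print(var_abs(ex2, ex_sq))     # ~5e-7: spurious variance survives
```

Clamping to zero matches the true value for a constant window, while the absolute value preserves a spurious magnitude, which may be an argument for the clamping option.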

Environment

```
Name                Version   Build                      Channel
_anaconda_depends   2023.07   py311_0                    https://repo.anaconda.com/pkgs/main
conda               23.9.0    py311haa95532_0            https://repo.anaconda.com/pkgs/main
python              3.10.13   he1021f5_0                 defaults
pytorch             2.1.0     py3.10_cuda12.1_cudnn8_0   pytorch
pytorch-cuda        12.1      hde6ce7c_5                 pytorch
pytorch-mutex       1.0       cuda                       pytorch
torchaudio          2.1.0     pypi_0                     pypi
torchmetrics        1.2.1     pyhd8ed1ab_0               conda-forge
torchvision         0.16.0    pypi_0                     pypi
```
eminbdr commented 10 months ago

To solve this issue, one approach could be to use fractions instead of floats to represent numbers with large denominators. Since tensors only work with floats and integers, you could create two separate tensors representing the numerators and the denominators.
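The exact-arithmetic idea above can be illustrated with scalar Python (using the stdlib `fractions.Fraction` rather than the numerator/denominator tensor pair suggested, which would need custom arithmetic): with exact rationals there is no rounding, so E[X²] − (E[X])² can never go negative.

```python
from fractions import Fraction

def exact_var(xs):
    # Exact rational arithmetic: no rounding means no catastrophic
    # cancellation, so the one-pass variance is provably non-negative.
    n = len(xs)
    ex = sum(xs) / n                     # Fraction division is exact
    ex2 = sum(x * x for x in xs) / n
    return ex2 - ex * ex

# Large mean, tiny spread: the case that breaks float32 above.
d = Fraction(1, 10**7)
xs = [Fraction(1000) + d, Fraction(1000) - d]
print(exact_var(xs))  # 1/100000000000000, exactly d**2
```

The cost is that every element becomes an arbitrary-precision rational, so this is more a correctness reference than a practical drop-in for GPU tensors.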