Open vitalwarley opened 5 months ago
Hi! Thanks for your contribution, great first issue!
I think I found the problem. The returned thresholds are probabilities, because the docs state:
preds (float tensor): (N, ...). Preds should be a tensor containing probabilities or logits for each observation. If preds has values outside [0,1] range we consider the input to be logits and will auto apply sigmoid per element.
and:
thresholds: an 1d tensor of size (n_thresholds, ) with decreasing threshold values
So it makes sense; my fault. However, I didn't find it very clear at first. Could you please suggest how to clarify it in the docs or examples?
Bug description
There's a noticeable difference in the calculated optimal thresholds when comparing the ROC curve implementations between sklearn.metrics.roc_curve and torchmetrics.functional.roc. Specifically, using the same input data for similarity scores and labels, sklearn produces a significantly lower optimal threshold value compared to torchmetrics.
What version are you seeing the problem on?
v2.2
How to reproduce the bug
Error messages and logs
No response
Environment
Current environment
* CUDA: - GPU: - NVIDIA GeForce RTX 3070 Laptop GPU - available: True - version: 12.1 * Lightning: - lightning: 2.2.1 - lightning-utilities: 0.10.1 - pytorch-lightning: 2.2.1 - torch: 2.2.1 - torchmetrics: 1.3.1 - torchvision: 0.17.1 * Packages: - absl-py: 2.1.0 - aiohttp: 3.9.3 - aiosignal: 1.3.1 - asttokens: 2.4.1 - attrs: 23.2.0 - beautifulsoup4: 4.12.3 - certifi: 2024.2.2 - cfgv: 3.4.0 - chardet: 5.2.0 - charset-normalizer: 3.3.2 - click: 8.1.7 - contourpy: 1.2.0 - cycler: 0.12.1 - daemonize: 2.5.0 - debugpy: 1.8.1 - decorator: 5.1.1 - distlib: 0.3.8 - docstring-parser: 0.16 - executing: 2.0.1 - filelock: 3.13.1 - fonttools: 4.50.0 - frozenlist: 1.4.1 - fsspec: 2023.12.2 - gdown: 5.1.0 - grpcio: 1.62.1 - guildai: 0.9.0 - identify: 2.5.35 - idna: 3.6 - importlib-resources: 6.3.2 - ipython: 8.20.0 - jedi: 0.19.1 - jinja2: 3.1.3 - joblib: 1.3.2 - jsonargparse: 4.27.6 - kiwisolver: 1.4.5 - lightning: 2.2.1 - lightning-utilities: 0.10.1 - markdown: 3.6 - markupsafe: 2.1.3 - matplotlib: 3.8.3 - matplotlib-inline: 0.1.6 - mpmath: 1.3.0 - multidict: 6.0.5 - natsort: 8.4.0 - networkx: 3.2.1 - nodeenv: 1.8.0 - numpy: 1.26.4 - nvidia-cublas-cu12: 12.1.3.1 - nvidia-cuda-cupti-cu12: 12.1.105 - nvidia-cuda-nvrtc-cu12: 12.1.105 - nvidia-cuda-runtime-cu12: 12.1.105 - nvidia-cudnn-cu12: 8.9.2.26 - nvidia-cufft-cu12: 11.0.2.54 - nvidia-curand-cu12: 10.3.2.106 - nvidia-cusolver-cu12: 11.4.5.107 - nvidia-cusparse-cu12: 12.1.0.106 - nvidia-nccl-cu12: 2.19.3 - nvidia-nvjitlink-cu12: 12.3.101 - nvidia-nvtx-cu12: 12.1.105 - opencv-python: 4.9.0.80 - packaging: 24.0 - parso: 0.8.3 - pexpect: 4.9.0 - pillow: 10.2.0 - pip: 24.0 - pkginfo: 1.10.0 - platformdirs: 4.2.0 - pre-commit: 3.6.2 - prompt-toolkit: 3.0.43 - protobuf: 4.25.3 - psutil: 5.9.8 - ptyprocess: 0.7.0 - pure-eval: 0.2.2 - pygments: 2.17.2 - pyparsing: 3.1.2 - pysocks: 1.7.1 - python-dateutil: 2.9.0.post0 - pytorch-lightning: 2.2.1 - pyyaml: 6.0.1 - requests: 2.31.0 - scikit-learn: 1.4.1.post1 - scipy: 1.12.0 - setuptools: 
69.0.3 - six: 1.16.0 - soupsieve: 2.5 - stack-data: 0.6.3 - sympy: 1.12 - tabview: 1.4.4 - tensorboard: 2.16.2 - tensorboard-data-server: 0.7.2 - threadpoolctl: 3.3.0 - torch: 2.2.1 - torchmetrics: 1.3.1 - torchvision: 0.17.1 - tqdm: 4.66.2 - traitlets: 5.14.1 - triton: 2.2.0 - typeshed-client: 2.5.1 - typing-extensions: 4.9.0 - urllib3: 2.2.1 - virtualenv: 20.25.1 - wcwidth: 0.2.13 - werkzeug: 3.0.1 - wheel: 0.42.0 - yarl: 1.9.4 * System: - OS: Linux - architecture: - 64bit - ELF - processor: - python: 3.11.8 - release: 6.7.9-arch1-1 - version: #1 SMP PREEMPT_DYNAMIC Fri, 08 Mar 2024 01:59:01 +0000
More info
The output from thresholds_ (using sklearn) and thresholds (using torchmetrics) reveals a significant difference in the threshold values' range and granularity:
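One way to reconcile the two ranges, as a sketch not taken from the original report: torch.logit is the inverse of sigmoid, so it maps the probability-space thresholds returned by torchmetrics back into the logit/score space that sklearn reports in. The threshold values below are made up for illustration.

```python
import torch

# Hypothetical probability-space thresholds as returned by torchmetrics.
prob_thresholds = torch.tensor([0.9, 0.7, 0.5, 0.3])

# torch.logit is the inverse of sigmoid: logit(p) = log(p / (1 - p)).
# Applying it recovers thresholds in the original logit/score space,
# making them directly comparable to sklearn's thresholds.
score_thresholds = torch.logit(prob_thresholds)
print(score_thresholds)  # logit(0.5) == 0
```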