Closed: zhf-0 closed this issue 1 year ago
Hi! Thanks for your contribution, and great first issue!
Hi @zhf-0,
There is a very good reason for the gradients being turned off. It happens in this part of the code:
https://github.com/Lightning-AI/torchmetrics/blob/f55cc58617a6ab520282004c977f1c676a195e7f/src/torchmetrics/functional/classification/ranking.py#L27-L33
We use `torch.unique` to calculate the ranking of the data, and that function does not support gradient calculations. You can try it out:

```python
import torch

t = torch.tensor([1.0, 2.0, 1.0, 3.0, 1.0, 4.0, 2.0], requires_grad=True)
u = torch.unique(t)
print(u)
u.sum().backward()  # fails: the output of torch.unique carries no grad_fn
```

This will not work. In general, ranking is not a differentiable operation, so implementing the ranking in some other way, without `torch.unique`, would still end up as a non-differentiable metric. There is research that has come up with differentiable relaxations of ranking, but I am not sure whether that is in scope here.
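As an illustration of what such a relaxation looks like, ranks can be approximated with pairwise sigmoid comparisons, which do carry gradients. This `soft_rank` helper is a hypothetical sketch for illustration only, not anything torchmetrics provides:

```python
import torch

def soft_rank(scores: torch.Tensor, tau: float = 0.01) -> torch.Tensor:
    """Differentiable relaxation of descending ranks (illustrative sketch).

    rank_i is approximately 1 + #{j : s_j > s_i}, with the hard comparison
    replaced by a sigmoid at temperature ``tau``.
    """
    diff = scores.unsqueeze(-1) - scores.unsqueeze(-2)  # diff[i, j] = s_i - s_j
    # sigmoid((s_j - s_i) / tau) ~ 1 when s_j > s_i; the j == i term is exactly 0.5
    return 1.0 + torch.sigmoid(-diff / tau).sum(dim=-1) - 0.5

t = torch.tensor([3.0, 1.0, 2.0], requires_grad=True)
r = soft_rank(t)
print(r)         # approximately tensor([1., 3., 2.])
r[0].backward()  # gradients flow, unlike with torch.unique
```

With a small `tau` the values approach the hard ranks, while `backward()` still succeeds because every operation in the graph is differentiable.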
Hi @SkafteNicki,
Thank you very much for your explanation. The reason why I used `MultilabelRankingAveragePrecision` as the loss function is that I read the paper Graph Neural Networks for Selection of Preconditioners and Krylov Solvers and wanted to reproduce its results.
In section 4:

> GNN frameworks were based mainly on PyTorch [34] and PyTorch-Geometric [35], and evaluation metrics were provided by TorchMetrics [36]
> ...
> The evaluation metrics for multilabel classification in this paper include label ranking average precision (LRAP)
Now, the funny thing is that the paper used this metric as the loss function to train the model, but you just said that it does not support gradient calculations. Do you have any idea how to use `MultilabelRankingAveragePrecision` to train the model? Or is the paper lying?
I have no idea how they actually optimized their networks; that is not properly specified in their paper. But from this (page 6):

> Each linear solve was run in parallel using 64 MPI processes on the KNL partition on Theta [38], a Cray XC40 supercomputer at Argonne National Laboratory.

I assume they just used a default solver algorithm. But this is not my area of expertise.
I think, however, that section 4.2 makes it pretty clear that `MultilabelRankingAveragePrecision` is just used for evaluation and not for training, i.e. to evaluate the result after the model has been trained/optimized. I therefore do not think they rely on backpropagation through the metric.
I advise that you reach out to the authors to get access to their code, or at least a clarification of what is going on. Closing the issue.
🐛 Bug
When I used `MultilabelRankingAveragePrecision` as the loss function, the program stopped and the error message was ... After debugging, it turned out that `MultilabelRankingAveragePrecision` turns off the gradients automatically.

To Reproduce
Use the example code from the torchmetrics docs, but set `requires_grad=True` on `preds`.

Code sample

```python
import torch
from torchmetrics.classification import MultilabelRankingAveragePrecision

torch.manual_seed(42)
preds = torch.rand(2, 5)
preds.requires_grad = True  # the original snippet had a typo: pred.requires_grad
target = torch.randint(2, (2, 5))
mlrap = MultilabelRankingAveragePrecision(num_labels=5)
mlrap(preds, target)
```

Expected behavior
The output of `mlrap(preds, target)` was `tensor(0.9583)` without a `grad_fn`. I also tried the functional interface; the result was the same. However, when I tried MSE loss, the output was ... with a `grad_fn`.

Environment
- How you installed torchmetrics (`conda`, `pip`, build from source): `pip`
- `torchmetrics` version: 0.11.4
- `python` version: 3.8.12
- `pytorch` version: 1.9.1
- OS: Linux
Additional context