Lightning-AI / torchmetrics

Torchmetrics - Machine learning metrics for distributed, scalable PyTorch applications.
https://lightning.ai/docs/torchmetrics/
Apache License 2.0

MultilabelRankingAveragePrecision would turn off grad automatically #1677

Closed · zhf-0 closed this issue 1 year ago

zhf-0 commented 1 year ago

🐛 Bug

When I used MultilabelRankingAveragePrecision as the loss function, the program stopped with the error message

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

After debugging, it turned out that MultilabelRankingAveragePrecision turns off gradients automatically.

To Reproduce

Use the example code from the torchmetrics docs, but set requires_grad=True on preds.

Code sample

import torch
from torchmetrics.classification import MultilabelRankingAveragePrecision

torch.manual_seed(42)
preds = torch.rand(2, 5)
preds.requires_grad = True
target = torch.randint(2, (2, 5))
mlrap = MultilabelRankingAveragePrecision(num_labels=5)
mlrap(preds, target)

Expected behavior

The output of mlrap(preds, target) was tensor(0.9583) without a grad_fn. I also tried the functional interface, and the result was the same. However, when I tried the MSE metric, the output was

>>> mse = torchmetrics.regression.MeanSquaredError()
>>> mse(preds, target)
tensor(0.2157, grad_fn=<SqueezeBackward0>)

with the grad_fn.
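For completeness, the functional interface shows the same behaviour; a sketch of that call (assuming the same preds and target as in the code sample above) would be:

>>> from torchmetrics.functional.classification import multilabel_ranking_average_precision
>>> multilabel_ranking_average_precision(preds, target, num_labels=5)
tensor(0.9583)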

Environment

Additional context

github-actions[bot] commented 1 year ago

Hi! Thanks for your contribution, great first issue!

SkafteNicki commented 1 year ago

Hi @zhf-0, there is a very good reason for the gradients being turned off. It happens in this part of the code: https://github.com/Lightning-AI/torchmetrics/blob/f55cc58617a6ab520282004c977f1c676a195e7f/src/torchmetrics/functional/classification/ranking.py#L27-L33 We use torch.unique to calculate the ranking of the data, and that function does not support gradient calculation. You can try it out:

t = torch.tensor([1.0, 2.0, 1.0, 3.0, 1.0, 4.0, 2.0], requires_grad = True)
u = torch.unique(t)
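# u is returned without a grad_fn: torch.unique is not tracked by autograd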
print(u)
u.sum().backward()

This will not work. In general, ranking is not a differentiable operation, so implementing the ranking in some other way that does not use torch.unique would still end up in a non-differentiable metric. There is research that has come up with differentiable relaxations of ranking, but I am not sure whether that is in scope here.
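To illustrate what such a relaxation could look like, here is a minimal sketch of a "soft rank" built from pairwise sigmoids; the soft_rank helper and the smoothing parameter tau are purely illustrative and not part of torchmetrics:

import torch

def soft_rank(scores, tau=0.1):
    # pairwise[..., i, j] = scores[..., j] - scores[..., i]
    pairwise = scores.unsqueeze(-2) - scores.unsqueeze(-1)
    # soft count of elements scoring higher than each element (+0.5 for the element itself)
    # approximates the descending rank, where rank 1 = highest score
    return 0.5 + torch.sigmoid(pairwise / tau).sum(dim=-1)

preds = torch.rand(2, 5, requires_grad=True)
ranks = soft_rank(preds)
ranks.sum().backward()     # gradients flow, unlike the torch.unique based ranking
print(preds.grad.shape)    # torch.Size([2, 5])

Smaller tau values get closer to the hard ranks but make the gradients less informative, and a full LRAP surrogate would still have to be built on top of such soft ranks.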

zhf-0 commented 1 year ago

Hi @SkafteNicki, thank you very much for your explanation. The reason I used MultilabelRankingAveragePrecision as the loss function is that I read the paper Graph Neural Networks for Selection of Preconditioners and Krylov Solvers and I want to reproduce its results.

In section 4

GNN frameworks were based mainly on PyTorch [34] and PyTorch-Geometric [35], and evaluation metrics were provided by TorchMetrics [36]
...
The evaluation metrics for multilabel classification in this paper include label ranking average precision (LRAP)

Now, the funny thing is that the paper used this as the loss function to train the model, but you just said that it does not support gradient calculation. Do you have any idea how to use MultilabelRankingAveragePrecision to train the model? Or is the paper lying?

SkafteNicki commented 1 year ago

I have no idea how they actually optimize their networks; that is not properly specified in their paper. But from this (page 6):

Each linear solve was run in parallel using 64 MPI processes on the KNL partition on Theta [ 38], a Cray XC40 supercomputer at Argonne National Laboratory.

I assume they just used a default solver algorithm. But not my area of expertise.

I think, however, that from section 4.2 it is pretty clear that MultilabelRankingAveragePrecision is only used for evaluation and not for training, i.e. to evaluate results after the model has been trained/optimized. I therefore do not think they rely on backpropagation through the metric.
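For reference, the usual pattern would look roughly like this (a minimal sketch with an arbitrary stand-in model and BCE as the training loss; the paper does not spell out its exact setup):

import torch
from torch import nn
from torchmetrics.classification import MultilabelRankingAveragePrecision

# hypothetical stand-in for the GNN: any model producing 5 logits per sample
model = nn.Linear(16, 5)
optimizer = torch.optim.Adam(model.parameters())
criterion = nn.BCEWithLogitsLoss()  # differentiable training loss
metric = MultilabelRankingAveragePrecision(num_labels=5)

x = torch.randn(8, 16)
target = torch.randint(2, (8, 5))

# training step: backpropagate through the loss, not through the metric
optimizer.zero_grad()
logits = model(x)
loss = criterion(logits, target.float())
loss.backward()
optimizer.step()

# evaluation only: the metric is computed outside the autograd graph
with torch.no_grad():
    metric.update(torch.sigmoid(logits), target)
print(metric.compute())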

I advise that you reach out to the authors to get access to their code, or at least a clarification of what is going on. Closing issue.