AleRiccardi closed this issue 1 year ago.
Hi! Thanks for your contribution, great first issue!
Hi @AleRiccardi, thanks for raising this issue. After taking a look at it (thanks for all the info you have provided), it is a tricky one. It stems from the limitation that metric states in torchmetrics by default can only be tensors or lists of tensors. However, for this metric with `iou_type="segm"` we actually need lists of tuples of tensors (so one extra layer of nesting).
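For context, a minimal sketch of why that extra nesting level is a problem (the class and state names below are illustrative, not the actual torchmetrics internals):

```python
import torch
from torchmetrics import Metric


class NestedStateExample(Metric):
    """Illustrative metric whose list state ends up holding tuples of tensors."""

    def __init__(self):
        super().__init__()
        # A list state is supported, but its elements are expected to be tensors.
        self.add_state("detections", default=[], dist_reduce_fx=None)

    def update(self, masks: torch.Tensor) -> None:
        # Storing a tuple of tensors adds one extra nesting level.
        self.detections.append(tuple(masks))

    def compute(self) -> int:
        return len(self.detections)


metric = NestedStateExample()
metric.update(torch.zeros(2, 4, 4, dtype=torch.bool))
metric.cpu()  # fails: _apply calls .cpu() on the plain tuple inside the list state
```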
I can think of two solutions. The first is a small `TensorTuple` class that implements all the common methods like `.cpu()`, `.cuda()` etc. I added an example of what that could look like below. We would then replace the call to `tuple(masks)` with `TensorTuple(masks)` and everything should somewhat work (I think something still needs to be changed for DDP to work).
```python
from typing import Callable, Optional, TypeVar, Union

import torch
from torch import Tensor, device, dtype

T = TypeVar("T", bound="TensorTuple")


class TensorTuple(tuple):
    """A tuple of tensors that supports the common device/dtype casting methods."""

    def _apply(self: T, fn: Callable[[Tensor], Tensor]) -> T:
        # Apply ``fn`` to every tensor and wrap the results in a new TensorTuple.
        return TensorTuple(fn(val) for val in self)

    def cuda(self: T, device: Optional[Union[int, device]] = None) -> T:
        return self._apply(lambda t: t.cuda(device))

    def ipu(self: T, device: Optional[Union[int, device]] = None) -> T:
        return self._apply(lambda t: t.ipu(device))

    def xpu(self: T, device: Optional[Union[int, device]] = None) -> T:
        return self._apply(lambda t: t.xpu(device))

    def cpu(self: T) -> T:
        return self._apply(lambda t: t.cpu())

    def type(self: T, dst_type: Union[dtype, str]) -> T:
        return self._apply(lambda t: t.type(dst_type))

    def float(self: T) -> T:
        return self._apply(lambda t: t.float() if t.is_floating_point() else t)

    def double(self: T) -> T:
        return self._apply(lambda t: t.double() if t.is_floating_point() else t)

    def half(self: T) -> T:
        return self._apply(lambda t: t.half() if t.is_floating_point() else t)

    def bfloat16(self: T) -> T:
        return self._apply(lambda t: t.bfloat16() if t.is_floating_point() else t)

    def to_empty(self: T, *, device: Union[str, device]) -> T:
        return self._apply(lambda t: torch.empty_like(t, device=device))
```
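For illustration, a rough usage sketch (assuming the class above) of how the replacement would behave:

```python
import torch

masks = torch.randint(0, 2, (3, 8, 8), dtype=torch.bool)  # stand-in for encoded masks

wrapped = TensorTuple(masks)   # instead of tuple(masks)
moved = wrapped.cpu()          # now works and returns a TensorTuple again
assert isinstance(moved, TensorTuple)
assert all(t.device.type == "cpu" for t in moved)
```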
@justusschock what do you think we should do?
@SkafteNicki I would not use option 2, as it is easy to break something we are not aware of that way.
I suggest introducing https://github.com/Lightning-AI/utilities as a dependency and relying on https://github.com/Lightning-AI/utilities/blob/main/src/lightning_utilities/core/apply_func.py for this case (similar to what PL does in several places). This way you can nest as deep as you wish and then use `apply_to_collection` with `dtype=torch.Tensor` to map the function onto all levels of the nested collection.
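A rough sketch of that approach (assuming `lightning_utilities` is installed; the variable names are just for illustration):

```python
import torch
from lightning_utilities.core.apply_func import apply_to_collection

# A state shaped like the segm case: a list of tuples of tensors.
detections = [(torch.ones(2, 3), torch.zeros(4))]

# apply_to_collection recurses through nested lists/tuples/dicts and applies
# the function to every element of the given dtype, however deep it is nested.
moved = apply_to_collection(detections, dtype=torch.Tensor, function=lambda t: t.to("cpu"))
```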
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Would be great to have a solution here.
The problem arises when we instantiate the `MeanAveragePrecision` class with `iou_type="segm"`, call the `update(...)` method, and finally the `cpu()` method. I personally have no reason to call the `cpu()` method, but PyTorch Lightning does: at the end of training it tries to place every inner module on the CPU (you can find the full traceback of the error at the bottom of this issue, which proves what I am saying). This triggers the `self._apply(...)` method in the `metric.py` file of this package, which raises the following error: `AttributeError: 'tuple' object has no attribute 'cpu'`.
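A minimal reproduction sketch (assuming the standard `"segm"` input format; the tensor contents are arbitrary):

```python
import torch
from torchmetrics.detection.mean_ap import MeanAveragePrecision

metric = MeanAveragePrecision(iou_type="segm")
preds = [{
    "masks": torch.randint(0, 2, (1, 10, 10), dtype=torch.bool),
    "scores": torch.tensor([0.9]),
    "labels": torch.tensor([0]),
}]
target = [{
    "masks": torch.randint(0, 2, (1, 10, 10), dtype=torch.bool),
    "labels": torch.tensor([0]),
}]
metric.update(preds, target)
metric.cpu()  # AttributeError: 'tuple' object has no attribute 'cpu'
```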
The reason this is happening is that every time we update the metric we call the following method:
Source
This changes the mask type from Tensor to tuple and then updates the `self.detections` list in the code below:
Source
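To illustrate the type change (the variable names below are illustrative, not the exact torchmetrics code):

```python
import torch

masks = torch.randint(0, 2, (3, 10, 10), dtype=torch.bool)  # 3 instance masks

as_tuple = tuple(masks)        # Tensor -> plain tuple of 2D tensors
print(type(as_tuple))          # <class 'tuple'>
print(type(as_tuple[0]))       # <class 'torch.Tensor'>
# Appending as_tuple to self.detections stores a plain tuple in the list state.
```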
Finally, when the `_apply(...)` method is called, it tries to move every element of the `self.detections` list to the CPU device. But because every element is a tuple, it raises the error mentioned above; a plain tuple does not implement the `cpu()` method.
Source
The `fn(...)` function is declared here:
Source
Full error traceback: