NVIDIA / MinkowskiEngine

Minkowski Engine is an auto-diff neural network library for high-dimensional sparse tensors
https://nvidia.github.io/MinkowskiEngine

Minkowski NN optimization doesn't work #416

Open jkarolczak opened 2 years ago

jkarolczak commented 2 years ago

I have encountered a problem while implementing a single-neuron network. I have declared two models. The first one, which uses only PyTorch functionality, works:

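import torch
import torch.nn as nn
import MinkowskiEngine as ME


class NN(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(1, 1)

    def forward(self, x: ME.SparseTensor):
        # Sum all voxel features into a single scalar
        x = x.F.sum()
        # nn.Linear needs at least one dimension: shape () -> (1,)
        x = self.linear(x.unsqueeze(0))
        return x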

The one below, which uses MinkowskiEngine layers, doesn't:

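class MinkowskiNN(ME.MinkowskiNetwork):
    def __init__(self):
        # D=3: the input coordinates are three-dimensional
        ME.MinkowskiNetwork.__init__(self, 3)
        # A single bias-free weight applied to every voxel feature
        self.linear = ME.MinkowskiLinear(
            in_features=1,
            out_features=1,
            bias=False
        )
        # Sums the features of all points in each batch element
        self.pool = ME.MinkowskiGlobalSumPooling()

    def forward(self, x: ME.SparseTensor):
        x = self.linear(x)
        x = self.pool(x)
        # Pooled features have shape (batch_size, 1); reduce to a scalar
        return x.F.squeeze(0).squeeze(0)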

By "doesn't work" I mean that the network's only parameter is not being optimized: it does not converge to 1, which is the optimal solution. Here is my training loop:

criterion = torch.nn.L1Loss()
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=1e-3
)
for epoch in epochs:
    for blob, y in dataloader:
        optimizer.zero_grad()
        blob = to_minkowski_tensor(blob)
        y_hat = model(blob)
        # PyTorch loss convention: criterion(prediction, target)
        loss = criterion(y_hat, y)
        loss.backward()
        optimizer.step()

where blob is a 3D mesh stored as a dense voxel grid and y is the sum of all voxel values (y = blob.sum()). Here is the source of to_minkowski_tensor:

def to_minkowski_tensor(blob: torch.Tensor) -> ME.SparseTensor:
    # (N, 3) indices of the non-zero voxels
    indices = torch.nonzero(blob)
    # Value stored in each non-zero voxel, shape (N, 1)
    features = blob[tuple(indices.t())].unsqueeze(-1)
    coordinates = indices.int()
    # Prepend the batch index (a batch of one sample here)
    coordinates, features = ME.utils.sparse_collate([coordinates], [features])
    return ME.SparseTensor(features=features, coordinates=coordinates)
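The conversion above turns a dense grid into a coordinate list plus a feature column. Below is a minimal pure-PyTorch sketch of the same idea. It assumes that the batch index is prepended as the first coordinate column, as in MinkowskiEngine 0.5. `dense_to_coo` is a hypothetical helper, not part of ME. The sketch also makes the target explicit: the features sum to y = blob.sum(), so a bias-free linear layer followed by global sum pooling reproduces y exactly when its single weight equals 1.

```python
import torch


def dense_to_coo(blob: torch.Tensor, batch_index: int = 0):
    """Hypothetical helper: dense 3D grid -> (coordinates, features).

    coordinates: (N, 4) int32 rows of [batch_index, x, y, z]
    features:    (N, 1) values of the non-zero voxels
    """
    idx = torch.nonzero(blob)                      # (N, 3) int64, row-major order
    features = blob[tuple(idx.t())].unsqueeze(-1)  # gather voxel values -> (N, 1)
    batch_col = torch.full((idx.shape[0], 1), batch_index, dtype=idx.dtype)
    coordinates = torch.cat([batch_col, idx], dim=1).int()
    return coordinates, features


if __name__ == "__main__":
    torch.manual_seed(0)
    # Hypothetical sparse grid: roughly 10% of voxels are non-zero
    blob = torch.rand(8, 8, 8) * (torch.rand(8, 8, 8) > 0.9)
    coords, feats = dense_to_coo(blob)
    print(coords.shape, feats.shape)
    # y = blob.sum() equals the feature sum, so w * feats.sum() == y for w = 1
    print(bool(torch.isclose(feats.sum(), blob.sum())))
```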

This problem is trivial, yet the model does not optimize at all, which is quite worrying. What's more, after a while the loss becomes NaN. Did I misunderstand the MinkowskiEngine docs and make a mistake somewhere in the code above? Please note that the pure PyTorch implementation works well.
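For comparison, here is what the MinkowskiNN model is expected to converge to, written without MinkowskiEngine. This is a minimal sketch under a few assumptions. It uses random sparse grids as stand-in data, since the original dataset isn't shown. It also uses a larger learning rate (1e-2 instead of 1e-3) so that it converges in a few hundred steps. A single bias-free weight w predicts w * blob.sum() and is trained with L1Loss and Adam, as in the loop above. `fit_single_weight` is a hypothetical name.

```python
import torch


def fit_single_weight(steps: int = 500, lr: float = 1e-2, seed: int = 0) -> float:
    """Dense stand-in for MinkowskiLinear(1, 1, bias=False) + global sum pooling."""
    torch.manual_seed(seed)
    w = torch.nn.Parameter(torch.zeros(1))   # the network's only parameter
    criterion = torch.nn.L1Loss()
    optimizer = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        # Hypothetical data: ~10% of voxels non-zero, target is the voxel sum
        blob = torch.rand(8, 8, 8) * (torch.rand(8, 8, 8) > 0.9)
        y = blob.sum().unsqueeze(0)
        optimizer.zero_grad()
        # sum_i (w * f_i) == w * sum_i f_i: linear layer, then sum pooling
        y_hat = w * blob.sum()
        loss = criterion(y_hat, y)
        loss.backward()
        assert w.grad is not None             # the gradient reaches the parameter
        optimizer.step()
    return w.item()


if __name__ == "__main__":
    print(f"learned weight: {fit_single_weight():.3f}")  # expected to be close to 1
```

If the MinkowskiEngine model does not reach a similar weight, a quick first diagnostic is to check that its weight receives a non-zero `.grad` after `loss.backward()`.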

jkarolczak commented 2 years ago

I've found out that MinkowskiEngine 0.54 doesn't work properly with PyTorch 1.9. I observed the issue described above in a Docker container built following your instructions. After installing ME 0.54 in a venv with PyTorch 1.10, the code from this issue works as intended. A colleague reports that he experienced the same problem with ME 0.54 and PyTorch 1.7, and that upgrading PyTorch to 1.10 solved it.
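These reports say that ME 0.54 misbehaves with PyTorch 1.7 and 1.9 and works with 1.10. Based on that, here is a hedged sketch of a guard that refuses to train on an older PyTorch. The 1.10 threshold comes only from this thread, not from an official compatibility statement. `torch_at_least` is a hypothetical helper; in practice you would pass it `torch.__version__`.

```python
import re


def torch_at_least(version: str, minimum: tuple = (1, 10)) -> bool:
    """Compare a version string such as '1.9.0+cu111' against (major, minor)."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major_minor = (int(match.group(1)), int(match.group(2)))
    return major_minor >= minimum   # tuples compare element-wise, left to right


if __name__ == "__main__":
    for v in ["1.7.1", "1.9.0+cu111", "1.10.0", "2.1.2"]:
        print(v, torch_at_least(v))
```

The components are parsed as integers on purpose. A plain string comparison would rank "1.9" above "1.10".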