Minkowski NN optimization doesn't work

I have run into a problem while implementing a single-neuron network. I have declared two models. The first one, which uses only PyTorch functionality, works:

class NN(nn.Module):
    def __init__(self):
        nn.Module.__init__(self)
        self.linear = nn.Linear(1, 1)

    def forward(self, x: ME.SparseTensor):
        x = x.F.sum()
        x = self.linear(x.unsqueeze(0))
        return x

The one below, which uses MinkowskiEngine layers, doesn't:

class MinkowskiNN(ME.MinkowskiNetwork):
    def __init__(self):
        ME.MinkowskiNetwork.__init__(self, 3)
        self.linear = ME.MinkowskiLinear(
            in_features=1, 
            out_features=1,
            bias=False
        )
        self.pool = ME.MinkowskiGlobalSumPooling()

    def forward(self, x: ME.SparseTensor):
        x = self.linear(x)
        x = self.pool(x)
        return x.F.squeeze(0).squeeze(0)
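
For reference, this is what I expect the MinkowskiEngine model to learn. Assuming that MinkowskiLinear without a bias multiplies every feature by a single weight w and that the global sum pooling adds up the results, the prediction is w * sum(features). With the target equal to the sum of the voxel values (see the training loop below), the L1 loss is |1 - w| * |sum(features)|, which is smallest at w = 1. A dependency-free sketch of that loss, with made-up feature values and hypothetical helper names (predict, l1_loss):

```python
def predict(weight: float, features: list) -> float:
    """Emulate a bias-free linear layer followed by global sum pooling."""
    return sum(weight * f for f in features)


def l1_loss(weight: float, samples: list) -> float:
    """Mean absolute error; each sample's target is the sum of its features."""
    return sum(abs(sum(f) - predict(weight, f)) for f in samples) / len(samples)


# Made-up voxel values for two blobs (illustration only, not my real data).
samples = [[0.5, 2.0, 1.5], [3.0, 1.0]]

# Scan candidate weights 0.00, 0.05, ..., 2.00 and keep the one with the lowest loss.
candidates = [i / 20 for i in range(41)]
best_weight = min(candidates, key=lambda w: l1_loss(w, samples))
print(best_weight)  # 1.0
```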

By "doesn't work" I mean that the model's only parameter is not being optimized: it doesn't converge to 1, which is the optimal solution. Here is my training loop:

criterion = torch.nn.L1Loss()
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=1e-3
)
for epoch in epochs:
    for blob, y in dataloader:      
        optimizer.zero_grad()
        blob = to_minkowski_tensor(blob)
        y_hat = model(blob)
        loss = criterion(y_hat, y)  # L1Loss takes (input, target)
        loss.backward()
        optimizer.step()

where blob is a 3D mesh and y is the sum of the values in all voxels (y = blob.sum()). I also attach the source of to_minkowski_tensor:

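def to_minkowski_tensor(blob: torch.Tensor) -> ME.SparseTensor:
    # Integer coordinates of every non-zero voxel, one row per voxel.
    coordinates = torch.nonzero(blob).int()
    # Collect the voxel value at each coordinate as a one-channel feature.
    features = []
    for idx in coordinates:
        features.append(blob[tuple(idx)])
    features = torch.tensor(features).unsqueeze(-1)
    # Collate a single sample into batched coordinates and features.
    coordinates, features = ME.utils.sparse_collate([coordinates], [features])
    return ME.SparseTensor(features=features, coordinates=coordinates)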
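
The Python loop above can probably be replaced by boolean-mask indexing, which returns values in the same row-major order as torch.nonzero. Below is a minimal sketch in plain PyTorch (no MinkowskiEngine needed). The name dense_to_coords_feats and the toy blob are hypothetical. The final check confirms that no voxel value is lost, so the sparse features still add up to y:

```python
import torch


def dense_to_coords_feats(blob: torch.Tensor):
    """Split a dense grid into integer coordinates and (N, 1) float features."""
    mask = blob != 0
    coordinates = torch.nonzero(mask).int()      # (N, D), row-major order
    features = blob[mask].float().unsqueeze(-1)  # (N, 1), same order as coordinates
    return coordinates, features


# Toy 4x4x4 grid with two occupied voxels (illustration only).
blob = torch.zeros(4, 4, 4)
blob[0, 1, 2] = 2.0
blob[3, 0, 1] = 0.5

coordinates, features = dense_to_coords_feats(blob)

# The sparse features must sum to the same target as the dense grid.
assert torch.isclose(features.sum(), blob.sum())
```

The resulting coordinates and features could then be passed to ME.utils.sparse_collate exactly as in the function above.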

Since the problem is trivial, it is quite worrying that the model is not being optimized at all. What's more, after a while the loss becomes NaN. Did I misunderstand the MinkowskiEngine docs, or did I make a mistake somewhere in the code above? Please note that the PyTorch implementation works well enough.
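
To pin down the step at which the loss first turns into NaN, a guard along these lines could be called inside the training loop. assert_finite_loss is a hypothetical helper, not part of PyTorch or MinkowskiEngine. It only assumes that the loss has been converted to a Python float, for example with loss.item(). The loss values below are stand-ins, not real measurements:

```python
import math


def assert_finite_loss(loss_value: float, step: int) -> None:
    """Raise as soon as the loss stops being a finite number."""
    if not math.isfinite(loss_value):
        raise FloatingPointError(f"non-finite loss {loss_value!r} at step {step}")


# Stand-in loss values; in the training loop this would be loss.item().
history = [0.9, 0.7, float("nan")]

first_bad_step = None
for step, value in enumerate(history):
    try:
        assert_finite_loss(value, step)
    except FloatingPointError:
        first_bad_step = step
        break

print(first_bad_step)  # 2
```

In the real loop the call would go right after the loss is computed and before loss.backward(). PyTorch's torch.autograd.set_detect_anomaly(True) can additionally point at the backward operation that produced the NaN.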

NVIDIA / MinkowskiEngine

Minkowski NN optimization doesn't work #416