Coordinates ordering on CPU vs GPU

Describe the bug

Given the exact same code, I observed that the order of the coordinates returned by some_tensor.C can change depending on the device (CPU vs GPU). Is this expected?
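
The checks in the script below compare coordinates row by row. That means they test two things at once: which coordinates are present, and the order they appear in. The sketch below separates the two notions. It is plain Python with no MinkowskiEngine involved. Each row of some_tensor.C is modeled as a tuple, and the helper names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

Coord = Tuple[int, ...]  # (batch_index, x, y, z), standing in for one row of some_tensor.C


def same_order(a: List[Coord], b: List[Coord]) -> bool:
    """Row-by-row equality, i.e. what `(a == b).all()` checks for equal-shape tensors."""
    return len(a) == len(b) and all(ra == rb for ra, rb in zip(a, b))


def same_set(a: List[Coord], b: List[Coord]) -> bool:
    """Multiset equality: the same coordinates, in any row order."""
    return Counter(a) == Counter(b)


x0_C = [(0, 1, 2, 3), (0, 4, 5, 6), (0, 7, 8, 9)]
sum_C = [(0, 4, 5, 6), (0, 1, 2, 3), (0, 7, 8, 9)]  # same points, rows permuted

print(same_order(x0_C, sum_C))  # False
print(same_set(x0_C, sum_C))    # True
```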

To Reproduce

This script is the smallest example I could come up with:

import numpy as np
import MinkowskiEngine as ME
import torch

def reproduce(N, device):
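    # Create x: batch index 0 plus three random spatial coordinates in [0, 100).
    # ME uses integer coordinates, so convert explicitly with .int()
    feats = torch.rand(N, 1).to(device)
    coords = torch.cat([torch.zeros((N, 1)), torch.rand(N, 3) * 100], dim=1).int().to(device)
    x = ME.SparseTensor(features=feats, coordinates=coords)

    # Create mask (sized from x.F in case duplicate coordinates were merged)
    mask = (torch.rand(x.F.shape[0], 6) > 0.5).float().to(device)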
    mask = ME.SparseTensor(
        coordinates=x.C,
        features=mask,
        coordinate_manager=x.coordinate_manager,
        tensor_stride=x.tensor_stride,
    )

    # Create x0
    x0 = ME.SparseTensor(
        coordinates=x.C,
        features=torch.zeros(x.F.shape[0], mask.F.shape[1]).to(device),
        coordinate_manager=x.coordinate_manager,
        tensor_stride=x.tensor_stride
    )

    # print(x.C, mask.C, x0.C ) # These are all identical
    print('Do x, mask and x0 have all the same coordinates ordering? ', (x0.C == x.C).all() and (x0.C == mask.C).all())
    # There is no a priori reason for it, but on CPU this set of coordinates comes back
    # in a different order, whereas on GPU it is identical to the previous one
    # print((mask + x0).C)
    print('Do mask + x0 and x0 have the same coordinates ordering? ', ((mask + x0).C == x0.C).all())

if __name__ == '__main__':
    print('Testing on CPU')
    reproduce(10, 'cpu')
    print('Testing on GPU')
    reproduce(10, 'cuda:0')

The output I get is:

Testing on CPU
Do x, mask and x0 have all the same coordinates ordering?  tensor(True)
Do mask + x0 and x0 have the same coordinates ordering?  tensor(False)
Testing on GPU
Do x, mask and x0 have all the same coordinates ordering?  tensor(True, device='cuda:0')
Do mask + x0 and x0 have the same coordinates ordering?  tensor(True, device='cuda:0')
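
One way to make downstream code independent of the device is to re-align rows by coordinate value instead of by row position. The sketch below assumes that coordinate rows have already been converted to tuples. `alignment_index` is a hypothetical helper, not a MinkowskiEngine function. The toy values imitate the CPU case above and are not captured output.

```python
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[int, ...]


def alignment_index(src: Sequence[Coord], dst: Sequence[Coord]) -> List[int]:
    """Return idx such that [src[i] for i in idx] == list(dst).

    Assumes coordinates are unique within src. Raises KeyError if dst
    contains a coordinate that src does not.
    """
    row_of: Dict[Coord, int] = {c: i for i, c in enumerate(src)}
    return [row_of[c] for c in dst]


# Toy stand-ins for (mask + x0).C, (mask + x0).F and x0.C
sum_C = [(0, 4, 5, 6), (0, 1, 2, 3), (0, 7, 8, 9)]
sum_F = [[1.0], [0.0], [1.0]]  # one feature row per coordinate row
x0_C = [(0, 1, 2, 3), (0, 4, 5, 6), (0, 7, 8, 9)]

idx = alignment_index(sum_C, x0_C)
aligned_F = [sum_F[i] for i in idx]  # features now follow x0_C's row order
print(idx)        # [1, 0, 2]
print(aligned_F)  # [[0.0], [1.0], [1.0]]
```

With real tensors, `src` and `dst` could be built as `[tuple(r) for r in t.C.tolist()]`. The features could then be reordered with `t.F[torch.tensor(idx, device=t.F.device)]`. Because the lookup runs as a Python loop, this approach suits sanity checks and small tensors rather than a hot path.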

Expected behavior

If the coordinate ordering is going to change after a certain operation, I would expect the change to be consistent between CPU and GPU.
On GPU I have never seen the coordinate ordering change, which is what I initially expected. However, this comment suggests that this behavior is not guaranteed?
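
If the order is indeed not guaranteed, one device-independent convention is to sort rows by coordinate value after any operation whose output order matters. The sketch below is plain Python. The helper name is hypothetical, and the "CPU" and "GPU" orders are invented for illustration.

```python
from typing import List, Sequence, Tuple

Coord = Tuple[int, ...]


def canonical_permutation(coords: Sequence[Coord]) -> List[int]:
    """Indices that sort coordinate rows lexicographically by (batch, x, y, z).

    The result depends only on the coordinate values, not on the order a
    backend happened to emit them in. Applying it to both coordinates and
    features therefore gives the same layout on any device.
    """
    return sorted(range(len(coords)), key=lambda i: coords[i])


cpu_C = [(0, 4, 5, 6), (0, 1, 2, 3), (0, 7, 8, 9)]
gpu_C = [(0, 1, 2, 3), (0, 7, 8, 9), (0, 4, 5, 6)]

cpu_sorted = [cpu_C[i] for i in canonical_permutation(cpu_C)]
gpu_sorted = [gpu_C[i] for i in canonical_permutation(gpu_C)]
print(cpu_sorted == gpu_sorted)  # True
```

In PyTorch, the same permutation could be built with stable sorts over the columns, starting from the last column and ending with the first. The Python version is kept here for readability.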

Desktop

==========System==========
Linux-3.10.0-1160.42.2.el7.x86_64-x86_64-with-glibc2.29
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.2 LTS"
3.8.10 (default, Jun 2 2021, 10:49:15)
[GCC 9.4.0]
==========Pytorch==========
1.9.0+cu111
torch.cuda.is_available(): True
==========NVIDIA-SMI==========
/usr/bin/nvidia-smi
Driver Version 470.82.01
CUDA Version 11.4
VBIOS Version 90.02.30.40.85
Image Version G001.0000.02.04
GSP Firmware Version N/A
==========NVCC==========
/usr/local/cuda/bin/nvcc
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:09:46_PDT_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.TC455_06.29190527_0
==========CC==========
/usr/bin/c++
c++ (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

==========MinkowskiEngine==========
0.5.4
MinkowskiEngine compiled with CUDA Support: True
NVCC version MinkowskiEngine is compiled: 11010
CUDART version MinkowskiEngine is compiled: 11010

Additional context

We rely heavily on the coordinate ordering of some_tensor.C for various operations, such as masking. This "bug" (feature?) currently prevents our code from working on CPU. This was not an issue in the past, but I have not pinpointed whether a specific version of ME introduced this behavior.

Referencing our code's original issue here.

Thank you!!

NVIDIA / MinkowskiEngine

Coordinates ordering on CPU vs GPU #441