Describe the bug
Given the exact same code, I observed that the coordinate ordering of some_tensor.C can change depending on the device (CPU vs GPU). Is this expected?
To Reproduce
This script is the smallest example I could come up with:
import numpy as np
import MinkowskiEngine as ME
import torch


def reproduce(N, device):
    # Create x
    feats = torch.rand(N, 1).to(device)
    coords = torch.cat([torch.zeros((N, 1)), torch.rand(N, 3) * 100], dim=1).to(device)
    x = ME.SparseTensor(features=feats, coordinates=coords)

    # Create mask, sharing the coordinates and coordinate manager of x
    mask = (torch.rand(N, 6) > 0.5).float().to(device)
    mask = ME.SparseTensor(
        coordinates=x.C,
        features=mask,
        coordinate_manager=x.coordinate_manager,
        tensor_stride=x.tensor_stride,
    )

    # Create x0, also sharing the coordinates and coordinate manager of x
    x0 = ME.SparseTensor(
        coordinates=x.C,
        features=torch.zeros(x.F.shape[0], mask.F.shape[1]).to(device),
        coordinate_manager=x.coordinate_manager,
        tensor_stride=x.tensor_stride,
    )

    # print(x.C, mask.C, x0.C)  # These are all identical
    print('Do x, mask and x0 have all the same coordinates ordering? ', (x0.C == x.C).all() and (x0.C == mask.C).all())

    # For no apparent reason, the coordinates of (mask + x0) are ordered differently
    # from those of x0 on CPU, but identically on GPU
    # print((mask + x0).C)
    print('Do mask + x0 and x0 have the same coordinates ordering? ', ((mask + x0).C == x0.C).all())


if __name__ == '__main__':
    print('Testing on CPU')
    reproduce(10, 'cpu')
    print('Testing on GPU')
    reproduce(10, 'cuda:0')
The output I get is:
Testing on CPU
Do x, mask and x0 have all the same coordinates ordering? tensor(True)
Do mask + x0 and x0 have the same coordinates ordering? tensor(False)
Testing on GPU
Do x, mask and x0 have all the same coordinates ordering? tensor(True, device='cuda:0')
Do mask + x0 and x0 have the same coordinates ordering? tensor(True, device='cuda:0')
Expected behavior
If the coordinate ordering can change after a certain operation, I would expect the change to be consistent between CPU and GPU.
On GPU I have never seen the coordinate ordering change, which is what I initially expected. However, this comment suggests that this behavior is not guaranteed?
Desktop
==========System==========
Linux-3.10.0-1160.42.2.el7.x86_64-x86_64-with-glibc2.29
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.2 LTS"
3.8.10 (default, Jun 2 2021, 10:49:15)
[GCC 9.4.0]
==========Pytorch==========
1.9.0+cu111
torch.cuda.is_available(): True
==========NVIDIA-SMI==========
/usr/bin/nvidia-smi
Driver Version 470.82.01
CUDA Version 11.4
VBIOS Version 90.02.30.40.85
Image Version G001.0000.02.04
GSP Firmware Version N/A
==========NVCC==========
/usr/local/cuda/bin/nvcc
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:09:46_PDT_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.TC455_06.29190527_0
==========CC==========
/usr/bin/c++
c++ (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
==========MinkowskiEngine==========
0.5.4
MinkowskiEngine compiled with CUDA Support: True
NVCC version MinkowskiEngine is compiled: 11010
CUDART version MinkowskiEngine is compiled: 11010
Additional context
We heavily rely on the coordinate ordering of some_tensor.C for various operations such as masking. This "bug" (feature?) currently prevents our code from working on CPU. This was not an issue in the past, but I have not pinpointed whether a specific version of ME introduced this behavior.
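For illustration, here is a rough sketch of what matching rows by coordinate value (instead of by position) could look like on our side. It is not a MinkowskiEngine API: row_permutation is a hypothetical helper written with plain NumPy, and it assumes the coordinates are unique integer rows that have already been copied to host memory (for example with some_tensor.C.cpu().numpy()).

```python
import numpy as np


def row_permutation(ref_coords, other_coords):
    """Return perm such that other_coords[perm] equals ref_coords row by row.

    Hypothetical helper (not part of MinkowskiEngine). Assumes both arrays
    hold the same set of unique coordinate rows, possibly in different order.
    """
    ref = np.asarray(ref_coords)
    other = np.asarray(other_coords)
    if ref.shape != other.shape:
        raise ValueError("coordinate arrays must have the same shape")

    # np.lexsort treats the LAST key as the primary one, so the columns are
    # passed in reverse to sort by column 0 first, then column 1, and so on.
    ref_order = np.lexsort(ref.T[::-1])
    other_order = np.lexsort(other.T[::-1])

    # After sorting, both arrays must be identical if they hold the same rows.
    if not np.array_equal(ref[ref_order], other[other_order]):
        raise ValueError("the two tensors do not contain the same coordinates")

    # Row ref_order[i] of ref matches row other_order[i] of other.
    perm = np.empty(len(ref), dtype=np.int64)
    perm[ref_order] = other_order
    return perm


if __name__ == "__main__":
    ref = np.array([[0, 5, 1, 2], [0, 3, 3, 3], [0, 9, 0, 4]])
    other = ref[[2, 0, 1]]                       # same rows, shuffled
    feats_other = np.array([[30.0], [10.0], [20.0]])  # features stored in other's order

    perm = row_permutation(ref, other)
    assert np.array_equal(other[perm], ref)
    print(feats_other[perm])                     # features reordered to follow ref
```

With something like this, a mask could be reindexed to follow the ordering of the tensor it is applied to, at the cost of a sort on every alignment. It would only be a workaround; it does not answer whether the ordering difference between CPU and GPU is intended.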
Referencing our code's original issue here.
Thank you!!