NVIDIA / MinkowskiEngine

Minkowski Engine is an auto-diff neural network library for high-dimensional sparse tensors
https://nvidia.github.io/MinkowskiEngine

TensorField.sparse is not deterministic when data is on gpu #537

Open sysuyl opened 1 year ago

sysuyl commented 1 year ago

Describe the bug

When I use ME.TensorField to build the input for segmentation, TensorField.sparse() returns different results for the same input when the data is on the GPU. On the CPU the results appear to be consistent.


To Reproduce

import torch
import numpy as np
import MinkowskiEngine as ME

def set_seed(seed):
    """Seed the Python, NumPy, and PyTorch RNGs and request deterministic cuDNN kernels."""
    import random
    import os

    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    np.random.seed(seed)
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)

def compute_json_md5(json_obj):
    import json
    import hashlib
    json_str = json.dumps(json_obj)
    md5 = hashlib.md5(json_str.encode()).hexdigest()
    return md5

def create_tensor_filed_then_sparse(feat, coord, device):
    """
    Build an ME.TensorField from feat and coord on the given device, call sparse(),
    and print the md5 of both the input data and the sparse output.
    """
    # device = torch.device("cuda")
    # device = torch.device("cpu")

    a = torch.from_numpy(feat).to(device)
    b = torch.from_numpy(coord).to(device)

    input_data = {
        "f": a.cpu().numpy().tolist(),
        "c": b.cpu().numpy().tolist()
    }
    print("input data md5: ", compute_json_md5(input_data))

    in_field = ME.TensorField(
        features=a,
        coordinates=b,
        quantization_mode=ME.SparseTensorQuantizationMode.UNWEIGHTED_AVERAGE,
        minkowski_algorithm=ME.MinkowskiAlgorithm.SPEED_OPTIMIZED,
        device=device,
    )

    sinput = in_field.sparse()

    sinput_data = {
        "f": sinput.features.detach().cpu().numpy().tolist(),
        "c": sinput.coordinates.detach().cpu().numpy().tolist()
    }
    print("sinput md5: ", compute_json_md5(sinput_data))

def compare():
    feat = np.load("f.npy")
    coord = np.load("c.npy")

    set_seed(123)

    print("## device(cpu) ..")
    device = torch.device("cpu")
    print("run 1st ..")
    create_tensor_filed_then_sparse(feat, coord, device)
    print("run 2nd ..")
    create_tensor_filed_then_sparse(feat, coord, device)

    print("\n## device(cuda) ..")
    device = torch.device("cuda")
    print("run 1st ..")
    create_tensor_filed_then_sparse(feat, coord, device)
    print("run 2nd ..")
    create_tensor_filed_then_sparse(feat, coord, device)

if __name__ == "__main__":
    compare()

Expected behavior

Actual output:

## device(cpu) ..
run 1st ..
input data md5:  1ff467de68cd7f6c81279fcc338a3cd3
sinput md5:  6f7cb28ee0df4ceda98fd87e25fd828d
run 2nd ..
input data md5:  1ff467de68cd7f6c81279fcc338a3cd3
sinput md5:  6f7cb28ee0df4ceda98fd87e25fd828d

## device(cuda) ..
run 1st ..
input data md5:  1ff467de68cd7f6c81279fcc338a3cd3
sinput md5:  a1637e0a876d3f6df738205f91205193
run 2nd ..
input data md5:  1ff467de68cd7f6c81279fcc338a3cd3
sinput md5:  4d6afe5a49335d174adaaacb5e95129b

Expected output: sparse() should return identical results across runs on CUDA when given the same input.


Desktop (please complete the following information):

wget -q https://raw.githubusercontent.com/NVIDIA/MinkowskiEngine/master/MinkowskiEngine/diagnostics.py ; python diagnostics.py
==========System==========
Linux-5.15.0-56-generic-x86_64-with-debian-bullseye-sid
DISTRIB_ID=Kylin
DISTRIB_RELEASE=V10
DISTRIB_CODENAME=kylin
DISTRIB_DESCRIPTION="Kylin V10 SP1"
DISTRIB_KYLIN_RELEASE=V10
DISTRIB_VERSION_TYPE=enterprise
DISTRIB_VERSION_MODE=normal
3.6.13 |Anaconda, Inc.| (default, Jun  4 2021, 14:25:59) 
[GCC 7.5.0]
==========Pytorch==========
1.8.2
torch.cuda.is_available(): True
==========NVIDIA-SMI==========
/usr/bin/nvidia-smi
Driver Version 510.108.03
CUDA Version 11.6
VBIOS Version 94.02.85.00.70
Image Version G001.0000.03.03
GSP Firmware Version N/A
==========NVCC==========
/usr/local/cuda-11.3/bin/nvcc
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_May__3_19:15:13_PDT_2021
Cuda compilation tools, release 11.3, V11.3.109
Build cuda_11.3.r11.3/compiler.29920130_0
==========CC==========
/usr/bin/c++
c++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

==========MinkowskiEngine==========
0.5.4
MinkowskiEngine compiled with CUDA Support: True
NVCC version MinkowskiEngine is compiled: 11030
CUDART version MinkowskiEngine is compiled: 11030

Additional context

The script may need to be run several times before the md5 difference shows up on the GPU.
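To help narrow this down, the md5 comparison can be made insensitive to row order and to tiny floating-point differences. The sketch below is not a confirmed diagnosis of this issue; it only assumes that the GPU path might return the same voxels in a different order, or average features in a different summation order, and checks for that. The helper names (`canonical_order`, `order_independent_md5`, `same_sparse_output`) and the tolerance values are hypothetical and use only NumPy and hashlib.

```python
import hashlib

import numpy as np


def canonical_order(coords, feats):
    """Sort rows lexicographically by coordinate so row order no longer matters."""
    coords = np.asarray(coords)
    feats = np.asarray(feats)
    # np.lexsort uses the LAST key as the primary key, so pass the columns reversed
    # to make column 0 (the batch index in the script above) the primary key.
    order = np.lexsort(coords.T[::-1])
    return coords[order], feats[order]


def order_independent_md5(coords, feats, decimals=4):
    """Hash (coords, feats) after sorting rows and rounding features.

    `decimals` is an arbitrary illustrative choice, not a recommended value.
    """
    c, f = canonical_order(coords, feats)
    # Adding 0.0 turns any -0.0 produced by rounding into +0.0.
    f = np.round(f.astype(np.float64), decimals) + 0.0
    h = hashlib.md5()
    h.update(np.ascontiguousarray(c, dtype=np.int64).tobytes())
    h.update(np.ascontiguousarray(f).tobytes())
    return h.hexdigest()


def same_sparse_output(coords_a, feats_a, coords_b, feats_b, atol=1e-5):
    """Compare two outputs up to row order and a small absolute tolerance."""
    ca, fa = canonical_order(coords_a, feats_a)
    cb, fb = canonical_order(coords_b, feats_b)
    return (
        ca.shape == cb.shape
        and fa.shape == fb.shape
        and np.array_equal(ca, cb)
        and np.allclose(fa, fb, atol=atol)
    )
```

In the reproduction script, the arrays would come from `sinput.coordinates.detach().cpu().numpy()` and `sinput.features.detach().cpu().numpy()`. If two GPU runs pass `same_sparse_output` while the raw md5 values differ, the difference is limited to ordering or rounding; if they fail, the outputs differ in content. The tolerance-based comparison is the more robust of the two checks, since a rounded hash can still flip for values that sit near a rounding boundary.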

sysuyl commented 1 year ago

@chrischoy Could you take a look at this problem? Thanks.