HorizonRobotics / Sparse4D


Determinism / Reproducibility #100


julien-seitz commented 1 month ago

Hello, I just noticed that some part of the code does not seem to be deterministic. In my experience, PyTorch is quite deterministic as long as the random seed is set and the input is identical.
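For reference, this is roughly the setup I mean (a sketch; which flags are actually honored depends on the PyTorch/CUDA version):

import os
import random

import numpy as np
import torch

def seed_everything(seed=42):
    # Seed every RNG that can influence model execution.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

    # Ask PyTorch to prefer deterministic kernels where available.
    torch.use_deterministic_algorithms(True)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

    # Needed for deterministic cuBLAS on CUDA >= 10.2.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"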

To check whether two tensors have exactly the same hash value, I used the following function:

import hashlib

import numpy as np


# https://stackoverflow.com/a/77212976
def hash_tensor(tensor):
    # Move the tensor to host memory and reinterpret its raw bytes.
    int_view = tensor.detach().contiguous().cpu().numpy().view(np.uint8)

    # Make the buffer C-contiguous so hashlib can consume it directly.
    # https://stackoverflow.com/a/26782930
    int_view_cont = int_view.copy(order='C')

    return hashlib.sha1(int_view_cont).hexdigest()
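For example, comparing the same intermediate tensor across two runs (variable names here are hypothetical placeholders):

# Hypothetical usage: `feat_run1` / `feat_run2` are the same intermediate
# tensor captured in two separate runs with identical seeds and inputs.
assert hash_tensor(feat_run1) == hash_tensor(feat_run2)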

The hash values seem to be identical for all tensors until the first call of the deformable aggregation function https://github.com/HorizonRobotics/Sparse4D/blob/main/projects/mmdet3d_plugin/ops/deformable_aggregation.py
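To double-check, one could isolate the op and run it repeatedly on identical inputs; a sketch (the `op` callable and its `inputs` stand in for the actual DeformableAggregationFunction call, whose exact signature I am not reproducing here):

def is_run_deterministic(op, inputs, n_runs=2):
    # Run the same op repeatedly on identical inputs and compare the
    # raw output bytes via hash_tensor() from above.
    hashes = {hash_tensor(op(*inputs)) for _ in range(n_runs)}
    return len(hashes) == 1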

Unfortunately I do not have experience in CUDA programming, and so far I have not found the root cause of the problem. Based on some questions on Stack Overflow, it might be related to the use of atomicAdd. https://github.com/HorizonRobotics/Sparse4D/blob/main/projects/mmdet3d_plugin/ops/src/deformable_aggregation.cpp
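My understanding (an assumption on my part): atomicAdd accumulates thread contributions in whatever order the hardware happens to schedule them, and floating-point addition is not associative, so the summation order can change the result between runs. A minimal Python illustration of the non-associativity itself:

# Floating-point addition is not associative, so summing the same
# values in a different order can change the result in the last bits.
a, b, c = 0.1, 0.2, 0.3
print((a + b) + c)  # 0.6000000000000001
print(a + (b + c))  # 0.6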

Do you have an idea what might cause the non-determinism in the deformable aggregation module?