RaduAlexandru / permutohedral_encoding

Blazingly fast encoding for neural networks based on permutohedral lattices
MIT License

Non determinism when updating the hash encoding #4

Open anonymous-pusher opened 12 months ago

anonymous-pusher commented 12 months ago

Hello and thank you for the great work. I tried this encoding with an implementation of Instant-NGP as well as in PermutoSDF, and I noticed that it is not possible to reproduce the same training behavior even when setting all the seeds. I use the following:

    import os
    import random

    import numpy as np
    import torch

    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":16:8"
    os.environ["PYTHONHASHSEED"] = str(seed_number)

    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    random.seed(seed_number)
    torch.cuda.manual_seed_all(seed_number)
    torch.use_deterministic_algorithms(True)
    torch.manual_seed(seed_number)
    torch.cuda.manual_seed(seed_number)
    np.random.seed(seed_number)

The update of the hash tables after a backward pass gives slightly different values for the embeddings. Freezing only the hash encoding leads to 100% reproducibility, since the rest of the components are in PyTorch. Although the differences are small, they accumulate over the course of training and lead to different results. It also makes it hard to assess whether different regularizations contribute to the final results, or whether it is just the randomness in the update that leads to the different numbers.
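One way to confirm that the non-determinism is confined to the hash-table parameters is to diff the parameter state of two identically seeded runs after a single optimizer step. A minimal sketch (the helper name and the dict-of-arrays representation are illustrative, not part of the library; in practice the dicts could come from each run's `model.state_dict()`):

```python
import numpy as np

def diverging_params(state_a, state_b):
    """Return the names of parameters whose values differ bitwise
    between two runs. state_a/state_b map parameter name -> array."""
    return [name for name in state_a
            if not np.array_equal(state_a[name], state_b[name])]

# Illustrative data: the MLP weights match across runs,
# the encoding table picked up a tiny difference.
run_a = {"mlp.weight": np.ones(3), "encoding.table": np.array([1.0, 2.0])}
run_b = {"mlp.weight": np.ones(3), "encoding.table": np.array([1.0, 2.0000001])}
print(diverging_params(run_a, run_b))  # ['encoding.table']
```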

Is there a way to make the update deterministic, in the same way it can be done in PyTorch? I also found a similar issue when using the Instant-NGP voxel hash encoding, so it probably has something to do with the CUDA implementation.

Thank you

RaduAlexandru commented 11 months ago

Hi there @mJones00

Indeed, the non-determinism during training is difficult to avoid. The issue is that the backward pass uses atomicAdd() inside the CUDA kernels. Depending on how the GPU threads are scheduled, this can lead to slightly different results, and currently there is no way to avoid it.
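The order dependence comes from floating-point addition not being associative: summing the same contributions in a different order can change the low-order bits, which is exactly what happens when atomicAdd() commits gradient contributions in whatever order the scheduler happens to pick. A minimal illustration in plain Python:

```python
# Floating-point addition is not associative, so the value of a sum
# depends on the order in which the terms are combined.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c   # 0.6000000000000001
right = a + (b + c)  # 0.6
print(left == right)  # False
```

Scaled up to thousands of gradient contributions per hash-table entry, these last-bit discrepancies are what accumulate into visibly different training runs.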

Determinism could possibly be achieved by using some sort of gather operation in the backward pass instead of a scatter, but that would require heavy rewriting of the CUDA kernels.
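The gather idea can be sketched outside CUDA: instead of every sample scattering its contribution into the table with an atomic add, sort the contributions by destination slot and let each slot sum its own slice in a fixed order, which makes the result bitwise reproducible. A NumPy sketch (the function name and data layout are hypothetical, not the library's actual kernels):

```python
import numpy as np

def gather_backward(vertex_idx, grads, num_vertices):
    """Deterministic alternative to scatter/atomicAdd: each output
    slot gathers its contributions and sums them in a fixed order."""
    order = np.argsort(vertex_idx, kind="stable")  # fixed ordering
    sorted_idx = vertex_idx[order]
    sorted_grads = grads[order]
    out = np.zeros(num_vertices, dtype=grads.dtype)
    start = 0
    for v in range(num_vertices):
        # end of the slice of contributions targeting vertex v
        end = int(np.searchsorted(sorted_idx, v, side="right"))
        out[v] = sorted_grads[start:end].sum()
        start = end
    return out

idx = np.array([0, 2, 0, 1])   # destination lattice vertex per sample
g = np.array([1.0, 2.0, 3.0, 4.0])
print(gather_backward(idx, g, 3))  # [4. 4. 2.]
```

In the kernels this would correspond to one thread per lattice vertex doing a sorted segment reduction rather than per-sample atomics, which is why it would require substantial restructuring.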

Sorry I can't provide a better solution at the moment.