NVIDIA / framework-reproducibility

Providing reproducibility in deep learning frameworks
Apache License 2.0

patch tf.image.resize with bilinear by casting image to tf.float64 #32

Closed llan-ml closed 3 years ago

llan-ml commented 3 years ago

@duncanriach

For TensorFlow < 2.4, we can patch tf.image.resize by casting the image to tf.float64, in the same way as the patch for the segment ops. See some tests here.

It might be better to patch the ops located in tensorflow.python.ops.gen_image_ops instead, but that needs further testing.

In addition, it seems that tf.image.resize with NEAREST_NEIGHBOR on GPU does not introduce non-determinism during backprop.

duncanriach commented 3 years ago

Hi @llan-ml, sorry for the delay in getting back to you.

Unfortunately, the super-sampling approach to addressing nondeterminism is not robust and we cannot rely on it. See this discussion for more info. For this reason, the dynamic patch for the segment reduction ops has not been released and we cannot use it in other patches. We are in the process of adding robust GPU-deterministic segment reduction ops to stock TensorFlow using a different approach.
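The linked discussion is not reproduced here. As general background, floating-point addition is not associative in float64 either, so accumulating in higher precision and rounding back to float32 makes order-dependent results rarer without ruling them out. The following standard-library sketch illustrates this with hand-picked values; the helper names and the inputs are hypothetical and are not taken from the patch or its tests.

```python
import struct


def to_float32(x):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(terms, use_float64):
    """Sum `terms` in the given order and return the result as float32.

    With use_float64=False every partial sum is rounded to float32,
    mimicking a float32 accumulator. With use_float64=True the partial
    sums stay in float64 and only the final result is rounded.
    """
    total = 0.0
    for t in terms:
        total = total + t
        if not use_float64:
            total = to_float32(total)
    return to_float32(total)


def order_sensitive(terms, use_float64):
    """True if summing forwards and backwards gives different results."""
    forward = accumulate(terms, use_float64)
    backward = accumulate(list(reversed(terms)), use_float64)
    return forward != backward


# Every term is exactly representable in float32.
easy = [1.0, 2.0**-24, 2.0**-24]
hard = [1.0, 2.0**-24, 2.0**-53, 2.0**-53]

print(order_sensitive(easy, use_float64=False))  # True: float32 depends on order
print(order_sensitive(easy, use_float64=True))   # False: float64 hides it here
print(order_sensitive(hard, use_float64=True))   # True: float64 still depends on order
```

The `hard` case works because the float64 sum lands exactly on a float32 rounding boundary in one order and just above it in the other.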

I am about to start working on adding GPU-deterministic operation to tf.image.resize with NEAREST_NEIGHBOR and will first attempt to confirm your apparent discovery that it is already deterministic.

Closing this issue for now.

duncanriach commented 3 years ago

Update: I temporarily modified the determinism test for tf.image.resize with method='bilinear' to use method='nearest' instead, and confirmed that method='nearest' does, indeed, introduce nondeterminism in the backprop. I don't know why the test in your notebook does not exercise this nondeterminism, @llan-ml.