NVIDIA / framework-reproducibility

Providing reproducibility in deep learning frameworks
Apache License 2.0

More segment ops need to be patched #31

Closed llan-ml closed 3 years ago

llan-ml commented 3 years ago

Following #25, I tested more segment ops as in this colab notebook.

In conclusion, among the unsorted segment ops, tf.math.unsorted_segment_mean, tf.math.unsorted_segment_prod, tf.math.unsorted_segment_sqrt_n, and tf.math.unsorted_segment_sum need to be patched; among the sorted segment ops, only tf.math.segment_sum does.
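
For context on why these segment ops are nondeterministic at all: on GPU they are typically implemented with atomic accumulation, so elements combine in a run-to-run varying order, and floating-point addition is not associative. A minimal NumPy sketch of that order sensitivity (constants chosen purely for illustration; no TensorFlow or GPU needed):

```python
import numpy as np

# Floating-point addition is not associative, so the order in which a
# reduction combines elements changes the result. The same effect, at
# atomic-operation granularity, is what makes GPU segment reductions
# nondeterministic across runs.
a, b, c = np.float32(1e8), np.float32(-1e8), np.float32(1.0)

order1 = (a + b) + c  # (1e8 + -1e8) + 1 -> 1.0
order2 = a + (b + c)  # -1e8 + 1 rounds back to -1e8, so 1e8 + -1e8 -> 0.0
```

Here `order1` and `order2` differ even though real-number arithmetic would make them equal, which is exactly the kind of divergence an unordered accumulation can produce.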

duncanriach commented 3 years ago

Hi @llan-ml, this is really nice work. Thank you. (I have read the code but not yet run it.)

The next release (0.4.0) will patch tf.math.unsorted_segment_sum and tf.math.segment_sum.

Based on what you've provided, a future release will additionally patch:

  * tf.math.unsorted_segment_mean
  * tf.math.unsorted_segment_prod
  * tf.math.unsorted_segment_sqrt_n

I will also update the documentation to reflect these findings next week.

Please will you let me know your full name so that I can list it in the credits for this repository?

llan-ml commented 3 years ago

@duncanriach Thanks for your efforts in this work. My name is Lin Lan.

duncanriach commented 3 years ago

Update: I have added your name to the credits and documented the findings with this commit. I have also created a task to repro your findings and confirm that it's not possible to squeeze some nondeterminism out of those other ops with different parameter configurations. Once that's done, we may release a patch if we have not already fixed these ops at the CUDA level.

duncanriach commented 3 years ago

Hi @llan-ml,

Some updates:

  1. @wenscarl and I have reproduced your findings.
  2. @wenscarl discovered/realized that if any of the data values in the product are zero, then the result will be zero, which will mask nondeterminism. Therefore, when reproducing nondeterminism in product ops, it's important that the data is never zero.
  3. Stock TensorFlow version 2.5 will include d9m-unimplemented exceptions (which can be disabled to support the patch) on all these nondeterministic ops (see PR 47772).
  4. Based on my current understanding of the tf.math.segment_prod implementation, it should, under some circumstances, operate nondeterministically in the forward direction. Unfortunately, so far, I have been unable to reproduce this nondeterminism.
  5. In the course of attempting to repro that nondeterminism, I discovered/realized that it's necessary for the data to be randomized very close to 1.0 (e.g. np.random.random_sample(shape) * 0.00001 + 1.0); otherwise a large reduction will end up clamped at either inf or 0.0, both cases masking any nondeterminism that was introduced.
  6. When comparing sums of elements, note that different results can produce the same sum. This potential aliasing issue can be avoided completely by comparing all the elements of the tensor between runs (@wenscarl suggests tf.reduce_all(tf.math.equal(result_a, result_b))); that produces a match/mismatch result per run rather than a signature or digest. Alternatively, though not perfect and still subject to aliasing, a hash of a tensor (or a list of tensors) can be used to test for reproducibility. In the past, I have recommended storing the sum over all trainable variables to a file in order to test that a whole model runs reproducibly. I now prefer to write out a hash of all trainable variables.
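
Items 2 and 5 above can be illustrated numerically without TensorFlow. This NumPy sketch (lengths and constants chosen only for illustration) shows why zeros, and values far from 1.0, mask nondeterminism in product ops:

```python
import numpy as np

# Pitfall 1 (item 2): any zero in the data forces a product to zero,
# regardless of accumulation order, hiding rounding differences.
with_zero = np.float32([0.0, 1.5, 2.5])
prod_with_zero = np.prod(with_zero)  # 0.0 in every run

# Pitfall 2 (item 5): values far from 1.0 drive a long product out of
# range. 2.0**2000 overflows float64 to inf; 0.5**2000 underflows to 0.0.
prod_big = np.prod(np.full(2000, 2.0))
prod_small = np.prod(np.full(2000, 0.5))

# Randomizing the data very close to 1.0, as suggested above, keeps a
# long product finite and nonzero, so order-dependent rounding
# differences remain observable.
safe = np.random.random_sample((2000,)) * 0.00001 + 1.0
prod_safe = np.prod(safe)  # finite and close to 1.0
```
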
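
The aliasing concern in item 6 can also be sketched without TensorFlow; `np.array_equal` stands in for the tf.reduce_all(tf.math.equal(...)) check, and the tensors here are made up purely to show the aliasing:

```python
import hashlib
import numpy as np

# Two different "results" whose sums alias to the same value.
result_a = np.array([1.0, 2.0, 3.0])
result_b = np.array([0.0, 2.0, 4.0])

sums_alias = result_a.sum() == result_b.sum()      # True: sums both 6.0
elements_match = np.array_equal(result_a, result_b)  # False: arrays differ

# A hash over the raw tensor bytes gives a compact per-run digest that
# can be written to a file and compared across runs.
digest_a = hashlib.sha256(result_a.tobytes()).hexdigest()
digest_b = hashlib.sha256(result_b.tobytes()).hexdigest()
```

Comparing only the sums would wrongly report a match here, while the element-wise check and the byte-level digests both catch the difference.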
duncanriach commented 3 years ago

And closing this issue.