NVIDIA / framework-reproducibility

Providing reproducibility in deep learning frameworks
Apache License 2.0
423 stars · 40 forks

Make tf_determinism.patch() handle tf>=2.1 as well #20

Closed MFreidank closed 4 years ago

MFreidank commented 4 years ago

Hi @duncanriach, I was wondering whether it would make sense for tf_determinism.patch() to handle tensorflow>=2.2 as well.

My viewpoint is this: perhaps the easiest interface to grasp for this package would be one that makes the current version of tensorflow deterministic (as deterministic as possible) on GPUs whenever there is any hope of doing so (tensorflow>=1.14).

Following that logic, for recent tf versions (>=2.1) the following implementation of patch would make sense:

import os

def patch():
    ...
    os.environ["TF_DETERMINISTIC_OPS"] = "1"

Otherwise, it forces users that want to support multiple versions of tf2 to write code of the sort:

import os

import tensorflow
from tensorflow_determinism import patch

try:
    patch()
except TypeError:  # tf >= 2.2
    os.environ["TF_DETERMINISTIC_OPS"] = "1"

vs

import tensorflow
from tensorflow_determinism import patch

# only raises an error for tf <= 1.13, for which GPU determinism is not possible
patch()

which covers tensorflow>=1.14 fine. I understand the need to raise an error in certain scenarios (tensorflow < 1.14), but I feel that supporting more recent versions would be worthwhile and easier on users.

What do you think?

duncanriach commented 4 years ago

This is a great point, and I agree with you. I've been wanting to do this for a while. However, I decided that patch should actually just patch, because that's the historically expected functionality.

I've been planning to add another function called enable_determinism that patches if necessary and sets whatever state is needed for stock TensorFlow, such as setting TF_DETERMINISTIC_OPS=1 (or, in the future, tf.config.experimental.deterministic_ops=1 instead). Also, if given a seed, it can set the various seeds. It could also set HOROVOD_FUSION_THRESHOLD=0.

The reason I want to do it like this is because enabling determinism often doesn't involve patching, or it involves more than just patching.
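To make this concrete, here is a minimal sketch of what such an enable_determinism could look like. The function name and the environment variables are taken from this thread, but the body is illustrative only, not the released implementation; the real function would also handle patching for TF 1.14-2.0 and seed TensorFlow/NumPy directly.

```python
import os
import random

def enable_determinism(seed=None):
    # Hypothetical sketch: for stock TF >= 2.1, a flag suffices; older
    # versions (1.14 - 2.0) would be handled by the existing patching.
    os.environ["TF_DETERMINISTIC_OPS"] = "1"
    # Disable Horovod tensor fusion, another potential source of
    # non-determinism in multi-GPU training.
    os.environ["HOROVOD_FUSION_THRESHOLD"] = "0"
    if seed is not None:
        # The real function would also seed TensorFlow and NumPy, e.g.
        # tf.random.set_seed(seed) and np.random.seed(seed).
        random.seed(seed)

enable_determinism(seed=42)
```

With an interface like this, a user calls one function once at startup, regardless of which TensorFlow version is installed.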

I've been planning to do this for a while, but I just keep getting side-tracked into other, seemingly more urgent, issues. Perhaps this should be prioritized to go into version 0.4 along with the fused softmax/cross-entropy patch that you're working on.

I would prefer that someone using this package can just call tfdeterminism.enable_determinism() and never have to think about it again.

duncanriach commented 4 years ago

... and, while reviewing your PR, I notice that there is a wrinkle in this argument: the current patch function sets TF_CUDNN_DETERMINISTIC=1 to select the deterministic cuDNN functionality. So, before TensorFlow 2.1 (in which TF_DETERMINISTIC_OPS was released), patch did do everything necessary, as far as ops were concerned, to make TensorFlow operate deterministically.

I'm thinking it might make the most sense to deprecate patch as part of the API (for version 0.4). Up through stock TensorFlow version 2.0, it will function as it currently does, but with a printed deprecation warning (recommending a switch to enable_determinism). For any other version of TensorFlow, it will throw an exception (as it currently does) and direct the user to enable_determinism.
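The version-gated deprecation described above could be sketched roughly as follows. Here tf_version is passed in only to keep the sketch self-contained (the real function would read tf.version.VERSION itself), and the messages and version checks are illustrative assumptions, not the package's actual code.

```python
import warnings

def patch(tf_version):
    # Sketch of the proposed deprecation path; in the package, the
    # version would come from tf.version.VERSION, not a parameter.
    major, minor = (int(x) for x in tf_version.split(".")[:2])
    if (1, 14) <= (major, minor) <= (2, 0):
        warnings.warn(
            "patch() is deprecated; please use enable_determinism()",
            DeprecationWarning)
        # ... apply the existing TF 1.14-2.0 patching here ...
    else:
        # TF < 1.14 (GPU determinism not possible) and TF >= 2.1
        # (TF_DETERMINISTIC_OPS available) both raise, pointing the
        # user at enable_determinism instead.
        raise TypeError(
            "patch() does not support TensorFlow %s; "
            "use enable_determinism() instead" % tf_version)
```

This keeps the historical behavior for the versions patch always supported, while nudging everyone toward the single new entry point.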

I'll try to scaffold this for you so that your PR changes can be leveraged in the context of enable_determinism ...

duncanriach commented 4 years ago

Also, take a look at this comment on an earlier issue.

MFreidank commented 4 years ago

Closing this as per your points, enable_determinism is a perfect solution to what I outlined above.