NVIDIA / tensorflow

An Open Source Machine Learning Framework for Everyone
https://developer.nvidia.com/deep-learning-frameworks
Apache License 2.0

With the same code and configuration, nvidia-tensorflow runs out of GPU memory (OOM) with reuse=True on an A30, but TensorFlow 1.14 works fine on a T4. #68

Closed BingWin789 closed 2 years ago

BingWin789 commented 2 years ago


System information

You can collect some of this information using our environment capture script. You can also obtain the TensorFlow version with:

1. TF 1.0: python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"
2. TF 2.0: python -c "import tensorflow as tf; print(tf.version.GIT_VERSION, tf.version.VERSION)"

Describe the current behavior

I train my model on two datasets alternately. Both model instances share the same weights, like this:

with tf.variable_scope('mymodel', reuse=False):
    pred1 = model(dataset1)
with tf.variable_scope('mymodel', reuse=True):
    pred2 = model(dataset2)

Describe the expected behavior

I train my model with a batch size of 12. TensorFlow 1.14 works fine on a T4, but nvidia-tensorflow runs out of GPU memory (OOM) on an A30.
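For context when sizing the batch: reuse=True shares the variables between the two model(...) calls, but each call still builds its own ops and keeps its own activations. The sketch below is a rough back-of-the-envelope budget, not a measurement from this report. The function name and every size in it are hypothetical placeholders, and it ignores gradients, optimizer state, cuDNN workspace, and allocator behavior. It does not by itself explain why the A30 would fail where the T4 does not.

```python
# Rough GPU memory budget for several towers that share weights.
# All sizes are hypothetical placeholders, not values from this issue.

GIB = 1024 ** 3


def estimate_peak_bytes(weight_bytes, activation_bytes_per_sample,
                        batch_size, num_towers):
    """Count weights once (shared via reuse=True) and activations per tower."""
    activations = activation_bytes_per_sample * batch_size * num_towers
    return weight_bytes + activations


weights = 1 * GIB            # assumed total size of the shared variables
act_per_sample = GIB // 2    # assumed activation memory per training sample

one_tower = estimate_peak_bytes(weights, act_per_sample, 12, 1)
two_towers = estimate_peak_bytes(weights, act_per_sample, 12, 2)

print(one_tower / GIB, two_towers / GIB)
```

Under these assumed numbers the second tower adds only activation memory, so the estimate grows from 7 GiB to 13 GiB rather than doubling outright. Comparing such an estimate with the memory actually reported by each card would help narrow down whether the model or the framework build accounts for the difference.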

Code to reproduce the issue

Provide a reproducible test case that is the bare minimum necessary to generate the problem.

Other info / logs

Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.