Hi, I am trying to run the model on the CIFAR-100 dataset and am getting the following error. I have 4 Tesla V100 GPUs.
2022-08-05 10:00:26.833817: W external/org_tensorflow/tensorflow/core/common_runtime/bfc_allocator.cc:479] Allocator (GPU_0_bfc) ran out of memory trying to allocate 9.00MiB (rounded to 9437184)requested by op
2022-08-05 10:00:26.835182: W external/org_tensorflow/tensorflow/core/common_runtime/bfc_allocator.cc:491] *********************************************************************************x**************x***
2022-08-05 10:00:26.835281: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2130] Execution of replica 0 failed: RESOURCE_EXHAUSTED: Out of memory while trying to allocate 9437184 bytes.
BufferAssignment OOM Debugging.
The complete running logs can be found here. Please help me with solving the issue.
===============
For your information, I was previously getting the error "RuntimeError: Visible devices cannot be modified after being initialized". Hence, I added the following code snippet from https://www.tensorflow.org/guide/gpu to main.py, and it solved that error.
"""Main file for running the example."""
import os
os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"] = "false"
imports ...
FLAGS = flags.FLAGS
...
def main(argv):
  del argv
  # Hide any GPUs from TensorFlow. Otherwise TF might reserve memory and make
  # it unavailable to JAX.
  # tf.config.experimental.set_visible_devices([], "GPU")
  gpus = tf.config.list_physical_devices('GPU')
  if gpus:
    # Restrict TensorFlow to only use the first GPU
    try:
      tf.config.experimental.set_visible_devices(gpus[0], 'GPU')
      logical_gpus = tf.config.list_logical_devices('GPU')
      print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPU")
    except RuntimeError as e:
      # Visible devices must be set before GPUs have been initialized
      print(e)
  # if gpus:
  #   # Create 2 virtual GPUs with 1GB memory each
  #   try:
  #     tf.config.set_logical_device_configuration(
  #         gpus[0],
  #         [tf.config.LogicalDeviceConfiguration(memory_limit=1024),
  #          tf.config.LogicalDeviceConfiguration(memory_limit=1024)])
  #     logical_gpus = tf.config.list_logical_devices('GPU')
  #     print(len(gpus), "Physical GPU,", len(logical_gpus), "Logical GPUs")
  #   except RuntimeError as e:
  #     # Virtual devices must be set before GPUs have been initialized
  #     print(e)
  if FLAGS.exp_id:
  ...
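For comparison, here is a minimal sketch of the behaviour the original (now commented-out) line was aiming for: hiding all GPUs from TensorFlow so that it cannot reserve memory JAX needs. Note that my modified snippet above still leaves the first GPU visible to TensorFlow, which may or may not be related to the OOM on GPU 0. The helper name hide_gpus_from_tensorflow is hypothetical and is not part of main.py; the sketch assumes TensorFlow 2.x, and it only succeeds if called before TensorFlow has initialized its devices.

```python
import os

# Ask JAX/XLA not to preallocate most of the GPU memory up front.
# This has to be set before JAX initializes its GPU backend.
os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"] = "false"


def hide_gpus_from_tensorflow():
    """Hypothetical helper: try to hide every GPU from TensorFlow.

    Returns True if the GPUs were hidden, False if TensorFlow is not
    installed or its devices were already initialized.
    """
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed, so it cannot reserve GPU memory.
        return False
    try:
        # An empty device list makes no GPU visible to TensorFlow.
        tf.config.set_visible_devices([], "GPU")
    except RuntimeError as e:
        # Raised when the devices were already initialized, which is the
        # "Visible devices cannot be modified" error mentioned above.
        print(e)
        return False
    return True
```

In main.py this would be called as the first statement of main(argv), before any TensorFlow op or dataset pipeline is created.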