NifTK / NiftyNet

[unmaintained] An open-source convolutional neural networks platform for research in medical image analysis and image-guided therapy
http://niftynet.io
Apache License 2.0

Control the GPU memory allocation #435

Closed carlpe closed 5 years ago

carlpe commented 5 years ago

Tensorflow-gpu 1.13.1

I am using densevnet. Some days ago everything was working properly, but today I got an error message 'OOM....' .

My total GPU memory is:

```
2019-08-24 16:39:21.609144: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce RTX 2080 Ti  major: 7  minor: 5  memoryClockRate(GHz): 1.545
pciBusID: 0000:0a:00.0
totalMemory: 11.00GiB  freeMemory: 8.95GiB
2019-08-24 16:39:21.834574: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 1 with properties:
name: GeForce RTX 2080 Ti  major: 7  minor: 5  memoryClockRate(GHz): 1.545
pciBusID: 0000:41:00.0
totalMemory: 11.00GiB  freeMemory: 8.95GiB
```

And it seems to be allocating the following:

```
2019-08-24 16:41:50.719865: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8620 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:0a:00.0, compute capability: 7.5)
2019-08-24 16:41:50.727263: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 8620 MB memory) -> physical GPU (device: 1, name: GeForce RTX 2080 Ti, pci bus id: 0000:41:00.0, compute capability: 7.5)
```

I want to control the GPU memory allocation. By default, TensorFlow pre-allocates it. From the guide below I read that I ought to either enable `allow_growth` or change the percentage of memory pre-allocated, using the `per_process_gpu_memory_fraction` config option.

https://riptutorial.com/tensorflow/example/31879/control-the-gpu-memory-allocation

Do I add the following code to dense_vnet.py?

```python
import tensorflow as tf  # TF 1.x

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
```

How do I do this? When I try this, nothing seems to happen. (I have already reduced batch_size = 1).

ericspod commented 5 years ago

Unfortunately there isn't currently a way to set this option through config files. The config object used is defined here, so you could add the `config.gpu_options.allow_growth = True` line just below that. We should look into a better way of doing this.
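A sketch of what such a patch might look like, covering both options from the riptutorial guide. This is illustrative only: `make_session_config` is a hypothetical name, the real config object lives wherever NiftyNet defines it, and the TF 1.x API (`tf.ConfigProto`) is assumed.

```python
# Hedged sketch: mirrors the place where NiftyNet builds its session
# config. `make_session_config` is a hypothetical helper for illustration.
try:
    import tensorflow as tf  # TensorFlow 1.x API assumed
except ImportError:
    tf = None

def make_session_config(memory_fraction=None):
    """Return a session config that avoids grabbing all GPU memory."""
    if tf is None or not hasattr(tf, "ConfigProto"):
        return None  # TF 1.x API not available in this environment
    config = tf.ConfigProto()
    if memory_fraction is None:
        # Option 1: grow allocations on demand instead of pre-allocating.
        config.gpu_options.allow_growth = True
    else:
        # Option 2: cap pre-allocation at a fixed fraction of the card.
        config.gpu_options.per_process_gpu_memory_fraction = memory_fraction
    return config

# Usage (TF 1.x): sess = tf.Session(config=make_session_config())
```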

carlpe commented 5 years ago

> There isn't a way currently to set this option unfortunately through config files. The config object used is defined here so you could add the `config.gpu_options.allow_growth = True` line below that. We should look into a better way of doing this.

I tried this, but unfortunately it did not work. Do you think adding another GPU would sort this issue out?

ericspod commented 5 years ago

I think you're trying to load too much into memory. TensorFlow allocates all of a card's memory up front and then actually uses it as it creates tensors. I think you're getting the OOM because you're trying to use more memory than the card has, that is to say, more than TensorFlow has pre-allocated to itself. If you can try a card with more memory, you could see if that helps.

carlpe commented 5 years ago

> I think you're trying to load too much into memory. Tensorflow will allocate all the memory a card has then actually use it as it creates tensors, I think you're getting the OOM because you're trying to use more memory than your card has, that is to say more than Tensorflow has pre-allocated to itself. If you can try on a card with more memory you could see if that helps.

Ok, thank you for your reply.

By the way, with lower resolution settings it works very well.

However, I have another question. If I set the input resolution very high, I run into TensorFlow's 2 GB protocol buffer limit.

Please see my question on Stack Overflow: https://stackoverflow.com/questions/55490873/cannot-serialize-protocol-buffer-of-type-tensorflow-graphdef-as-the-serialized-s

Is there a way to change this in NiftyNet, or to work around the 2 GB limit?
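For a rough sense of when the limit bites: a volume baked into the graph as a constant contributes roughly its raw byte size to the serialized GraphDef, and protobuf caps a single serialized message at 2 GiB. A back-of-the-envelope check (assumption: float32 voxels, 4 bytes each; the helper name is for illustration):

```python
from functools import reduce

PROTOBUF_LIMIT = 2 ** 31  # 2 GiB hard cap on one serialized protobuf message

def constant_bytes(shape, bytes_per_voxel=4):
    """Approximate GraphDef cost of embedding a volume as a graph constant."""
    return reduce(lambda a, b: a * b, shape, 1) * bytes_per_voxel

# A 512^3 float32 volume fits (~0.5 GiB); a 1024^3 one does not (~4 GiB).
assert constant_bytes((512, 512, 512)) < PROTOBUF_LIMIT
assert constant_bytes((1024, 1024, 1024)) > PROTOBUF_LIMIT
```

The usual workaround is to feed large arrays in at run time (placeholder/`feed_dict` or an input pipeline) rather than embedding them as graph constants, so they never enter the serialized GraphDef at all.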

zheng-xing commented 4 years ago

Adding `config.gpu_options.allow_growth = True` to the `tf_config` function does not work for me. I have to use the method mentioned here.

patricio-astudillo commented 4 years ago

> Adding `config.gpu_options.allow_growth = True` to the `tf_config` function does not work for me. I have to use the method mentioned here.

This does not work for me either; however, the following worked for me:

```
export TF_FORCE_GPU_ALLOW_GROWTH=true
```
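For completeness, the same switch can be flipped from inside Python rather than the shell, provided it happens before TensorFlow is imported in that process (recent TensorFlow versions read the variable when they first initialise the GPU). A minimal sketch:

```python
import os

# Must run before `import tensorflow` anywhere in this process, since
# TensorFlow reads the variable when it first initialises the GPU.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

# import tensorflow as tf  # import TF only after the variable is set
```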