Mitigated for now by disabling GPU: tf.config.set_visible_devices([], 'GPU') b1bf7610c9d268133fb25e2457efd9f4b6fdc2ee
Note that TensorFlow reserves all of the available GPU memory regardless of batch size; the workload then either fits in it or doesn't. First run with a batch size of 16 (works), second with 32 (crashes).
Allocating less than the maximum memory can supposedly be achieved with tf.config.experimental.set_memory_growth. Not tested.
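For reference, a minimal sketch of what that would look like (untested here, as noted above); it has to run before the first GPU op initializes the device:

```python
import tensorflow as tf

# Untested sketch: let TensorFlow grow GPU memory on demand instead of
# reserving the whole card up front. Must be called before the GPU is used.
for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)
```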
Setting batch_size to 16 fixes the GPU out-of-memory issue on all supported networks, which now run on the GPU. d3a6e984c9c5353446fccfa8c5944880ed20cecd
Looking into calculating the memory a network will need (may not be possible from data preparation alone, as the number of network parameters matters too) in order to add a batch size cap; see the sketch below.
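A rough, hypothetical estimator of what such a calculation might look like. The function name is made up for illustration, and it assumes float32 tensors, an Adam-like optimizer (weights + gradients + two moment buffers, roughly 4x the parameter memory), TF 2.x Keras where layer.output_shape is available, and that every layer's output activation is retained for backprop. Real usage will differ, so this could only inform a batch-size cap, not guarantee it:

```python
import numpy as np

def estimate_training_memory_bytes(model, batch_size, bytes_per_element=4):
    # Parameters, gradients, and optimizer slots: assume ~4x the weight memory.
    param_bytes = 4 * model.count_params() * bytes_per_element

    # Activations: assume every layer's output is kept for backprop.
    activation_elements = 0
    for layer in model.layers:
        shapes = layer.output_shape  # may be a list for multi-output layers
        if not isinstance(shapes, list):
            shapes = [shapes]
        for shape in shapes:
            dims = [d for d in shape if d is not None]  # drop batch/unknown dims
            activation_elements += int(np.prod(dims, dtype=np.int64))
    activation_bytes = activation_elements * batch_size * bytes_per_element

    return param_bytes + activation_bytes
```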
The following command can be used to check current memory use: tf.config.experimental.get_memory_info('GPU:0')
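For example (TF 2.5+, keys per the TF docs; it reports the allocator's usage in bytes):

```python
import tensorflow as tf

# Returns a dict with 'current' and 'peak' allocator usage in bytes.
info = tf.config.experimental.get_memory_info('GPU:0')
print(f"current: {info['current'] / 2**20:.1f} MiB, peak: {info['peak'] / 2**20:.1f} MiB")
```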
When a model is loaded into GPU memory, TensorFlow immediately reserves all available memory (by default, all of it, but this can be limited). Loading a 364 MB UNET model shows 12 GB of VRAM in use. With the GPU disabled via tf.config.set_visible_devices([], 'GPU'), only 400 MB of RAM is used. Other tools are needed to determine the memory actually being used on the GPU.
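If limiting that up-front reservation is the goal, one option (untested here; the 4096 MB budget is an arbitrary example) is a hard cap via a logical device configuration:

```python
import tensorflow as tf

# Untested sketch: cap the reservation at a fixed budget instead of letting
# TensorFlow take the whole card. Must run before the GPU is initialized.
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=4096)]  # in MB
    )
```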
Meanwhile, investigated by disabling the GPU and looking at RAM consumption. The model itself is small. However, at training time a batch of 32 images of 25 KB each consumes close to the maximum available RAM. Will need to study the literature further, because I expect much less RAM use (forward-pass activations, backprop gradients, and model weights should be on the order of 3x the saved model size on disk).
The amount of RAM used is not static. Some sources point out that TensorFlow will optimize a few layers at a time, perhaps limited by the available memory. On the GPU, this optimization does not seem to kick in, which likely causes the OOM issues observed.
Overall, this bug is mitigated by trying to run the model and dropping the batch size to the next lower power of 2 when an OOM error occurs (on the 12 GB GPU, a batch of 16 seems to work). An automated solution appears fairly complicated. May open that as a separate issue in the future if I end up needing to support multiple machines with different configs.
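A simple version of that mitigation could be a retry loop that halves the batch size on OOM. A hedged sketch, not what the repo currently does: build_model is assumed to return a fresh compiled model, and the loop relies on model.fit raising tf.errors.ResourceExhaustedError on OOM. Note that after a real OOM the allocator can be left fragmented, so a clean process restart is the more reliable fallback:

```python
import tensorflow as tf

def fit_with_oom_fallback(build_model, x, y, start_batch_size=32, min_batch_size=1, **fit_kwargs):
    # Hypothetical helper: halve the batch size on OOM until training fits.
    batch_size = start_batch_size
    while batch_size >= min_batch_size:
        try:
            model = build_model()  # assumed to return a fresh compiled model
            model.fit(x, y, batch_size=batch_size, **fit_kwargs)
            return model, batch_size
        except tf.errors.ResourceExhaustedError:
            print(f"OOM at batch size {batch_size}; retrying with {batch_size // 2}")
            tf.keras.backend.clear_session()  # drop the failed graph before retrying
            batch_size //= 2
    raise RuntimeError("Out of GPU memory even at the minimum batch size")
```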
Currently, the UNET segmentation code (possibly others) crashes when a GPU is discovered on the system (https://github.com/bdzyubak/Deep-Learning-Sandbox/issues/25):
OOM when allocating tensor with shape[32,64,256,256] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [[{{node UNET/batch_normalization_16/FusedBatchNormV3}}]]
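For scale, the single tensor in that message already works out to about half a gigabyte, assuming float32, and several similarly sized activations retained for backprop add up quickly:

```python
# Size of the one tensor from the error message, assuming float32 (4 bytes):
elements = 32 * 64 * 256 * 256      # batch x channels x height x width
print(elements * 4 / 2**20)         # -> 512.0 MiB for a single activation tensor
```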
This is possibly caused by a hard memory allocation to a fraction of system RAM in my code. It appears that an 'if GPU' statement is needed during allocation. The presence of a GPU can be checked with:
if tf.config.list_physical_devices('GPU'):
The intended behavior is to either allow TensorFlow to allocate automatically based on the device config, or to check the device config and allocate appropriately. All code should work with or without a GPU.
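A hedged sketch of that intended behavior, using the check above plus per-device memory growth (the actual allocation logic in the repo may end up different):

```python
import tensorflow as tf

def configure_devices():
    # Adapt to whatever devices are present instead of hard-coding a fraction
    # of system RAM. Must run before the first GPU operation.
    gpus = tf.config.list_physical_devices('GPU')
    if gpus:
        for gpu in gpus:
            # Grow GPU memory on demand rather than reserving it all up front.
            tf.config.experimental.set_memory_growth(gpu, True)
        return 'GPU'
    return 'CPU'  # no GPU found; TensorFlow falls back to CPU automatically
```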