Does it mean the RAM is still not enough?
The short answer is that, at some point in its processing, the neural net you are trying to train requires more than your T4's 16GB of GPU memory.
The long answer is:
To free up GPU memory, Large Model Support swaps out "inactive" tensors: tensors that are not needed for the current operation and are not otherwise tagged as active by the current TensorFlow execution context.
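For reference, here is a minimal sketch of how that swapping gets turned on, assuming an LMS-integrated TensorFlow 2.x build (such as the IBM-maintained one); the exact call may differ between builds and versions:

```python
import tensorflow as tf

# Assumption: an LMS-integrated TensorFlow build (e.g. IBM WML CE), where
# Large Model Support is switched on globally before the model is built.
tf.config.experimental.set_lms_enabled(True)

# From here on the runtime may transparently swap inactive tensors between
# GPU memory and host (system) memory while the model trains.
```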
I will use snippets from the log file above to describe what is happening in your specific case.
TensorFlow is requesting 7.92GiB of memory for something, likely a tensor:
2020-12-11 15:33:27.426434: W tensorflow/core/common_runtime/bfc_allocator.cc:685] Allocator (GPU_0_bfc) ran out of memory trying to allocate 7.92GiB (rounded to 8500515328)
At this point your GPU has ~8.5GiB in use, and those memory chunks are marked as "active": they are required to reside on the GPU and are ineligible to be swapped to system memory:
InUse: 8500841728
...
BytesActive: 8500841728
At this point LMS has already swapped out about 25GB to your system memory:
CurBytesReclaimed: 25501089536
but unfortunately there are no inactive tensor bytes left to swap out to make room for the 7.92GiB request:
BytesInactive: 0
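Putting those numbers together, as a quick back-of-the-envelope check using only the values from the log above:

```python
# All values in bytes, taken from the log lines quoted above.
bytes_active   = 8_500_841_728   # InUse / BytesActive: pinned on the GPU
request        = 8_500_515_328   # the ~7.92GiB allocation that failed
bytes_inactive = 0               # nothing LMS is allowed to swap out

needed = bytes_active + request
print(f"active + request = {needed / 2**30:.2f} GiB")   # ~15.83 GiB

# A "16GB" T4 exposes less than 16GiB to TensorFlow's BFC allocator once the
# CUDA context and other reservations are subtracted, so ~15.83GiB of
# simultaneously required data cannot fit, and with bytes_inactive == 0 there
# is nothing left for LMS to evict.
```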
So, my model cannot be trained even though I have a large amount of RAM (52 GB)?
Well, 52GB is actually relatively small for this type of work. We typically recommend a minimum of 64-128GB in systems running TensorFlow, and when using LMS you can consume much more, as shown above.
The limitation here is not the 52GB of system memory, but rather the 16GB of GPU memory. LMS moves inactive tensors to system memory, that is, tensors that are not required by the operation that is about to run. Ultimately, what is active versus inactive, and thus eligible to be swapped out, is determined by the operation context scope in TensorFlow.
In this case, the base amount of memory required for whatever operation is about to run is greater than what the GPU can hold.
To say it another way, for a given operation on the GPU, there must be enough memory to allow both the inputs and outputs of the operation to reside on the GPU.
In this case, the operation requires more memory than the GPU has.
At the point this failed, LMS had already swapped 25GB to system memory to free up space for the operations that preceded this one, but it still comes down to needing enough GPU memory to hold both the inputs and outputs of the current operation.
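To make that concrete, here is a rough sizing example with made-up shapes (not taken from your model), showing how a single op's inputs and outputs can exceed a 16GB GPU on their own:

```python
# Hypothetical float32 matmul: inputs, weights, and output must all be
# resident in GPU memory while the op executes.
batch, in_features, out_features = 64, 100_000, 100_000
dtype_bytes = 4  # float32

inputs  = batch * in_features * dtype_bytes          # ~24 MiB
weights = in_features * out_features * dtype_bytes   # ~37 GiB  <- the problem
output  = batch * out_features * dtype_bytes         # ~24 MiB

total_gib = (inputs + weights + output) / 2**30
print(f"one matmul needs ~{total_gib:.1f} GiB at once")  # far beyond 16GB
```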
Does that mean that even if I increase the machine's RAM (NOT GPU memory) further, it won't help?
Ah, correct, that's likely the problem here. The GPU will always need enough memory to complete a single operation and generate the resultant tensor. LMS doesn't break down operations to ensure they fit - it just swaps resultant tensors to main memory so there is enough space on the GPU to work with bigger ones.
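As an illustration of why only shrinking the operation itself helps (hypothetical shapes again, not derived from your model or logs): the resultant tensor of a single op scales with things like batch size, and that whole result must fit on the GPU no matter how much system memory LMS has available.

```python
# Size of one op's output activation tensor in bytes (NHWC layout, float32).
def result_bytes(batch, height, width, channels, dtype_bytes=4):
    return batch * height * width * channels * dtype_bytes

for batch in (64, 32, 16):
    gib = result_bytes(batch, 512, 512, 128) / 2**30
    print(f"batch {batch:>2}: output tensor ~{gib:.1f} GiB")
# batch 64: ~8.0 GiB, batch 32: ~4.0 GiB, batch 16: ~2.0 GiB
```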
Got it. Thanks all.
Machine:
8 vCPU, 52 GB RAM
NVIDIA Tesla T4, 16 GB
Jupyter-Lab logs: