Open SamuelBailey opened 2 years ago
The first screenshot shows RAM usage whilst performing `model.compile()`. This is before the notebook had reached the `model.run()` call. The notebook had previously been run, but the Python kernel had been reset, so all memory from the previous run should have been freed. The log output in my bug report occurred after this call to `model.compile()`.
The second screenshot shows random RAM spikes which occur even after closing the jupyter notebook. On a fresh boot these do not occur until after a jupyter notebook using tensorflow with rocm has been run.
Hi there, I am experiencing the same issue and log messages.
Issue Type
Bug
Source
binary
Tensorflow Version
tf 2.8.0
Custom Code
Yes
OS Platform and Distribution
Linux Ubuntu 20.04
Mobile device
N/A
Python version
3.7.13 (also experienced on 3.8.10)
Bazel version
N/A
GCC/Compiler version
N/A
CUDA/cuDNN version
ROCm 5.1.1 (using tensorflow installed from pip: tensorflow-rocm)
GPU model and memory
(not compiled from source, but Vega 56 with 8GB VRAM, and system has 16GB DRAM)
Current Behaviour?
When using TensorFlow on my GPU with ROCm, RAM usage randomly jumps by about 9 GiB for a few seconds and then drops back down, even with a very small neural network model. This behaviour has been observed both with the rocm/tensorflow Docker image and when running natively, and does not appear to depend on the type of model being run. Once `model.fit()` has been called, the RAM spikes begin to occur.
The spikes appear to be random and start after any call to `model.fit()`, regardless of the contents or size of the model. They reduce in frequency after the `model.fit()` call has finished, but often do not stop completely until a system restart. They occur particularly frequently when clicking on another window on the PC, as though (at a guess) the GPU needs to transfer a large chunk of memory in order to context switch.
The code submitted (run via a Jupyter notebook) causes the RAM spikes. When I restarted the Python kernel and ran it a second time, the spikes occurred as soon as `model.compile()` ran, rather than only after `model.fit()`, and HIP reported a `HIP_ERROR_OutOfMemory`.
Whilst RAM usage is high, the rest of the computer briefly becomes almost completely unusable. I have experienced this issue when using a Jupyter notebook in VSCode and when using a Jupyter notebook in a web browser.
I would expect RAM usage to climb very gradually during program execution, without 2-3 second periods in which several gigabytes of RAM are allocated and then freed, causing the system to freeze.
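For anyone trying to confirm the same behaviour, the spikes can be logged rather than read off a system monitor. The following is only a rough sketch, not the reproduction code from this report: it assumes a Linux system exposing `/proc/meminfo`, and the helper names (`parse_meminfo_kib`, `used_gib`, `find_spikes`, `sample_system_ram`) and the 4 GiB threshold are made up for illustration. It samples system-wide usage rather than a single process, since the spikes reportedly continue after the notebook is closed.

```python
import time


def parse_meminfo_kib(text):
    """Parse /proc/meminfo-style text into a {field: value} dict.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def used_gib(meminfo):
    """System RAM in use, in GiB, taken as MemTotal minus MemAvailable."""
    return (meminfo["MemTotal"] - meminfo["MemAvailable"]) / (1024 ** 2)


def find_spikes(samples, threshold_gib=4.0):
    """Indices where usage rose by at least threshold_gib since the previous sample.

    The 4 GiB default is an arbitrary illustrative threshold.
    """
    return [
        i for i in range(1, len(samples))
        if samples[i] - samples[i - 1] >= threshold_gib
    ]


def sample_system_ram(duration_s=60.0, interval_s=0.5, path="/proc/meminfo"):
    """Poll system RAM usage (GiB) every interval_s seconds for duration_s seconds."""
    samples = []
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        with open(path) as f:
            samples.append(used_gib(parse_meminfo_kib(f.read())))
        time.sleep(interval_s)
    return samples
```

Running `find_spikes(sample_system_ram())` in a separate terminal while the notebook executes would list the samples at which usage jumped. The 0.5 second interval is chosen so that a spike lasting 2-3 seconds is seen in several consecutive samples.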
Standalone code to reproduce the issue
Relevant log output