Open chaithyagr opened 5 years ago
If compilation is the only thing that is done, there shouldn't be anything left after release() (especially on the device; compilation doesn't allocate anything). Do you use CUDA or OpenCL (I'm guessing the latter, otherwise you'd get some errors from the context stack), and how exactly do you check the memory usage?
Also, just a note: release() should happen automatically when thr goes out of scope, so you rarely need to call it manually.
Do you use CUDA or OpenCL
In this case it is OpenCL. However, I think I have similar memory issues with CUDA.
how exactly do you check the memory usage?
Basically, I watch the output of nvidia-smi and the reported memory usage while I step through my code in the debugger.
Also, just a note, release() should happen automatically when thr goes out of scope, so you rarely need to call it manually.
This is exactly what I expected, but it did not happen, so I explicitly called release() in the __del__ destructor. It was then that I noticed that the memory is not cleared when compile is called. Note that the memory is cleared much more cleanly when we just instantiate a thr and do not compile any source code with it.
That's strange, I cannot reproduce it on Linux, Tesla P100, pyopencl 2018.2.5. Did you try the same test with just plain PyOpenCL? I cannot imagine what can be retained after release() in Thread itself.
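For reference, the plain-PyOpenCL comparison suggested above could look roughly like this (a minimal sketch, assuming pyopencl is installed and an OpenCL device is visible; it reports "skipped" otherwise rather than failing):

```python
import gc

# A trivial kernel: we only want to exercise compilation, not allocation.
SRC = """
__kernel void dummy(__global float *out) {
    out[get_global_id(0)] = 1.0f;
}
"""

def main():
    try:
        import pyopencl as cl
        ctx = cl.create_some_context(interactive=False)
    except Exception as exc:
        return "skipped: " + str(exc)
    prg = cl.Program(ctx, SRC).build()   # compile only, no buffers created
    del prg, ctx                          # drop all handles
    gc.collect()
    # Pause here and check nvidia-smi: if the device memory is freed,
    # the retention seen earlier would point at Reikna, not PyOpenCL.
    return "released"

status = main()
print(status)
```

If the memory is freed here but not in the Reikna version, that isolates the problem to Reikna's Thread rather than the underlying PyOpenCL context.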
Let me give it a try with plain PyOpenCL and get back. I do agree that this issue I am seeing is very weird; perhaps I too will try to reproduce it in a different setting. Note that this was performed with a library that uses Reikna (PyNUFFT). Initially, I found that even after deleting variables, the memory was not cleared. On debugging, I narrowed it down to this basic issue.
Initially, I found that even after deleting variables, the memory was not cleared.
This may mean that there's a cyclic reference somewhere, either in Reikna or PyOpenCL. Did you try doing import gc; gc.collect()?
Did you try doing import gc; gc.collect()?
Nope, I did not. I was assuming this would only collect unused memory in RAM; would it help with memory on the GPU too? I will give this a try.
It may collect Python objects which hold handles to GPU memory.
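To illustrate the point in pure Python (no GPU involved): an object caught in a reference cycle is not freed by reference counting alone, so any GPU handle it held would stay alive until the cycle collector runs. Handle below is a hypothetical stand-in for an object owning device memory:

```python
import gc
import weakref

class Handle:
    """Hypothetical stand-in for an object owning a GPU resource."""

h = Handle()
alive = weakref.ref(h)       # lets us observe when the object is destroyed

h.self_ref = h               # create a reference cycle
del h                        # refcount never drops to zero due to the cycle
assert alive() is not None   # the object (and its GPU memory) is still alive

gc.collect()                 # the cycle collector breaks the cycle
assert alive() is None       # now it is actually freed
print("collected")
```

This is why an explicit gc.collect() after del can release device memory that appeared to be leaking.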
A quick run of the following code:
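The original snippet is not preserved in this comment; a hedged reconstruction of such a test, assuming Reikna's cluda API (ocl_api, Thread.create, Thread.compile, Thread.release) and reporting "skipped" when no OpenCL device or reikna install is available, might look like:

```python
import gc

# CLUDA-style kernel source (KERNEL/GLOBAL_MEM are reikna's macros).
SRC = """
KERNEL void dummy(GLOBAL_MEM float *out) {
    out[get_global_id(0)] = 1.0f;
}
"""

def run_case(do_compile):
    try:
        from reikna import cluda
        thr = cluda.ocl_api().Thread.create()
    except Exception as exc:
        return "skipped: " + str(exc)
    if do_compile:
        thr.compile(SRC)   # the step that reportedly retains device memory
    thr.release()
    gc.collect()
    # Check nvidia-smi at this point for each case.
    return "released"

results = [run_case(False), run_case(True)]
print(results)
```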
You would notice that the memory on the device is cleared for the non-compiled thread, but not for the compiled one.