Closed spichardo closed 2 years ago
Hi Sam. Thanks for reporting the issue & sharing the test case to reproduce it.
I think I have a fix for this which may also help your code.
Can you test with the new v0.2.3 metalcompute packages, and confirm if the main issue is resolved for you?
The key change is here: calling setPurgeableState to mark the MTLBuffer as empty before releasing the reference.
Excellent! Thanks for the prompt reply. I confirm that v0.2.3, with the MTLPurgeableState addition, corrected the issue. With this fix, I will be able to replace my own custom Swift modules with much simpler code using metalcompute. Later on, I may reach out to ask whether the Buffer function could gain a bytesNoCopy option. The latest versions of NumPy let us control the allocation process to ensure the array is page-sized, so the copy could be avoided completely.
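For reference, here is a minimal sketch of what a page-aligned, page-multiple allocation could look like on the Python side, using only the standard library. The helper names (page_rounded, alloc_page_aligned, address_of) are hypothetical and not part of metalcompute or NumPy, and whether a future bytesNoCopy option would accept such a buffer is an assumption on my part.

```python
import ctypes
import mmap


def page_rounded(nbytes, page=mmap.PAGESIZE):
    """Round nbytes up to the next multiple of the page size."""
    return -(-nbytes // page) * page


def alloc_page_aligned(nbytes):
    """Return an anonymous mapping whose length is a whole number of pages.

    Anonymous mappings start on a page boundary, so both the address and
    the length meet the usual no-copy requirements.
    """
    return mmap.mmap(-1, page_rounded(max(nbytes, 1)))


def address_of(buf):
    """Return the start address of a writable buffer (for checking alignment)."""
    view = ctypes.c_char.from_buffer(buf)
    return ctypes.addressof(view)


# Example: room for 1000 float32 values, rounded up to whole pages.
buf = alloc_page_aligned(1000 * 4)
```

Such a buffer could then be viewed as an array without copying, for example with np.frombuffer(buf, dtype=np.float32, count=1000).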
Happy new year 2022!
Hi,
Thanks for sharing this library. It definitely has huge potential. I have an issue that I have experienced not only with the metalcompute library but also in my own C extensions of Metal through Swift for Python. It seems that the allocated buffers do not get released. I wonder if you have experienced this.
Below you can see simplified code for calculating ultrasound fields using Metal. The function that allocates and (in principle) deallocates the buffers is called ForwardPropagationMetal. It performs tasks very similar to those in the examples: create buffers, copy from NumPy, run the kernel and recover the results. In this code, I call this function N times on purpose. If you leave it running, you will see in Activity Monitor how the memory continues to grow, and eventually you will end up with a CouldNotMakeBuffer error. To run it, just save it and run it with python <script>.py <substring of GPU> <Number of iterations>, for example: python DemoMetalcomputeRunOutMemory.py 'M1' 100000.
This problem happens on both M1 and AMD GPUs. I tested this with my M1 Max Pro and with an external AMD PRO W6800 GPU in a Thunderbolt 3-connected enclosure with an iMac Pro, both using the latest Monterey and Xcode versions. The Swift code is supposed to release the buffers, but I haven't found why this is not occurring. As mentioned above, I experience the same when using custom-made Swift functions that encapsulate the creation of buffers and the calls to Metal functions.
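Instead of watching Activity Monitor, the growth can also be logged from the script itself. The following is only a sketch: run_repeated, peak_rss_bytes and stand_in_workload are hypothetical names, and stand_in_workload is a placeholder for the real ForwardPropagationMetal call. It relies on the Unix-only resource module, and it only sees the process's resident memory, so memory held on a discrete GPU may not show up.

```python
import resource
import sys


def peak_rss_bytes():
    """Peak resident set size of this process, in bytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # macOS reports ru_maxrss in bytes, Linux in kilobytes.
    return peak if sys.platform == "darwin" else peak * 1024


def run_repeated(fn, iterations, sample_every=100):
    """Call fn() repeatedly and record (iteration, peak RSS) samples."""
    samples = []
    for i in range(1, iterations + 1):
        fn()
        if i % sample_every == 0 or i == iterations:
            samples.append((i, peak_rss_bytes()))
    return samples


def stand_in_workload():
    """Placeholder for the function under test (not the real Metal code)."""
    scratch = bytearray(1 << 20)  # allocate 1 MiB, freed on return
    scratch[0] = 1


samples = run_repeated(stand_in_workload, 10, sample_every=4)
```

A peak that keeps climbing across samples, rather than levelling off after the first few iterations, would point to buffers not being released.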
As a side note, OpenCL is still supported in Monterey and on M1 processors (something that was not supposed to be supported anymore, but I'm not really complaining). I can run similar code using pyopencl with no memory issues. But in pyopencl, the library controls the deallocation of OpenCL buffers like any other Python object. So I wonder if the Swift deallocator is being blocked in some way, and if there is a way to force the deallocation once the call to Metal compute has completed.
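To illustrate what I mean by letting Python control the deallocation, here is a rough sketch of a wrapper that releases a native handle exactly once, either explicitly or when the Python object is collected. ManagedBuffer and fake_release are hypothetical names; this is not how pyopencl or metalcompute are implemented, and a real binding would call into the native library where fake_release is used.

```python
import weakref


class ManagedBuffer:
    """Owns a native handle and releases it at most once."""

    def __init__(self, handle, release):
        self.handle = handle
        # weakref.finalize calls release(handle) at most once: on an
        # explicit call, on garbage collection, or at interpreter exit.
        self._finalizer = weakref.finalize(self, release, handle)

    def release(self):
        """Force the release now; a no-op if already released."""
        self._finalizer()

    @property
    def released(self):
        return not self._finalizer.alive

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.release()
        return False


# Stand-in for a native free function (assumption, for illustration only).
freed = []


def fake_release(handle):
    freed.append(handle)


# The buffer is released as soon as the with-block ends.
with ManagedBuffer(1, fake_release) as buf:
    pass
```

With something like this, the release happens at a known point after the compute call instead of depending on when the native side decides to drop the reference.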
Thanks for any hint you could provide,
Happy 2022!
Sam