baldand / py-metal-compute

A python library to run metal compute kernels on macOS
MIT License

Unified Memory Coordination #38

Open Crear12 opened 7 months ago

Crear12 commented 7 months ago

Thank you for this amazing work!

I used Numba CUDA to enable parallel computing for my transportation-related simulations, and I found that the memory-copy time was significant compared with the processing time. I'd like to test whether the unified memory of Apple Silicon can do the job faster; I have an M1 Max MacBook Pro. From the examples, I see that the data is first created in a Python array (on the CPU, I assume?) and then passed into the Metal kernel. I'm wondering whether unified memory automatically does the magic, or whether a memory-copy operation is still required somewhere. Is there a way I can measure the copy time and the processing time separately? Or is copying between CPU and GPU technically ~0 seconds?
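One way to answer the timing question empirically is to wrap each stage with `time.perf_counter`. Below is a minimal, hedged sketch of that pattern; the GPU stages are stood in by pure-Python operations so the snippet runs anywhere, and in a real measurement you would replace them with the actual buffer fill and kernel launch, making sure the GPU work has finished (e.g. by reading back a result) before stopping the clock:

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label, results):
    # Record wall-clock seconds for the enclosed stage under `label`.
    t0 = time.perf_counter()
    yield
    results[label] = time.perf_counter() - t0

results = {}
data = bytearray(4 * 1_000_000)  # ~4 MB of host-side data

with timed("copy_in", results):
    staged = bytes(data)              # stand-in for host -> device copy
with timed("compute", results):
    checksum = sum(staged[::4096])    # stand-in for the kernel launch
with timed("copy_out", results):
    out = bytearray(staged)           # stand-in for device -> host copy

for label, seconds in results.items():
    print(f"{label}: {seconds * 1e3:.3f} ms")
```

With unified memory, the expectation is that the `copy_in`/`copy_out` stages shrink toward zero once the data already lives in a shared buffer, which is exactly what this harness would show.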

baldand commented 5 months ago

If you use the buffer object supplied by the library to store your data in, then it should be possible to share it between the CPU and GPU without any copies.

However, if your data initially lives in a normal numpy array or other buffer type, then it will need to be copied into a unified-memory buffer once before it can be accessed by the GPU.