Unified Memory Coordination

baldand / py-metal-compute

A python library to run metal compute kernels on macOS

MIT License

Thank you for this amazing work!

I have been using Numba CUDA to parallelize my transportation-related simulations, and I found that the memory copy time is significant compared with the processing time. I have an M1 Max MacBook Pro and want to test whether the unified memory of Apple Silicon can do the job faster. In the examples, the data is first created in a Python array (which I assume lives on the CPU side) and then passed to the Metal kernel. Does unified memory handle this automatically, or is a virtual memory copy operation still required somewhere? Is there a way to measure the copy time and the processing time separately? Or is the copy between CPU and GPU effectively ~0 seconds?
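To make the measurement part of the question concrete, here is a minimal sketch of how I would time the two stages separately. It uses only the Python standard library. `time_stage`, `copy_stage`, and `process_stage` are hypothetical names of mine, not part of this library: the two stage functions are CPU-only stand-ins that would be swapped for the real buffer creation and the real kernel call. It also assumes each stage blocks until its result is ready, since timing an asynchronous GPU call would otherwise capture only the launch overhead.

```python
import time
from array import array


def time_stage(fn, *args, repeats=5):
    """Run fn(*args) several times; return (best elapsed seconds, last result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(*args)  # assumed to block until the result is ready
        best = min(best, time.perf_counter() - start)
    return best, result


# Hypothetical stand-in for the host-to-device transfer (here: a plain copy).
def copy_stage(data):
    return array("f", data)


# Hypothetical stand-in for the kernel dispatch (here: doubling on the CPU).
def process_stage(buf):
    return array("f", (x * 2.0 for x in buf))


data = array("f", range(100_000))
copy_s, buf = time_stage(copy_stage, data)
proc_s, out = time_stage(process_stage, buf)
print(f"copy: {copy_s * 1e3:.3f} ms, process: {proc_s * 1e3:.3f} ms")
```

Taking the best of several runs reduces noise from one-time warm-up costs; if the copy figure stays near zero as the input size grows, that would suggest no real copy is taking place.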

Unified Memory Coordination #38