baldand / py-metal-compute

A python library to run metal compute kernels on macOS
MIT License

builtin type `metalcompute_device` has no `__module__` attribute #35

Open charlesbmi opened 7 months ago

charlesbmi commented 7 months ago

Thanks so much for this library! It is really useful.

This works great in a script, but performance is very slow when I run it under pytest. It's okay if fixing that isn't on the roadmap. My only hint about the cause is a warning that appears when importing the module (only within pytest).

To reproduce, create a file named test_mc_import.py containing:

import metalcompute as mc

Run:

pytest test_mc_import.py

Example output:

================================================================== test session starts ===================================================================
platform darwin -- Python 3.10.10, pytest-7.4.0, pluggy-1.0.0
rootdir: /Users/charles/Development/Mangrove
configfile: pyproject.toml
plugins: nbval-0.10.0, cov-4.1.0, anyio-3.7.1
collected 0 items                                                                                                                                        

==================================================================== warnings summary ====================================================================
<frozen importlib._bootstrap>:241
  <frozen importlib._bootstrap>:241: DeprecationWarning: builtin type metalcompute_device has no __module__ attribute

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=================================================================== 1 warning in 0.01s ===================================================================

baldand commented 6 months ago

Thanks for the report.

Would you be able to provide a simple example that reproduces the slow performance with pytest?

Also, which version of macOS are you running, and is the machine Apple silicon or Intel?

(I have a fix for the `__module__` warning, but I'm not sure whether the warning could cause any performance issue.)

In general, the "setup" operations such as device creation and kernel function compilation can be slow. If possible, do them once, in a test fixture that is reused across tests, rather than in every test.

Calling an already compiled function multiple times with different data should then be quite fast.
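
As a rough illustration of the "set up once, reuse" idea, here is a minimal sketch. It assumes the `mc.Device()` and `dev.kernel(source).function(name)` calls behave as described in the README. The kernel `add_one`, the helper names `get_add_one` and `_load_metalcompute`, and the `load_backend` parameter are hypothetical and exist only for this example. The import is done lazily so the module can still be imported on machines without Metal.

```python
import functools

# Hypothetical example kernel: writes src[i] + 1 into dst[i].
KERNEL_SOURCE = """
#include <metal_stdlib>
using namespace metal;

kernel void add_one(const device float *src [[ buffer(0) ]],
                    device float *dst [[ buffer(1) ]],
                    uint id [[ thread_position_in_grid ]]) {
    dst[id] = src[id] + 1.0;
}
"""


def _load_metalcompute():
    # Lazy import: only needed when the GPU path is actually used.
    import metalcompute as mc
    return mc


@functools.lru_cache(maxsize=None)
def get_add_one(load_backend=_load_metalcompute):
    """Create the device and compile the kernel on first use only.

    Later calls with the same loader return the cached (device, function)
    pair, so the slow setup is paid once per process.
    """
    mc = load_backend()
    dev = mc.Device()  # assumed slow: device creation
    fn = dev.kernel(KERNEL_SOURCE).function("add_one")  # assumed slow: compile
    return dev, fn
```

Under pytest, a fixture declared with `@pytest.fixture(scope="session")` that simply returns `get_add_one()` should give every test the same device and compiled function. Each test then only pays for the kernel call itself.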