IntelPython / dpctl

Python SYCL bindings and SYCL-based Python Array API library
https://intelpython.github.io/dpctl/
Apache License 2.0
97 stars 29 forks source link

Change memory object USM allocation ownership, and make execution asynchronous #1705

Closed oleksandr-pavlyk closed 3 weeks ago

oleksandr-pavlyk commented 4 weeks ago

This is PR changes the size and content of the struct behind dpctl.memory._Memory object.

This is backwards incompatible change, and would require rebuilding of downstream projects with native extensions (cython, or pybind11) that use dpctl.

The _Memory object no longer explicitly frees USM allocations it made, but delegates this job to a smart pointer. This allows host_task jobs that ensure deferment of USM deallocation till after offloaded tasks that operate on it complete execution to do their job by capturing the smart pointer in the callable passed to the host_task instead needing to rely on Python reference counting, and thus need to acquire GIL in that callable.

Furthermore, the PR introduces mechanism of ensuring sequential order amongst tasks implementing offloading Python API, such as dpctl.tensor implementing array API compliant tensor library.

One consequence of this change is that timing of dpctl.tensor functions must synchronize the execution queue before taking the final timestamp reading.

github-actions[bot] commented 4 weeks ago

Deleted rendered PR docs from intelpython.github.com/dpctl, latest should be updated shortly. :crossed_fingers:

github-actions[bot] commented 4 weeks ago

Array API standard conformance tests for dpctl=0.18.0dev0=py310h15de555_50 ran successfully. Passed: 890 Failed: 11 Skipped: 91

coveralls commented 4 weeks ago

Coverage Status

coverage: 88.057% (+0.1%) from 87.911% when pulling f4e3b6f1b635ec1f0fb83b3146cc57055b4f3aff on memory-work into c6f5f790b963a8984d3a7760cc9358aa4efe3d65 on master.

github-actions[bot] commented 4 weeks ago

Array API standard conformance tests for dpctl=0.18.0dev0=py310h15de555_51 ran successfully. Passed: 890 Failed: 11 Skipped: 91

github-actions[bot] commented 4 weeks ago

Array API standard conformance tests for dpctl=0.18.0dev0=py310h15de555_51 ran successfully. Passed: 890 Failed: 11 Skipped: 91

github-actions[bot] commented 4 weeks ago

Array API standard conformance tests for dpctl=0.18.0dev0=py310h15de555_53 ran successfully. Passed: 890 Failed: 11 Skipped: 91

github-actions[bot] commented 4 weeks ago

Array API standard conformance tests for dpctl=0.18.0dev0=py310h15de555_55 ran successfully. Passed: 890 Failed: 11 Skipped: 91

oleksandr-pavlyk commented 4 weeks ago

What's missing from this PR that must be addressed before merging:

oleksandr-pavlyk commented 3 weeks ago

I suggest we expand the documentation with explaining the asynchronous execution in a separate PR.

The important change to realize is that execution of tensor operations is done asynchronously, so to get a proper timing values, one must use queue.wait() before taking the final time stamp reading.

Submission preserves sequential ordering of operations as executed by Python interpreter.

github-actions[bot] commented 3 weeks ago

Array API standard conformance tests for dpctl=0.18.0dev0=py310h15de555_57 ran successfully. Passed: 890 Failed: 11 Skipped: 91

github-actions[bot] commented 3 weeks ago

Array API standard conformance tests for dpctl=0.18.0dev0=py310h15de555_57 ran successfully. Passed: 890 Failed: 11 Skipped: 91

github-actions[bot] commented 3 weeks ago

Array API standard conformance tests for dpctl=0.18.0dev0=py310h15de555_58 ran successfully. Passed: 890 Failed: 11 Skipped: 91

github-actions[bot] commented 3 weeks ago

Array API standard conformance tests for dpctl=0.18.0dev0=py310h15de555_59 ran successfully. Passed: 890 Failed: 11 Skipped: 91

github-actions[bot] commented 3 weeks ago

Array API standard conformance tests for dpctl=0.18.0dev0=py310h15de555_60 ran successfully. Passed: 889 Failed: 12 Skipped: 91