oleksandr-pavlyk closed 3 weeks ago
Deleted rendered PR docs from intelpython.github.com/dpctl, latest should be updated shortly. :crossed_fingers:
Array API standard conformance tests for dpctl=0.18.0dev0=py310h15de555_50 ran successfully. Passed: 890 Failed: 11 Skipped: 91
Array API standard conformance tests for dpctl=0.18.0dev0=py310h15de555_51 ran successfully. Passed: 890 Failed: 11 Skipped: 91
Array API standard conformance tests for dpctl=0.18.0dev0=py310h15de555_53 ran successfully. Passed: 890 Failed: 11 Skipped: 91
Array API standard conformance tests for dpctl=0.18.0dev0=py310h15de555_55 ran successfully. Passed: 890 Failed: 11 Skipped: 91
What's missing from this PR that must be addressed before merging:
`dpctl` for pybind11 and cython must be updated.
I suggest we expand the documentation to explain asynchronous execution in a separate PR.
The important change to realize is that tensor operations execute asynchronously, so to get proper timing values one must call `queue.wait()` before taking the final timestamp reading.
Submission preserves the sequential ordering of operations as executed by the Python interpreter.
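The timing advice above can be sketched as follows. This is a minimal illustration, not code from this PR: the helper `timed` and the function `demo_dpctl` are hypothetical names introduced here, and the sketch assumes only `dpctl.tensor.ones`, `dpctl.tensor.sin`, the `usm_ndarray.sycl_queue` property and `SyclQueue.wait()`. The helper itself is plain Python and takes the synchronization step as a callable, so it does not import `dpctl` unless `demo_dpctl` is called.

```python
import time


def timed(submit, synchronize):
    """Time work that is submitted asynchronously.

    submit:      callable that enqueues the work and may return before it finishes
    synchronize: callable that blocks until all enqueued work has completed
                 (for dpctl this would be the queue's wait method)
    """
    # Drain work submitted earlier so it is not attributed to this measurement.
    synchronize()
    t0 = time.perf_counter()
    result = submit()
    # Without this call the final timestamp would only measure submission time.
    synchronize()
    elapsed = time.perf_counter() - t0
    return result, elapsed


def demo_dpctl(n=1_000_000):
    """Hypothetical usage with dpctl.tensor; requires dpctl and a SYCL device."""
    import dpctl.tensor as dpt

    x = dpt.ones(n, dtype="f4")
    q = x.sycl_queue  # queue associated with the array's allocation
    _, elapsed = timed(lambda: dpt.sin(x), q.wait)
    return elapsed
```

Calling `demo_dpctl()` on a machine with `dpctl` installed returns the elapsed time in seconds for one `dpt.sin` call, including the wait for the offloaded work to finish.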
Array API standard conformance tests for dpctl=0.18.0dev0=py310h15de555_57 ran successfully. Passed: 890 Failed: 11 Skipped: 91
Array API standard conformance tests for dpctl=0.18.0dev0=py310h15de555_58 ran successfully. Passed: 890 Failed: 11 Skipped: 91
Array API standard conformance tests for dpctl=0.18.0dev0=py310h15de555_59 ran successfully. Passed: 890 Failed: 11 Skipped: 91
Array API standard conformance tests for dpctl=0.18.0dev0=py310h15de555_60 ran successfully. Passed: 889 Failed: 12 Skipped: 91
This PR changes the size and content of the struct behind the `dpctl.memory._Memory` object.
This is a backwards-incompatible change and would require rebuilding downstream projects with native extensions (`cython` or `pybind11`) that use `dpctl`.
The `_Memory` object no longer explicitly frees the USM allocations it made, but delegates this job to a smart pointer. This allows `host_task` jobs, which defer USM deallocation until the offloaded tasks operating on the allocation have completed execution, to do their job by capturing the smart pointer in the callable passed to `host_task`, instead of relying on Python reference counting and therefore needing to acquire the GIL in that callable.
Furthermore, the PR introduces a mechanism for ensuring sequential order among tasks that implement an offloading Python API, such as `dpctl.tensor`, which implements an array API compliant tensor library.
One consequence of this change is that timing `dpctl.tensor` functions requires synchronizing the execution queue before taking the final timestamp reading.