oleksandr-pavlyk closed this 2 months ago
Deleted rendered PR docs from intelpython.github.com/dpctl, latest should be updated shortly. :crossed_fingers:
Array API standard conformance tests for dpctl=0.17.0dev0=py310h15de555_302 ran successfully. Passed: 870 Failed: 8 Skipped: 92
For contiguous inputs, bump local-work-group size from 64 to 128 work-items.
This change is guided by a performance study of a Newton root-finding example that is rich in elementwise operations.
With this change, `unitrace` reports that 311 invocations of the kernel took 2805824666 ns. Before the change, with 64 work-items, the time was 3475091844 ns.
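For reference, the improvement implied by these two totals can be worked out with a few lines of Python. This is only a sketch of the arithmetic: the helper name `summarize` is made up for illustration, and the per-invocation figures assume the "before" run also had 311 invocations, which the measurement above does not state explicitly.

```python
# Total kernel times quoted above, as reported by unitrace (nanoseconds).
INVOCATIONS = 311                # assumed identical for both runs
TOTAL_NS_WG64 = 3_475_091_844    # local work-group size of 64 work-items
TOTAL_NS_WG128 = 2_805_824_666   # local work-group size of 128 work-items


def summarize(before_ns, after_ns, calls):
    """Hypothetical helper: derive per-call times and relative improvement."""
    return {
        "per_call_before_ms": before_ns / calls / 1e6,
        "per_call_after_ms": after_ns / calls / 1e6,
        "speedup": before_ns / after_ns,
        "time_reduction_pct": 100.0 * (before_ns - after_ns) / before_ns,
    }


summary = summarize(TOTAL_NS_WG64, TOTAL_NS_WG128, INVOCATIONS)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Under that assumption, the average kernel time drops from about 11.17 ms to about 9.02 ms per invocation, which is roughly a 1.24x speedup, or about 19% less total kernel time.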