oleksandr-pavlyk closed this 1 month ago
Deleted rendered PR docs from intelpython.github.com/dpctl, latest should be updated shortly. :crossed_fingers:
Array API standard conformance tests for dpctl=0.17.0dev0=py310h15de555_336 ran successfully. Passed: 888 Failed: 17 Skipped: 91
The following example crashed on a CUDA device:
import dpctl.tensor as dpt
x = dpt.arange(256, dtype="i4", device="cuda:gpu")
dpt.sort(x, descending=True) # crash
Closes gh-1654
The crash was caused by an out-of-bounds access through a shared local memory accessor.
This also fixes a crash in dpt.sort on a CUDA device when sorting 256 floating-point elements.
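A regression check for both cases could look roughly like the sketch below. It is not taken from this PR's test suite: the helper names check_descending_sort, dpctl_sort_descending and run_cuda_regression are hypothetical, and probing the sizes 255 and 257 next to the reported 256 is an assumption about where the boundary might lie.

```python
def check_descending_sort(sort_fn, values):
    """Return True if sort_fn(values) equals `values` sorted in descending order."""
    result = list(sort_fn(values))
    return result == sorted(values, reverse=True)


def dpctl_sort_descending(values, dtype="i4", device="cuda:gpu"):
    """Sort `values` in descending order with dpt.sort on the given device."""
    # Imported lazily so the generic checker above works without dpctl installed.
    import dpctl.tensor as dpt

    x = dpt.asarray(values, dtype=dtype, device=device)
    return dpt.asnumpy(dpt.sort(x, descending=True)).tolist()


def run_cuda_regression(device="cuda:gpu"):
    """Hypothetical driver: needs dpctl and a CUDA device, so it is not called here."""
    for dtype in ("i4", "f4"):  # integer case from the report, plus a float case
        for n in (255, 256, 257):  # sizes around the reported 256 (an assumption)
            values = list(range(n))
            ok = check_descending_sort(
                lambda v: dpctl_sort_descending(v, dtype=dtype, device=device),
                values,
            )
            assert ok, f"descending sort failed for dtype={dtype}, n={n}"
```

Calling run_cuda_regression() on a machine with a CUDA device would exercise the integer example from the description as well as the floating-point case; without the fix the expectation is a crash rather than a failed assertion.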