IntelPython / dpctl

Python SYCL bindings and SYCL-based Python Array API library
https://intelpython.github.io/dpctl/
Apache License 2.0

Failed to use slice indexing in `usm_ndarray` built with self-overlapping strides #1765

Closed antonwolfy closed 2 months ago

antonwolfy commented 2 months ago

It looks like slice assignment is not allowed on a dpctl usm_ndarray that was built with self-overlapping strides:

import numpy, dpctl, dpctl.tensor as dpt

dpctl.__version__
# Out: '0.18.0dev0+158.g7450558d25'

b = dpt.usm_ndarray(20, numpy.uint8)

# self-overlapping strides
a = dpt.usm_ndarray((2, 3), numpy.float32, buffer=b, strides=(2, 1))
a[:] = 1
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File dpctl/tensor/_usmarray.pyx:1337, in dpctl.tensor._usmarray.usm_ndarray.__setitem__()

File ~/miniforge3/envs/dpnp_dev/lib/python3.11/site-packages/dpctl/tensor/_copy_utils.py:113, in _copy_from_numpy_into(dst, np_ary)
    112 # synchronizing call
--> 113 ti._copy_numpy_ndarray_into_usm_ndarray(
    114     src=src_ary, dst=dst, sycl_queue=copy_q, depends=dep_ev
    115 )

ValueError: Memory addressed by the output array is not sufficiently ample.

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
Cell In[23], line 1
----> 1 a[:] = 1

File dpctl/tensor/_usmarray.pyx:1339, in dpctl.tensor._usmarray.usm_ndarray.__setitem__()

ValueError: Input of type <class 'int'> could not be copied into dpctl.tensor.usm_ndarray

# that works fine:
a[0, 2] = 1

# works for numpy
b = numpy.ndarray(20, numpy.uint8)

# self-overlapping strides
a = numpy.ndarray((2, 3), numpy.float32, buffer=b, strides=(8, 4))
a[:] = 1

oleksandr-pavlyk commented 2 months ago

Array b occupies 20 bytes, and we are creating a 2D view into it with the float32 type. The memory can accommodate 5 elements; let's call them b_f4[0], b_f4[1], b_f4[2], b_f4[3] and b_f4[4], at offsets of 0 bytes, 4 bytes, 8 bytes, 12 bytes and 16 bytes from the start of the b allocation.

Array a = dpt.usm_ndarray((2, 3), numpy.float32, buffer=b, strides=(2, 1)) is a view, so that the following correspondence is true:

             Element
Array a      a[0, 0]  a[0, 1]  a[0, 2]  a[1, 0]  a[1, 1]  a[1, 2]
Array b_f4   b_f4[0]  b_f4[1]  b_f4[2]  b_f4[2]  b_f4[3]  b_f4[4]
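The correspondence above can be reproduced with a few lines of plain Python. This is only an illustrative sketch, not dpctl code: the helper name element_offsets is hypothetical, and it assumes that usm_ndarray strides are counted in elements (as the table implies), whereas NumPy strides are counted in bytes.

```python
from itertools import product


def element_offsets(shape, strides):
    """Map every multi-index to its flat element offset, sum(i_k * stride_k).

    Hypothetical helper for illustration; strides are in elements, not bytes.
    """
    return {
        idx: sum(i * s for i, s in zip(idx, strides))
        for idx in product(*(range(n) for n in shape))
    }


# Shape and element strides from the usm_ndarray example in this thread.
offsets = element_offsets((2, 3), (2, 1))
for idx, off in offsets.items():
    print(f"a[{idx[0]}, {idx[1]}] -> b_f4[{off}]")

# The NumPy call in the report passes byte strides (8, 4). Dividing by the
# 4-byte float32 item size gives the same element strides (2, 1).
itemsize = 4
numpy_byte_strides = (8, 4)
element_strides = tuple(s // itemsize for s in numpy_byte_strides)
print(element_strides)
```

Both a[0, 2] and a[1, 0] land on offset 2, which is the duplicated b_f4[2] entry in the table.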

As you have stated, the specified strides cause distinct indices to map to the same memory location: a[0, 2] and a[1, 0] refer to the same element, b_f4[2]. Hence, a data-parallel assignment would result in a race condition, and that is what the error is reporting.
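One way to see which indices collide is to group the multi-indices by the element offset they address. The sketch below is plain Python and makes no claim about how dpctl performs its own check; find_overlaps is a hypothetical helper, and strides are assumed to be in elements.

```python
from collections import defaultdict
from itertools import product


def find_overlaps(shape, strides):
    """Return {offset: [indices]} for offsets addressed by more than one index.

    Hypothetical helper for illustration; strides are in elements.
    """
    groups = defaultdict(list)
    for idx in product(*(range(n) for n in shape)):
        offset = sum(i * s for i, s in zip(idx, strides))
        groups[offset].append(idx)
    return {off: idxs for off, idxs in groups.items() if len(idxs) > 1}


# Strides (2, 1) from the report: two indices share one memory location,
# so concurrent writers could touch the same element.
overlaps = find_overlaps((2, 3), (2, 1))
print(overlaps)

# C-contiguous strides (3, 1) for the same shape: every index is distinct.
contiguous = find_overlaps((2, 3), (3, 1))
print(contiguous)
```

An empty result means each index owns its own element, so every write in a parallel assignment targets a different location.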

Array a may be used as a read-only input though:

In [1]: import dpctl.tensor as dpt

In [2]: b = dpt.arange(5, dtype=dpt.float32)

In [3]: a = dpt.usm_ndarray((2, 3), dtype=dpt.float32, buffer=b, strides=(2,1))

In [4]: dpt.pow(a, 2)
Out[4]:
usm_ndarray([[ 0.,  1.,  4.],
             [ 4.,  9., 16.]], dtype=float32)

The dpctl.tensor.usm_ndarray constructor is not intended for end-user consumption. Please use the Python Array API constructor functions instead.

Feel free to close this ticket.

antonwolfy commented 2 months ago

@oleksandr-pavlyk , it sounds reasonable, thank you!