Failed to use slice indexing in `usm_ndarray` built with self-overlapping strides

antonwolfy commented 2 months ago

It looks like slice assignment is prohibited on a dpctl `usm_ndarray` that was built with self-overlapping strides:

import numpy, dpctl, dpctl.tensor as dpt

dpctl.__version__
# Out: '0.18.0dev0+158.g7450558d25'

b = dpt.usm_ndarray(20, numpy.uint8)

# self-overlapping strides
a = dpt.usm_ndarray((2, 3), numpy.float32, buffer=b, strides=(2, 1))
a[:] = 1
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File dpctl/tensor/_usmarray.pyx:1337, in dpctl.tensor._usmarray.usm_ndarray.__setitem__()

File ~/miniforge3/envs/dpnp_dev/lib/python3.11/site-packages/dpctl/tensor/_copy_utils.py:113, in _copy_from_numpy_into(dst, np_ary)
    112 # synchronizing call
--> 113 ti._copy_numpy_ndarray_into_usm_ndarray(
    114     src=src_ary, dst=dst, sycl_queue=copy_q, depends=dep_ev
    115 )

ValueError: Memory addressed by the output array is not sufficiently ample.

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
Cell In[23], line 1
----> 1 a[:] = 1

File dpctl/tensor/_usmarray.pyx:1339, in dpctl.tensor._usmarray.usm_ndarray.__setitem__()

ValueError: Input of type <class 'int'> could not be copied into dpctl.tensor.usm_ndarray

# that works fine:
a[0, 2] = 1

# works for numpy
b = numpy.ndarray(20, numpy.uint8)

# self-overlapping strides
a = numpy.ndarray((2, 3), numpy.float32, buffer=b, strides=(8, 4))
a[:] = 1

oleksandr-pavlyk commented 2 months ago

Array b occupies 20 bytes, and we are creating a 2D view of it with float32 element type. The memory can accommodate 5 such elements; let's call them b_f4[0], b_f4[1], b_f4[2], b_f4[3] and b_f4[4], located at offsets of 0, 4, 8, 12 and 16 bytes from the start of the b allocation.
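For reference, the arithmetic above can be sketched in plain Python, without dpctl. The names `nbytes`, `itemsize`, `capacity` and `byte_offsets` are illustrative only and not part of any library API.

```python
import struct

nbytes = 20                        # size of the uint8 buffer b
itemsize = struct.calcsize("=f")   # a float32 element occupies 4 bytes
capacity = nbytes // itemsize      # whole float32 elements that fit in b
byte_offsets = [i * itemsize for i in range(capacity)]

print(capacity)       # 5
print(byte_offsets)   # [0, 4, 8, 12, 16]
```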

Array a = dpt.usm_ndarray((2, 3), numpy.float32, buffer=b, strides=(2, 1)) is a view into that memory with strides counted in elements, so the following correspondence holds:

| Array `a` | `a[0,0]` | `a[0,1]` | `a[0,2]` | `a[1,0]` | `a[1,1]` | `a[1,2]` |
|---|---|---|---|---|---|---|
| Array `b_f4` | `b_f4[0]` | `b_f4[1]` | `b_f4[2]` | `b_f4[2]` | `b_f4[3]` | `b_f4[4]` |
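The correspondence can be reproduced with a small pure-Python sketch. The helper `element_offset` is a hypothetical name introduced here for illustration; it assumes strides are counted in elements and that the view starts at offset 0, as in the example.

```python
from itertools import product

def element_offset(index, strides, offset=0):
    """Offset (in elements) addressed by a multi-index for the given strides."""
    return offset + sum(i * s for i, s in zip(index, strides))

shape, strides = (2, 3), (2, 1)   # strides in elements, as in the example
mapping = {idx: element_offset(idx, strides)
           for idx in product(*map(range, shape))}

for idx, off in mapping.items():
    print(f"a[{idx[0]},{idx[1]}] -> b_f4[{off}]")
```

Multiplying these element strides by the 4-byte item size gives (8, 4), which matches the byte strides used in the NumPy snippet.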

As you have stated, the specified strides result in distinct indices mapping to the same memory location, that is, a[0, 2] and a[1, 0] refer to the same element. Hence, a data-parallel assignment results in a race condition. The error is telling you just that.
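To illustrate why overlapping writes are a problem, here is a minimal sketch in plain Python. It is a sequential simulation under stated assumptions, not how dpctl schedules or validates work: `overlapping_indices` and `assign` are hypothetical helpers, and the two visiting orders stand in for the unspecified order in which parallel work-items might write.

```python
from collections import defaultdict
from itertools import product

def overlapping_indices(shape, strides):
    """Group multi-indices by element offset; keep offsets hit more than once."""
    groups = defaultdict(list)
    for idx in product(*map(range, shape)):
        groups[sum(i * s for i, s in zip(idx, strides))].append(idx)
    return {off: idxs for off, idxs in groups.items() if len(idxs) > 1}

def assign(strides, values, order, size):
    """Write values[idx] into a flat buffer, visiting indices in `order`."""
    buf = [None] * size
    for idx in order:
        buf[sum(i * s for i, s in zip(idx, strides))] = values[idx]
    return buf

shape, strides = (2, 3), (2, 1)
indices = list(product(*map(range, shape)))
values = {idx: 10 * idx[0] + idx[1] for idx in indices}  # distinct value per index

print(overlapping_indices(shape, strides))         # {2: [(0, 2), (1, 0)]}
print(assign(strides, values, indices, 5))         # [0, 1, 10, 11, 12]
print(assign(strides, values, indices[::-1], 5))   # [0, 1, 2, 11, 12]
```

The element at offset 2 ends up holding whichever write happened last, so the result depends on the visiting order.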

Array a may be used as a read-only input though:

In [1]: import dpctl.tensor as dpt

In [2]: b = dpt.arange(5, dtype=dpt.float32)

In [3]: a = dpt.usm_ndarray((2, 3), dtype=dpt.float32, buffer=b, strides=(2,1))

In [4]: dpt.pow(a, 2)
Out[4]:
usm_ndarray([[ 0.,  1.,  4.],
             [ 4.,  9., 16.]], dtype=float32)

The dpctl.tensor.usm_ndarray constructor is not intended for end-user consumption. Please use the Python Array API constructor functions instead.

Feel free to close this ticket.

antonwolfy commented 2 months ago

@oleksandr-pavlyk, that sounds reasonable, thank you!

IntelPython / dpctl

Failed to use slice indexing in `usm_ndarray` built with self-overlapping strides #1765