Closed antonwolfy closed 2 months ago
Array `b` occupies 20 bytes, and we are creating a 2D view into it with `float32` type. The memory can accommodate 5 elements; let's call them `b_f4[0]`, `b_f4[1]`, `b_f4[2]`, `b_f4[3]` and `b_f4[4]`, at offsets 0 bytes, 4 bytes, 8 bytes, 12 bytes and 16 bytes from the start of the `b` allocation.
Array `a = dpt.usm_ndarray((2, 3), numpy.float32, buffer=b, strides=(2, 1))` is a view, so that the following correspondence holds:
| Element | | | | | | |
|---|---|---|---|---|---|---|
| Array `a` | `a[0,0]` | `a[0,1]` | `a[0,2]` | `a[1,0]` | `a[1,1]` | `a[1,2]` |
| Array `b_f4` | `b_f4[0]` | `b_f4[1]` | `b_f4[2]` | `b_f4[2]` | `b_f4[3]` | `b_f4[4]` |
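The correspondence can be reproduced with plain Python arithmetic. This is a minimal sketch that assumes strides are counted in elements rather than bytes, which is consistent with the mapping described here; the helper name `flat_index` is hypothetical and not part of dpctl.

```python
from itertools import product

shape = (2, 3)
strides = (2, 1)  # assumption: strides are in units of elements, not bytes
itemsize = 4      # size of one float32 element in bytes


def flat_index(idx, strides):
    """Map a multi-dimensional index to a position in the flat buffer."""
    return sum(i * s for i, s in zip(idx, strides))


# Enumerate every index of the (2, 3) view in row-major order.
mapping = {
    idx: flat_index(idx, strides)
    for idx in product(*(range(n) for n in shape))
}

for idx, k in mapping.items():
    print(f"a[{idx[0]},{idx[1]}] -> b_f4[{k}] (byte offset {k * itemsize})")
```

Both `a[0,2]` and `a[1,0]` come out at position 2 of the buffer, which is the duplicated column in the table.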
As you have stated, the specified strides result in distinct indices mapping to the same memory location; that is, `a[0,2]` and `a[1,0]` refer to the same element. Hence, a data-parallel assignment results in a race condition. The error is telling you just that.
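One way to see why a parallel write is unsafe is to group the indices of the view by the buffer element they address. The following is an illustrative sketch only, not dpctl's actual overlap check; `overlapping_indices` is a hypothetical helper, and strides are assumed to be expressed in elements.

```python
from collections import defaultdict
from itertools import product


def overlapping_indices(shape, strides):
    """Return {buffer position: [indices]} for positions hit more than once."""
    groups = defaultdict(list)
    for idx in product(*(range(n) for n in shape)):
        position = sum(i * s for i, s in zip(idx, strides))
        groups[position].append(idx)
    return {pos: ids for pos, ids in groups.items() if len(ids) > 1}


# Strides (2, 1) on a (2, 3) view: two indices share buffer position 2.
overlaps = overlapping_indices((2, 3), (2, 1))
print(overlaps)  # {2: [(0, 2), (1, 0)]}

# Ordinary C-contiguous strides (3, 1) produce no overlap.
contiguous = overlapping_indices((2, 3), (3, 1))
print(contiguous)  # {}
```

If two work-items write to the indices in the same group concurrently, the final value of that buffer element depends on which write lands last.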
Array `a` may be used as a read-only input, though:
In [1]: import dpctl.tensor as dpt
In [2]: b = dpt.arange(5, dtype=dpt.float32)
In [3]: a = dpt.usm_ndarray((2, 3), dtype=dpt.float32, buffer=b, strides=(2,1))
In [4]: dpt.pow(a, 2)
Out[4]:
usm_ndarray([[ 0., 1., 4.],
[ 4., 9., 16.]], dtype=float32)
The `tensor.usm_ndarray` constructor is not intended for end-user consumption. Please use Python Array API constructor functions instead.
Feel free to close this ticket.
@oleksandr-pavlyk, that sounds reasonable, thank you!
It looks like it is prohibited to use slice indexing to initialize a dpctl `usm_ndarray` array that was built with self-overlapping strides: