IntelPython / dpctl

Python SYCL bindings and SYCL-based Python Array API library
https://intelpython.github.io/dpctl/
Apache License 2.0
97 stars 29 forks source link

Unexpected result from divide-like ufuncs with unsigned integer array and negative integer scalar as inputs #1711

Open antonwolfy opened 2 weeks ago

antonwolfy commented 2 weeks ago

The below example demonstrates unexpected dpctl behavior for divide, floor_divide and remainder binary functions. The issue is visible when one of the input has unsinged integer type, while another input is negative integer scalar.

Here is an example for divide:

import numpy, dpctl, dpctl.tensor as dpt

dpctl.__version__
# Out: '0.18.0dev0+73.gc81735a928'

dpt.divide(dpt.asarray(3, dtype='u4'), -2)
# Out: usm_ndarray(6.9849193e-10, dtype=float32)

numpy.divide(numpy.asarray(3, dtype='u4'), -2)
# Out: -1.5

dpt.divide(-4, dpt.asarray(3, dtype='u4'))
# Out: usm_ndarray(1.4316558e+09, dtype=float32)

numpy.divide(-4, numpy.asarray(3, dtype='u4'))
# Out: -1.3333333333333333

similar with floor_divide:

dpt.floor_divide(dpt.asarray(7, dtype='u4'), -2)
# Out: usm_ndarray(0, dtype=uint32)

numpy.floor_divide(numpy.asarray(7, dtype='u4'), -2)
# Out: -4

and remainder:

dpt.remainder(dpt.asarray(7, dtype='u4'), -2)
# Out: usm_ndarray(7, dtype=uint32)

numpy.remainder(numpy.asarray(7, dtype='u4'), -2)
# Out: -1
ndgrigorian commented 2 weeks ago

The cases with divide are especially interesting: using the Numpy 2.0 release candidate, the operations are legal. Likely this is because divide always casts to a floating-point type, so NEP-50 doesn't kick in: both values get up-cast to float.

The remaining cases are illegal as of Numpy 2.0:

In [9]: dpt.floor_divide(dpt.asarray(7, dtype='u4'), -2)
---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
Cell In[9], line 1
----> 1 dpt.floor_divide(dpt.asarray(7, dtype='u4'), -2)

File ~/repos/dpctl/dpctl/tensor/_elementwise_common.py:721, in BinaryElementwiseFunc.__call__(self, o1, o2, out, order)
    719     src2 = o2
    720 else:
--> 721     src2 = dpt.asarray(o2, dtype=o2_dtype, sycl_queue=exec_q)
    723 if order == "A":
    724     order = (
    725         "F"
    726         if all(
   (...)
    733         else "C"
    734     )

File ~/repos/dpctl/dpctl/tensor/_ctors.py:646, in asarray(obj, dtype, device, copy, usm_type, sycl_queue, order)
    641     raise ValueError(
    642         f"Converting {type(obj)} to usm_ndarray requires a copy"
    643     )
    644 # obj is a scalar, create 0d array
    645 return _asarray_from_numpy_ndarray(
--> 646     np.asarray(obj, dtype=dtype),
    647     dtype=dtype,
    648     usm_type=usm_type,
    649     sycl_queue=sycl_queue,
    650     order="C",
    651 )

OverflowError: Python integer -2 out of bounds for uint32

In [10]: np.floor_divide(np.asarray(7, dtype='u4'), -2)
---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
Cell In[10], line 1
----> 1 np.floor_divide(np.asarray(7, dtype='u4'), -2)

OverflowError: Python integer -2 out of bounds for uint32

this is with dpctl 0.17.0 and Numpy 2.0.0rc2

1650 fixed some issues with unsigned integers in the comparison functions, but some issues still remain. In general, it seems that calls to elementwise functions need to be revisited to do more sophisticated type and value checking for Python scalars to avoid these issues.

For the divide cases, I suspect that the Python scalar is being cast to to unsigned integer out-of-bounds, then used in divide, resulting in bad data.

Perhaps cases with scalars can take a fast-path for some functions to promote to the output type, rather than the type of the other array. This avoids bringing the scalar to the device once and then copying it immediately afterward, as well.