IntelPython / numba-dpex

Data Parallel Extension for Numba
https://intelpython.github.io/numba-dpex/
Apache License 2.0

wrong output dtype for math functions #759

Open fcharras opened 2 years ago

fcharras commented 2 years ago

math.ceil and math.floor are supported within a numba kernel, and when called in a CPython interpreter they return an int.

But within a dpex.kernel they return a float64, which can be the source of several issues.
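
For reference, this is the CPython behavior the kernel output is expected to match (plain interpreter, no numba involved):

>>> import math
>>> math.ceil(2.3), type(math.ceil(2.3))
(3, <class 'int'>)
>>> math.floor(2.3), type(math.floor(2.3))
(2, <class 'int'>)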

Here is a toy example to reproduce the issue:

import dpctl
import numpy as np
from numba import float32, uint32
import numba_dppy as dpex
import math

def get_kernel(p, q, r):
    shm_shape = (p + 1, q + 1)
    @dpex.kernel
    def test_kernel(a, b):
        threadx = dpex.get_global_id(0)
        thready = dpex.get_global_id(1)
        some_shared_mem = dpex.local.array(shape=shm_shape, dtype=float32)
        # math.ceil is expected to return an int (as in CPython), but inside
        # the kernel it returns a float, which is then used as an index below.
        x = math.ceil(1 + threadx // r)
        y = math.ceil(1 + thready // r)
        some_shared_mem[x, y] = float32(3.)
        b[threadx, thready] = a[threadx, thready] + float32(3.) + float32(math.inf)
    return test_kernel

# Array dimensions
X = 16 * 16
Y = 16
global_size = X, X

griddim = X, X
blockdim = Y, Y

a = np.arange(X * X, dtype=np.float32).reshape(X, X)
b = np.empty_like(a)

# Use the environment variable SYCL_DEVICE_FILTER to change the default device.
# See https://github.com/intel/llvm/blob/sycl/sycl/doc/EnvironmentVariables.md#sycl_device_filter.
device = dpctl.select_default_device()
print("Using device ...")
device.print_device_info()

with dpctl.device_context(device):
    get_kernel(4, 4, 4)[griddim, blockdim](a, b)

print("Done...")

It fails both on the Intel DevCloud and in a custom environment with more up-to-date versions.

chudur-budur commented 1 year ago

Minimal reproducer using latest syntax:

import math

import dpctl
import dpnp
import numba_dpex as dpx


def test_minimal():
    N = 10
    device = dpctl.select_default_device()

    @dpx.kernel
    def func(a):
        i = dpx.get_global_id(0)
        # Expected to be an int index; inside the kernel math.ceil
        # returns a float instead.
        i = math.ceil(i + 0.5) - 1
        a[i] = a[i] + 1

    a = dpnp.ones(N, dtype=dpnp.float32, device=device)

    func[dpx.Range(N)](a)

    print(a.asnumpy())

diptorupd commented 1 year ago

@fcharras The issue here is that the math.ceil and math.floor functions are replaced by the SYCL equivalents, which only support floating-point values. We are looking at a solution where the code generator will add a cast to make the functions behave the same way as the Python math functions.
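
For illustration only (this is not the actual numba-dpex code generator change, just a plain-Python sketch of the intended semantics, with sycl_style_ceil standing in for the device built-in):

import math

def sycl_style_ceil(x):
    # SYCL/OpenCL ceil keeps the floating-point type of its input.
    return float(math.ceil(x))

def python_like_ceil(x):
    # The planned fix in spirit: cast the floating-point result back to an
    # integer so the kernel-side function matches CPython's math.ceil.
    return int(sycl_style_ceil(x))

assert python_like_ceil(2.3) == 3 and isinstance(python_like_ceil(2.3), int)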

chudur-budur commented 1 year ago

Just to add that all C/C++ implementations out there do float --> float and double --> double for ceil() and floor(), including SYCL and OpenCL.
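
For comparison, NumPy follows the same C convention; a quick check (NumPy is not part of the thread, it is only used here to illustrate the behavior):

import numpy as np

# ceil/floor keep the input floating-point type instead of returning an int,
# mirroring C's ceilf()/ceil() and the SYCL/OpenCL built-ins.
print(np.ceil(np.float32(2.3)).dtype)   # float32
print(np.ceil(np.float64(2.3)).dtype)   # float64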

fcharras commented 1 year ago

(Sorry for the lack of feedback this week, I took some time off.) Practically speaking, I wouldn't say this issue is too bothersome; it's more a matter of clarity. Python users will refer to the Python documentation of the math module, or open an interpreter and check the output type, and expect the same behavior in a kernel.

Maybe it's fine to keep the SYCL-like behavior, provided the documentation spells out the differences from the Python implementation. Or maybe, instead of reusing the math namespace, it could be exposed directly in a numba_dpex.math module? Users would be less likely (and less justified) to assume that dpex.math.floor calls within a kernel behave like math.floor does in CPython.

chudur-budur commented 1 year ago

Hi @fcharras, could you please test this branch https://github.com/chudur-budur/numba-dpex/tree/github-759 and verify whether the functions now return ints? I think this issue is fixed in #960.

fcharras commented 1 year ago

For me, #960 indeed fixes the issue.

Just want to add that I realized a mistake I made in the OP: when saying

it returns a float64

I didn't realize that the output type depends on the input type, and thought it would always trigger float64 compute and thus crash GPUs that do not have the float64 aspect. In fact, math.ceil(x) works fine on those GPUs if x is float32 or int32, so casting the input beforehand and the result afterward (to int) would have solved the issue.
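
For completeness, a minimal sketch of that cast-based workaround applied to the minimal reproducer above (illustrative only; it assumes numba's float32/int32 constructors work inside a dpex.kernel, as the float32 casts in the original reproducer do):

import math

import dpctl
import dpnp
import numba_dpex as dpx
from numba import float32, int32


@dpx.kernel
def func(a):
    i = dpx.get_global_id(0)
    # Cast the input to float32 and the result back to an int, so the kernel
    # never depends on the (SYCL-style) floating-point return of math.ceil.
    j = int32(math.ceil(float32(i) + float32(0.5))) - 1
    a[j] = a[j] + 1


device = dpctl.select_default_device()
a = dpnp.ones(10, dtype=dpnp.float32, device=device)
func[dpx.Range(10)](a)
print(a.asnumpy())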