cupy / cupy

NumPy & SciPy for GPU
https://cupy.dev
MIT License

JIT: illegal Python function (with undefined variable) gives correct kernel by accident #5340

Open leofang opened 3 years ago

leofang commented 3 years ago

On the current master, the following code compiles and runs:

import cupy as cp
from cupyx import jit

@jit.rawkernel()
def f(b):
    laneId = jit.threadIdx.x & 0x1f
    if laneId == 0:
        value = cp.int64(100)
    b[laneId] = value

b = cp.empty((32,), dtype=cp.int64)
f[1, 32](b)
print(b)
print(f.cached_code)

However, a compilation error (ideally a NameError or UnboundLocalError) should be raised by the JIT compiler, because the variable value is not defined on all code paths.
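
For comparison, ordinary CPython raises an error at runtime for the same control flow (this is plain Python, not JIT code), which is the behavior the JIT compiler would ideally surface at compile time:

def g(laneId):
    if laneId == 0:
        value = 100
    return value  # 'value' is only bound when laneId == 0

g(1)  # raises UnboundLocalError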

asi1024 commented 3 years ago

@leofang Thank you for your report!

I'm considering the following case:

@jit.rawkernel()
def f(b):
    laneId = jit.threadIdx.x & 0x1f
    if laneId == 0:
        value = cp.int64(100)
    elif laneId <= 0x1f:
        value = cp.int64(200)
    b[laneId] = value

In this case, value actually is well defined on all code paths, but it is difficult to verify that statically.

I think it is difficult to check that local variables are well defined on all code paths, but we may be able to raise a warning, just like the -Wuninitialized option of a C++ compiler.

leofang commented 3 years ago

Thanks, @asi1024. I was wondering if it makes sense to parse the nodes held by ast.If and check whether the same ast.Name appears in all branches, but I agree that it doesn't seem trivial, as it is OK for value to be undefined if it's never used later; that is, whether the code is valid (in Python) depends on the context:

@jit.rawkernel()
def f(b):
    laneId = jit.threadIdx.x & 0x1f
    if laneId == 0:
        value = cp.int64(100)
    elif laneId <= 0x1f:
        vvvvv = 10
    b[laneId] = 0  # <--- neither value nor vvvvv is used later, so it's OK for them not to be defined on all paths
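
To make the idea concrete, here is a minimal, hypothetical sketch (not CuPy code) of that naive ast.If check. It flags any name assigned in some but not all branches, so it would wrongly complain about value and vvvvv above even though they are never read later, and it does not understand exhaustive elif chains either:

import ast
import textwrap

def stored_names(stmts):
    # Names assigned (ast.Store) anywhere inside the given statements.
    names = set()
    for stmt in stmts:
        for node in ast.walk(stmt):
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
                names.add(node.id)
    return names

src = textwrap.dedent("""
if laneId == 0:
    value = 100
elif laneId <= 0x1f:
    vvvvv = 10
""")
if_node = ast.parse(src).body[0]

body_names = stored_names(if_node.body)
orelse_names = stored_names(if_node.orelse)
# Names assigned on only some paths -> {'value', 'vvvvv'}
print((body_names | orelse_names) - (body_names & orelse_names))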

I am actually not sure how -Wuninitialized could be implemented here -- are you suggesting that we pass this flag to nvrtc by default?

leofang commented 3 years ago

@asi1024 This proposal does not address the original issue, but it does provide a way to write valid code: support accessing data types within the JIT kernel.

Consider a variant of the kernel from the original report:

@jit.rawkernel()
def f(b):
    laneId = jit.threadIdx.x & 0x1f
    value = b.dtype(0)
    if laneId == 0:
        value = b.dtype(100)
    b[laneId] = value

If we could access b.dtype within the kernel (which is currently not possible), we could properly initialize the variable and give it the correct type. It is slightly less Pythonic, granted, but I can imagine it being quite useful for more complex tasks.

asi1024 commented 3 years ago

@leofang I think this style is not Pythonic. I could not find any Python linter that recommends this style to prevent uninitialized variables.

Even if there were a linter that implemented such an option, I would rather users opt in to that check themselves. In any case, I don't think we need to add a static check for uninitialized variables to the CuPy JIT.

If we support a CPU simulation feature in the future, maybe we can check it at runtime.

kmaehashi commented 3 years ago

Isn't it possible to let nvcc/nvrtc raise an uninitialized-value warning so that we don't have to do anything?

I also chatted with @asi1024 about this: it would be nice to embed the original Python code as comments in the generated CUDA source, line by line, so that users can easily map compiler warnings back to the corresponding Python code.
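
Purely as an illustration of the idea (nothing like this exists in CuPy yet), given pairs of Python and generated CUDA lines that the code generator would have to track, the emitted source could look like this:

# Hypothetical helper: interleave Python source lines as // comments
# above the CUDA lines they were translated into.
def interleave(pairs):
    out = []
    for py_line, cuda_line in pairs:
        out.append("// " + py_line)
        out.append(cuda_line)
    return "\n".join(out)

print(interleave([
    ("value = cp.int64(100)", "long long value = 100;"),
    ("b[laneId] = value", "b[laneId] = value;"),
]))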

leofang commented 3 years ago

I think this style is not Pythonic. I could not find any Python linter that recommends this style to prevent uninitialized variables.

Even if there were a linter that implemented such an option, I would rather users opt in to that check themselves. In any case, I don't think we need to add a static check for uninitialized variables to the CuPy JIT.

@asi1024 Sorry, I think I confused myself... Were you referring to the line value = b.dtype(0)? If so, it was merely suggested as a workaround for the (arguably harder) problem of an undeclared variable leaking out of the if/else scope.

For this particular snippet, it happens that properly initializing the variable works around the undeclared-variable issue, but maybe there is a better example that decouples these two issues. I care much less about uninitialized values, because from a C++ perspective all of the supported dtypes can be safely default-initialized.

Having the ability to access array dtypes (e.g. b.dtype) in a kernel has additional benefits, such as supporting C++ template-like kernels and allowing user-managed explicit casting. For example, Numba supports statements like y = x.dtype.type(1.) and y = numba.float32(12.) in a kernel. I think it's best discussed separately. Let me open a new issue.
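
For comparison, here is a minimal sketch of the Numba style being referenced (Numba CUDA code, not cupyx.jit; the kernel and names are made up for illustration):

import numpy as np
from numba import cuda, float32

@cuda.jit
def scale(x, out):
    i = cuda.grid(1)
    if i < x.size:
        y = float32(12.)  # explicitly typed local, as in y = numba.float32(12.)
        out[i] = x[i] * y

x = cuda.to_device(np.arange(8, dtype=np.float32))
out = cuda.device_array_like(x)
scale[1, 8](x, out)
print(out.copy_to_host())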

If we support a CPU simulation feature in the future, maybe we can check it at runtime.

Sounds interesting to me, although I am less interested in this capability 😅 To me, CuPy has always been a library that cannot work without a GPU, so I am not sure how we would reconcile this conflict. For example, do we pass a CuPy array or a NumPy array to the kernel under simulation mode? Can we avoid triggering any CUDA context initialization when using the JIT?

it would be nice to embed the original Python code as comments in the generated CUDA source, line by line, so that users can easily map compiler warnings back to the corresponding Python code.

Having this would be great!

asi1024 commented 3 years ago

@leofang Sorry for my late response.

Having the ability to access array dtypes (e.g. b.dtype) in a kernel has additional benefits, such as supporting C++ template-like kernels and allowing user-managed explicit casting. For example, Numba supports statements like y = x.dtype.type(1.) and y = numba.float32(12.) in a kernel. I think it's best discussed separately. Let me open a new issue.

The current CuPy JIT already supports y = numpy.float32(12.) to specify the dtype of a local variable. We could support additional styles such as y = x.dtype.type(1.) or y: numpy.float32 = 1.
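
For example, combining this supported style with the workaround discussed above, the original kernel could be written roughly as follows (a sketch only; it assumes numpy.int64(...) is handled the same way as numpy.float32(...)):

import cupy as cp
import numpy as np
from cupyx import jit

@jit.rawkernel()
def f(b):
    laneId = jit.threadIdx.x & 0x1f
    value = np.int64(0)      # typed initialization on every path
    if laneId == 0:
        value = np.int64(100)
    b[laneId] = value

b = cp.empty((32,), dtype=cp.int64)
f[1, 32](b)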

For example, do we pass a CuPy array or a NumPy array to the kernel under simulation mode? Can we avoid triggering any CUDA context initialization when using the JIT?

This feature is still in the planning stage, but it will probably not rely on any CUDA context.

leofang commented 3 years ago

Thanks, @asi1024!

The current CuPy JIT already supports y = numpy.float32(12.) to specify the dtype of a local variable. We could support additional styles such as y = x.dtype.type(1.) or y: numpy.float32 = 1.

Sorry for my slow response... (still on vacation). I just opened an issue to track this feature request (#5417).

For example, do we pass a CuPy array or a NumPy array to the kernel under simulation mode? Can we avoid triggering any CUDA context initialization when using the JIT?

This feature is still in the planning stage, but it will probably not rely on any CUDA context.

Sounds cool! Not that I am a big fan of such simulators, but I am definitely interested in learning how it's implemented.