leofang opened this issue 3 years ago
@leofang Thank you for your report!
I'm considering the following case:
```python
@jit.rawkernel()
def f(b):
    laneId = jit.threadIdx.x & 0x1f
    if laneId == 0:
        value = cp.int64(100)
    elif laneId <= 0x1f:
        value = cp.int64(200)
    b[laneId] = value
```
In this case, `value` is actually well defined on all code paths, but that is difficult to verify statically.
I think it is hard to prove that local variables are well defined on every path, but we may be able to raise a warning, much like the `-Wuninitialized` option of C++ compilers.
Thanks, @asi1024. I was wondering if it makes sense to parse the nodes held by `ast.If` and check whether the same `ast.Name` appears in all branches, but I agree it doesn't seem trivial: it is OK for `value` to be undefined if it's not used later. That is, whether a piece of code is valid (in Python) depends on the context:
```python
@jit.rawkernel()
def f(b):
    laneId = jit.threadIdx.x & 0x1f
    if laneId == 0:
        value = cp.int64(100)
    elif laneId <= 0x1f:
        vvvvv = 10
    b[laneId] = 0  # <--- neither value nor vvvvv is used, so it's OK for them to not be defined on all paths
```
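Something along these lines could flag the simple cases (a rough, purely illustrative sketch; it only looks at top-level assignments in each branch and ignores whether the name is actually used later, which is exactly the hard part):

```python
import ast
import warnings

def warn_maybe_uninitialized(source):
    """Warn when a name is assigned in some, but not all, branches of an if."""
    def assigned(stmts):
        # Collect names bound by top-level assignments in a statement list.
        names = set()
        for stmt in stmts:
            if isinstance(stmt, ast.Assign):
                for target in stmt.targets:
                    if isinstance(target, ast.Name):
                        names.add(target.id)
        return names

    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.If):
            # An elif chain appears as a nested ast.If inside orelse,
            # so ast.walk visits each level of the chain separately.
            for name in assigned(node.body) ^ assigned(node.orelse):
                warnings.warn(
                    f'{name!r} may be undefined on some paths '
                    f'(if-statement at line {node.lineno})')
```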
I am actually not sure how `-Wuninitialized` could be implemented -- are you suggesting passing this flag to nvrtc by default?
@asi1024 This proposal does not address the original issue, but it does provide a way to write valid code: support accessing data types within the JIT kernel.
Consider a variant of the kernel from the original report:
```python
@jit.rawkernel()
def f(b):
    laneId = jit.threadIdx.x & 0x1f
    value = b.dtype(0)
    if laneId == 0:
        value = b.dtype(100)
    b[laneId] = value
```
If we could access `b.dtype` within the kernel (which is currently not possible), we could properly initialize the variable and give it the correct type. It is slightly less Pythonic, granted, but I can imagine it would be quite useful for more complex tasks.
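If it were supported, usage could look like this (hypothetical, since `b.dtype` is not accessible in a kernel today; this launches the kernel f defined above over one warp):

```python
b = cp.zeros(32, dtype=cp.int64)
f[1, 32](b)          # Numba-style launch syntax: 1 block, 32 threads
assert int(b[0]) == 100 and int(b[1]) == 0
```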
@leofang I think this style is not Pythonic. I could not find any Python linter that recommends this style to guard against uninitialized variables.
Even if a linter implemented such a check, I would want it to be opt-in for users. In any case, I don't think we need to add a static check for uninitialized variables to the CuPy JIT.
If we support a CPU-simulation feature in the future, maybe we can check it at runtime.
Isn't it possible to let nvcc/nvrtc raise an uninitialized-value warning, so that we don't have to do anything ourselves?
I also chatted with @asi1024 about how it would be nice to embed the original Python code as comments in the generated CUDA source, line by line, so that users can easily compare compiler warnings against the generated code.
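As a purely illustrative sketch of that idea (the annotate helper and the line mapping are hypothetical, not existing CuPy internals):

```python
def annotate(cuda_lines, py_lines, line_map):
    """line_map[i] is the index of the Python line that produced
    cuda_lines[i], or None for lines the transpiler synthesized itself."""
    out = []
    for i, cuda_line in enumerate(cuda_lines):
        src = line_map.get(i)
        if src is None:
            out.append(cuda_line)
        else:
            # Append the originating Python statement as a trailing comment.
            out.append(f'{cuda_line:<48}// {py_lines[src].strip()}')
    return '\n'.join(out)
```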
> I think this style is not Pythonic. I could not find any Python linter that recommends this style to guard against uninitialized variables.
> Even if a linter implemented such a check, I would want it to be opt-in for users. In any case, I don't think we need to add a static check for uninitialized variables to the CuPy JIT.
@asi1024 Sorry, I think I confused myself... Did you refer to the line `value = b.dtype(0)`? If so, it was merely suggested as a workaround for the (arguably harder) problem of an undeclared variable leaking out of the `if`/`else` scope.
For this particular snippet, properly initializing the variable happens to work around the undeclared-variable issue, but maybe there is a better example that decouples the two issues. I care much less about uninitialized values, because from the C++ perspective all of the supported dtypes can be safely default-initialized.
Having the ability to access array types (e.g. `b.dtype`) in a kernel has additional benefits, such as supporting C++ template-like kernels and allowing user-managed explicit casting. For example, Numba supports statements like `y = x.dtype.type(1.)` and `y = numba.float32(12.)` in a kernel. I think this is best discussed separately. Let me open a new issue.
> If we support a CPU-simulation feature in the future, maybe we can check it at runtime.
Sounds interesting to me, although I am less interested in this capability 😅 To me, CuPy has always been a library that cannot work without a GPU, so I am not sure how we would reconcile this conflict. For example, do we pass a CuPy array or a NumPy array to the kernel under the simulation mode? Can we avoid initializing a CUDA context when using the JIT?
> it would be nice to embed the original Python code as comments in the generated CUDA source, line by line, so that users can easily compare compiler warnings against the generated code.
Having this would be great!
@leofang Sorry for my late response.
> Having the ability to access array types (e.g. `b.dtype`) in a kernel has additional benefits, such as supporting C++ template-like kernels and allowing user-managed explicit casting. For example, Numba supports statements like `y = x.dtype.type(1.)` and `y = numba.float32(12.)` in a kernel. I think this is best discussed separately. Let me open a new issue.
The current CuPy JIT already supports `y = numpy.float32(12.)` to specify the dtype of a local variable. We can support additional styles like `y = x.dtype.type(1.)` or `y: numpy.float32 = 1`.
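For instance, a minimal kernel in the already-supported style might look like this (the kernel and launch configuration are illustrative, not from the thread):

```python
import cupy as cp
import numpy
from cupyx import jit

@jit.rawkernel()
def scale(x, y):
    i = jit.threadIdx.x
    s = numpy.float32(12.)   # fixes the dtype of the local variable s
    y[i] = x[i] * s

x = cp.arange(32, dtype=cp.float32)
y = cp.empty_like(x)
scale[1, 32](x, y)           # 1 block, 32 threads
```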
> For example, do we pass a CuPy array or a NumPy array to the kernel under the simulation mode? Can we avoid initializing a CUDA context when using the JIT?
This feature is still in the planning stage, but it will probably not rely on any CUDA context.
Thanks, @asi1024!
> The current CuPy JIT already supports `y = numpy.float32(12.)` to specify the dtype of a local variable. We can support additional styles like `y = x.dtype.type(1.)` or `y: numpy.float32 = 1`.
Sorry for my slow response... (still on vacation.) I just opened an issue to track this feature request (#5417).
> > For example, do we pass a CuPy array or a NumPy array to the kernel under the simulation mode? Can we avoid initializing a CUDA context when using the JIT?
>
> This feature is still in the planning stage, but it will probably not rely on any CUDA context.
Sounds cool! Not that I am a big fan of such simulators, but I am definitely interested in learning how it's implemented.
On the current master, the following code compiles and runs:
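```python
@jit.rawkernel()
def f(b):
    laneId = jit.threadIdx.x & 0x1f
    if laneId == 0:
        value = cp.int64(100)
    b[laneId] = value  # value is undefined when laneId != 0
```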
However, a compilation error (ideally a `NameError` or `UnboundLocalError`) should be raised by the JIT compiler, because the variable `value` is not well defined on all code paths.