For example, a dot product of two arrays. I suppose the warp.dot() function computes the dot product of two vectors, where each vector is an element of an array.
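For instance, a minimal sketch of what I would expect (the kernel and array names are made up):

```python
import numpy as np
import warp as wp

wp.init()

@wp.kernel
def dot_kernel(a: wp.array(dtype=wp.vec3),
               b: wp.array(dtype=wp.vec3),
               out: wp.array(dtype=float)):
    tid = wp.tid()
    # wp.dot() takes two vector values; here each array element is a wp.vec3
    out[tid] = wp.dot(a[tid], b[tid])

n = 4
a = wp.array(np.ones((n, 3), dtype=np.float32), dtype=wp.vec3)
b = wp.array(np.ones((n, 3), dtype=np.float32), dtype=wp.vec3)
out = wp.zeros(n, dtype=float)

wp.launch(dot_kernel, dim=n, inputs=[a, b, out])
print(out.numpy())  # each entry is 1*1 + 1*1 + 1*1 = 3.0
```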
Hi @jinz2014. Warp doesn't support shared memory in kernels directly, but you are free to use shared memory in native function snippets: https://nvidia.github.io/warp/modules/differentiability.html#custom-native-functions
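For instance, a minimal sketch along the lines of the linked docs: a shared-memory tree reduction written as CUDA inside the snippet, registered with wp.func_native, and called from an ordinary kernel. The 128-element size, the single-block launch, and the function names here are assumptions of the sketch.

```python
import numpy as np
import warp as wp

wp.init()

# CUDA body for a block-level tree reduction in shared memory.
# Assumes a single block of exactly 128 threads (see the launch dim below).
snippet = """
__shared__ int sum[128];

sum[tid] = arr[tid];
__syncthreads();

for (int stride = 64; stride > 0; stride >>= 1)
{
    if (tid < stride)
        sum[tid] += sum[tid + stride];
    __syncthreads();
}

if (tid == 0)
    out[0] = sum[0];
"""

@wp.func_native(snippet)
def block_reduce(arr: wp.array(dtype=int), out: wp.array(dtype=int), tid: int):
    ...

@wp.kernel
def reduce_kernel(arr: wp.array(dtype=int), out: wp.array(dtype=int)):
    tid = wp.tid()
    block_reduce(arr, out, tid)

arr = wp.array(np.arange(128, dtype=np.int32), dtype=int, device="cuda")
out = wp.zeros(1, dtype=int, device="cuda")
wp.launch(reduce_kernel, dim=128, inputs=[arr, out], device="cuda")
print(out.numpy())  # sum of 0..127 = 8128
```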
Hi @daedalus5, I see. Will developers need to compute the local ID (i.e. threadIdx.x) within a thread block? I think wp.tid() returns the global ID.
Are there functions for the local ID, thread block size, and thread block ID?
Yes, wp.tid() is a global ID. We don't have Python functions for those, but you should be able to access e.g. threadIdx.x in a native snippet as you would normally.
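For instance (an illustrative sketch; the function and array names are made up), the CUDA built-ins can be read directly because the snippet body is compiled as CUDA source:

```python
import warp as wp

wp.init()

# blockDim.x gives the thread block size, blockIdx.x the block ID,
# and threadIdx.x the block-local thread ID.
snippet = """
out[tid] = threadIdx.x;
"""

@wp.func_native(snippet)
def write_local_id(out: wp.array(dtype=int), tid: int):
    ...

@wp.kernel
def local_id_kernel(out: wp.array(dtype=int)):
    write_local_id(out, wp.tid())

out = wp.zeros(512, dtype=int, device="cuda")
wp.launch(local_id_kernel, dim=512, inputs=[out], device="cuda")
# each entry holds that thread's block-local index, so values
# repeat with a period equal to the block size
```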
Do snippets support template types? For example:

snippet = '''
__shared__ T sum[256];
'''
No, I don't think templates would work in snippets currently.
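A possible workaround, since snippets are plain Python strings, is to substitute the concrete C type yourself before registering each native function. The make_snippet helper and the staging logic below are illustrative, not a Warp feature:

```python
import warp as wp

wp.init()

def make_snippet(ctype: str, size: int = 256) -> str:
    # Plain Python string substitution stands in for a C++ template:
    # each concrete type gets its own snippet and native function.
    # Assumes the launch block size does not exceed `size`.
    return f"""
__shared__ {ctype} tile[{size}];
int lane = threadIdx.x;   // block-local index into shared memory
tile[lane] = arr[tid];
__syncthreads();
out[tid] = tile[lane];
"""

@wp.func_native(make_snippet("float"))
def copy_via_shared_f(arr: wp.array(dtype=float),
                      out: wp.array(dtype=float), tid: int):
    ...

@wp.func_native(make_snippet("int"))
def copy_via_shared_i(arr: wp.array(dtype=int),
                      out: wp.array(dtype=int), tid: int):
    ...
```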
Can you please explain how to use shared memory in a kernel? Does the Warp compiler optimize kernels that use shared memory? Thanks.