NVIDIA / warp

A Python framework for high performance GPU simulation and graphics
https://nvidia.github.io/warp/

Support of shared memory #266

Closed · jinz2014 closed this 1 month ago

jinz2014 commented 4 months ago

Can you please explain how shared memory can be used in a kernel? Does the Warp compiler optimize a kernel that uses shared memory? Thanks.

jinz2014 commented 4 months ago

For example, consider the dot product of two arrays.

I suppose that the warp.dot() function computes the dot product of two vectors, where each vector is an element of an array.
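For instance, this sketch is roughly what I have in mind (assuming each array element is a wp.vec3):

import warp as wp

@wp.kernel
def dot_kernel(a: wp.array(dtype=wp.vec3),
               b: wp.array(dtype=wp.vec3),
               out: wp.array(dtype=float)):
    i = wp.tid()
    # wp.dot computes the dot product of the two vec3 elements at index i
    out[i] = wp.dot(a[i], b[i])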

daedalus5 commented 4 months ago

Hi @jinz2014. Warp doesn't support shared memory in kernels directly, but you are free to use shared memory in native function snippets: https://nvidia.github.io/warp/modules/differentiability.html#custom-native-functions
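For reference, here is a condensed sketch along the lines of the shared-memory reduction example in those docs. The block size of 128 is an assumption of the sketch, and it only works when the kernel is launched with dim equal to that one block's size, so the global and block-local IDs coincide:

import warp as wp
import numpy as np

# CUDA source for the native snippet; `arr`, `out`, and `tid` refer to the
# parameters of the decorated function below.
snippet = """
__shared__ int sum[128];

sum[tid] = arr[tid];
__syncthreads();

// tree reduction over the values staged in shared memory
for (int stride = 64; stride > 0; stride >>= 1)
{
    if (tid < stride)
        sum[tid] += sum[tid + stride];
    __syncthreads();
}

if (tid == 0)
    out[0] = sum[0];
"""

@wp.func_native(snippet)
def reduce(arr: wp.array(dtype=int), out: wp.array(dtype=int), tid: int):
    ...

@wp.kernel
def reduce_kernel(arr: wp.array(dtype=int), out: wp.array(dtype=int)):
    tid = wp.tid()
    reduce(arr, out, tid)

N = 128  # must match the shared array size for this sketch to be correct
x = wp.array(np.arange(N, dtype=int), dtype=int, device="cuda:0")
out = wp.zeros(1, dtype=int, device="cuda:0")
wp.launch(reduce_kernel, dim=N, inputs=[x, out], device="cuda:0")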

jinz2014 commented 4 months ago

Hi @daedalus5, I see. Will developers need to compute the local ID (i.e., threadIdx.x) within a thread block? I think wp.tid() means the global ID.

jinz2014 commented 4 months ago

Are there functions for the local ID, the thread block size, and the thread block ID?

daedalus5 commented 4 months ago

Yes, wp.tid() is a global ID. We don't have Python functions for those, but you should be able to access e.g. threadIdx.x in a native snippet as you would normally.
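For example, a minimal sketch (the names here are hypothetical) that writes each thread's block-local ID out through a snippet:

import warp as wp

# threadIdx.x is an ordinary CUDA built-in inside the snippet. Note that the
# block size Warp launches with is an implementation detail, so the mapping
# from wp.tid() to (blockIdx.x, threadIdx.x) should not be relied upon.
snippet = """
out[tid] = (int)threadIdx.x;
"""

@wp.func_native(snippet)
def store_local_id(out: wp.array(dtype=int), tid: int):
    ...

@wp.kernel
def local_id_kernel(out: wp.array(dtype=int)):
    tid = wp.tid()
    store_local_id(out, tid)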

jinz2014 commented 4 months ago

Do snippets support template types?

snippet = '''
    __shared__ T sum[256];
'''

daedalus5 commented 4 months ago

No, I don't think templates would work in snippets currently.
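A possible workaround, untested here: since the snippet is just a Python string, an f-string can stand in for a template parameter by baking in a concrete element type before compilation. A minimal sketch:

import warp as wp

ELEM_T = "float"  # choose the concrete C type when building the string

# Stage values through shared memory and copy them back out, with the
# element type substituted by Python instead of a C++ template.
snippet = f"""
__shared__ {ELEM_T} vals[256];
vals[tid] = arr[tid];
__syncthreads();
out[tid] = vals[tid];
"""

@wp.func_native(snippet)
def roundtrip(arr: wp.array(dtype=wp.float32), out: wp.array(dtype=wp.float32), tid: int):
    ...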