Open tkf opened 3 years ago
> `@localmem` => `cuStaticSharedMem`

> `@private`

all names are bad :), but yes I hope to excise it eventually.
> `@localmem` => `cuStaticSharedMem`

thanks! fixed.

> `@private`
>
> all names are bad :)

I think KernelAbstractions.jl is better than CUDA!
Also, `@private` is a no-op on the GPU. Well, pretty much; you could use an `MArray` if you actually needed multidimensional scratch space.
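To make the "per-work-item scratch" idea concrete, here is a rough plain-Julia picture (an illustration only, not the real KernelAbstractions implementation; `lane_squares` is a made-up name): a CPU backend that loops over work items has to give each lane its own private slot, whereas on the GPU each thread's local variables are already private.

```julia
# Rough plain-Julia picture: a CPU backend looping over work items gives
# each lane its own "private" scratch slot.
function lane_squares(xs, g)
    scratch = zeros(eltype(xs), g)      # one private slot per work item
    for lane in 1:g                     # the CPU backend's work-item loop
        scratch[lane] = xs[lane]^2      # each lane touches only its own slot
    end
    return scratch
end

lane_squares([1.0, 2.0, 3.0], 3)        # -> [1.0, 4.0, 9.0]
```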
I find this table very useful. Perhaps you want to add it to the docs?
CUDA also has "threads" and "warps". I think "threads" become "work items"(?). I also associate "threads" with "SIMD lanes" on a CPU, and "warps" with "SIMD vectors".
It'd be nice to have a KernelAbstractions/CPU/CUDA "rosetta stone" in the documentation so that you can start coding in KernelAbstractions quickly if you already know some of the CUDA API.
I guess it'd be something like

| KernelAbstractions | CPU | CUDA |
|---|---|---|
| `@index(Local, Linear)` | `mod(i, g)` | `threadIdx().x` |
| `@index(Local, Cartesian)[2]` | | `threadIdx().y` |
| `@index(Group, Linear)` | `i ÷ g` | `blockIdx().x` |
| `@index(Group, Cartesian)[2]` | | `blockIdx().y` |
| `groupsize()[3]` | | `blockDim().z` |
| `prod(groupsize())` | `g` | `blockDim().x * blockDim().y * blockDim().z` |
| `@index(Global, Linear)` | `i` | |
| `@index(Global, Cartesian)[2]` | | |
| `@localmem` | | `@cuStaticSharedMem` |
| `@private` | `MArray`? "stack allocation"? | |
| `@uniform` | | |
| `@synchronize` | | `sync_threads()` |

But making the CPU part concise and clear is hard.
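As a sanity check on the CPU column, here is a plain-Julia sketch (no KernelAbstractions required, and assuming a 0-based global linear index `i` and workgroup size `g`, which is how `mod(i, g)` and `i ÷ g` read most naturally; `local_index` and `group_index` are made-up names):

```julia
# Recovering local and group indices from a 0-based global linear index i,
# mirroring the CPU column of the table (g is the workgroup size).
local_index(i, g) = mod(i, g)   # ~ @index(Local, Linear), cf. threadIdx().x
group_index(i, g) = i ÷ g       # ~ @index(Group, Linear), cf. blockIdx().x

g = 4                                            # workgroup ("block") size
[(i, local_index(i, g), group_index(i, g)) for i in 0:7]
# e.g. i = 5 falls in group 1 at local position 1
```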
(Note for myself: `@uniform` is for denoting "loop header" code that is run once. It's used for simulating GPU semantics on the CPU; ref: JuliaCon 2020 | How not to write CPU code -- KernelAbstractions.jl | Valentin Churavy (16:28).)

By the way, after staring at this table for a while, I wonder if it would have been cleaner if `@localmem` were called `@groupmem` and `@private` were called `@localmem`, so that you don't have to use "private" as terminology for "more local than local".
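The `@uniform` note above can be sketched in plain Julia (an illustration of the idea only, not the actual KernelAbstractions machinery; `run_group`, `setup`, and `body` are made-up names): the CPU backend turns the kernel body into a loop over work items, and "uniform" code is hoisted out of that loop so it runs once per workgroup.

```julia
# Illustration: how "uniform" (run-once) code relates to per-work-item code
# when a workgroup is executed as a plain loop on the CPU.
function run_group(setup, body, g)
    u = setup()              # the "@uniform" part: executed once per group
    for lane in 1:g          # the per-work-item part: executed g times
        body(u, lane)
    end
    return u
end

counter = Ref(0)
run_group(() -> (counter[] += 1; counter[]), (u, lane) -> nothing, 8)
counter[]   # -> 1: the setup ran once, not once per work item
```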