JuliaGPU / KernelAbstractions.jl

Heterogeneous programming in Julia
MIT License

KernelAbstractions <-> CPU <-> CUDA terminology/API table #217

Open tkf opened 3 years ago

tkf commented 3 years ago

It'd be nice to have a KernelAbstractions/CPU/CUDA "rosetta stone" in the documentation, so that you can start coding with KernelAbstractions quickly if you already know some of the CUDA API.

I guess it'd be something like

| KernelAbstractions | CPU | CUDA |
| --- | --- | --- |
| `@index(Local, Linear)` | `mod(i, g)` | `threadIdx().x` |
| `@index(Local, Cartesian)[2]` | | `threadIdx().y` |
| `@index(Group, Linear)` | `i ÷ g` | `blockIdx().x` |
| `@index(Group, Cartesian)[2]` | | `blockIdx().y` |
| `groupsize()[3]` | | `blockDim().z` |
| `prod(groupsize())` | `g` | `blockDim().x * .y * .z` |
| workgroup (group) | | thread block (block) |
| `@index(Global, Linear)` | `i` | DIY |
| `@index(Global, Cartesian)[2]` | | DIY |
| local memory (`@localmem`) | | `@cuStaticSharedMem` |
| private memory (`@private`) | private to loop body | DIY? `MArray`? "stack allocation"? |
| `@uniform` | loop header | no-op? |
| `@synchronize` | delimit the loop | `sync_threads()` |

But making the CPU part concise and clear is hard.
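To make the table concrete, here is a minimal sketch of a kernel that exercises most of the macros above (`@index`, `@uniform`, `@localmem`, `@synchronize`). The kernel name and the workgroup size of 32 are my own choices for illustration, not anything prescribed by KernelAbstractions:

```julia
using KernelAbstractions

# Copy `src` into `dst`, staging each element through group-shared
# memory purely to demonstrate the macros from the table.
@kernel function copy_kernel!(dst, @Const(src))
    # "Loop header" code, run once per workgroup on the CPU;
    # effectively a no-op distinction on the GPU.
    N = @uniform length(src)

    I  = @index(Global, Linear)   # CUDA: blockIdx/threadIdx arithmetic, DIY
    li = @index(Local, Linear)    # CUDA: threadIdx().x

    tmp = @localmem eltype(src) (32,)   # CUDA: @cuStaticSharedMem
    if I <= N
        tmp[li] = src[I]
    end
    @synchronize                  # CUDA: sync_threads()
    if I <= N
        dst[I] = tmp[li]
    end
end

# Launch on the CPU backend with workgroup size 32 (the launch API has
# changed across KernelAbstractions versions; this follows the
# event-based style current when this issue was opened):
# kernel = copy_kernel!(CPU(), 32)
# wait(kernel(dst, src, ndrange=length(src)))
```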

(Note for myself: @uniform is for denoting "loop header" code that is run once. It's used for simulating GPU semantics on CPU; ref: JuliaCon 2020 | How not to write CPU code -- KernelAbstractions.jl | Valentin Churavy (16:28))

By the way, after staring at this table for a while, I wonder if it would have been cleaner if @localmem had been called @groupmem and @private had been called @localmem, so that "private" wouldn't be needed as a term for "more local than local".

vchuravy commented 3 years ago

@localmem => cuStaticSharedMem.

@private: all names are bad :), but yes, I hope to excise it eventually.

tkf commented 3 years ago

@localmem => cuStaticSharedMem.

thanks! fixed.

@private all names are bad :)

I think KernelAbstractions.jl is better than CUDA!

vchuravy commented 3 years ago

Also, @private is a no-op on the GPU. Well, pretty much; you could use an MArray if you actually needed multidimensional scratch space.
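For reference, a small sketch of `@private` in use (the kernel name is hypothetical). A plain local variable would also work for this trivial reduction; `@private` matters on the CPU because it gives each simulated work item storage that survives across `@synchronize` points, whereas on the GPU it lowers to ordinary thread-local values:

```julia
using KernelAbstractions

# Sum each row of A into out, keeping the accumulator in
# per-work-item private memory.
@kernel function rowsum_kernel!(out, @Const(A))
    i = @index(Global, Linear)
    acc = @private eltype(A) 1    # one private slot per work item
    acc[1] = zero(eltype(A))
    for j in 1:size(A, 2)
        acc[1] += A[i, j]
    end
    out[i] = acc[1]
end
```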

renatobellotti commented 2 years ago

I find this table very useful. Perhaps you want to add it to the docs?

eschnett commented 8 months ago

CUDA also has "threads" and "warps". I think "threads" become "work items"(?). I also associate "threads" with "SIMD lanes" on a CPU, and "warps" with "SIMD vectors".