Open tkf opened 3 years ago
> `@localmem` => `cuStaticSharedMem`

> `@private`

all names are bad :), but yes I hope to excise it eventually.
> `@localmem` => `cuStaticSharedMem`

thanks! fixed.

> `@private`
>
> all names are bad :)

I think KernelAbstractions.jl is better than CUDA!
Also, `@private` is a no-op on the GPU. Well, pretty much; you could use an `MArray` if you actually needed multidimensional scratch space.
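To make the "per-work-item scratch" idea concrete, here is a rough plain-Julia picture (an illustration only, not the real KernelAbstractions implementation; `lane_squares` is a made-up name): a CPU backend that loops over work items has to give each lane its own private slot, whereas on the GPU each thread's local variables are already private.

```julia
# Rough plain-Julia picture: a CPU backend looping over work items gives
# each lane its own "private" scratch slot.
function lane_squares(xs, g)
    scratch = zeros(eltype(xs), g)      # one private slot per work item
    for lane in 1:g                     # the CPU backend's work-item loop
        scratch[lane] = xs[lane]^2      # each lane touches only its own slot
    end
    return scratch
end

lane_squares([1.0, 2.0, 3.0], 3)        # -> [1.0, 4.0, 9.0]
```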
I find this table very useful. Perhaps you want to add it to the docs?
CUDA also has "threads" and "warps". I think "threads" become "work items"(?). I also associate "threads" with "SIMD lanes" on a CPU, and "warps" with "SIMD vectors".
It'd be nice to have a KernelAbstractions/CPU/CUDA "rosetta stone" in the documentation so that you can start coding in KernelAbstractions quickly if you already know some of the CUDA API.
I guess it'd be something like

| KernelAbstractions | CPU | CUDA |
|---|---|---|
| `@index(Local, Linear)` | `mod(i, g)` | `threadIdx().x` |
| `@index(Local, Cartesian)[2]` | | `threadIdx().y` |
| `@index(Group, Linear)` | `i ÷ g` | `blockIdx().x` |
| `@index(Group, Cartesian)[2]` | | `blockIdx().y` |
| `groupsize()[3]` | | `blockDim().z` |
| `prod(groupsize())` | `g` | `blockDim().x * blockDim().y * blockDim().z` |
| `@index(Global, Linear)` | `i` | |
| `@index(Global, Cartesian)[2]` | | |
| `@localmem` | | `@cuStaticSharedMem` |
| `@private` | `MArray`? "stack allocation"? | |
| `@uniform` | | |
| `@synchronize` | | `sync_threads()` |

But making the CPU part concise and clear is hard.
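As a sanity check on the CPU column, here is a plain-Julia sketch (no KernelAbstractions required, and assuming a 0-based global linear index `i` and workgroup size `g`, which is how `mod(i, g)` and `i ÷ g` read most naturally; `local_index` and `group_index` are made-up names):

```julia
# Recovering local and group indices from a 0-based global linear index i,
# mirroring the CPU column of the table (g is the workgroup size).
local_index(i, g) = mod(i, g)   # ~ @index(Local, Linear), cf. threadIdx().x
group_index(i, g) = i ÷ g       # ~ @index(Group, Linear), cf. blockIdx().x

g = 4                                            # workgroup ("block") size
[(i, local_index(i, g), group_index(i, g)) for i in 0:7]
# e.g. i = 5 falls in group 1 at local position 1
```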
(Note for myself: `@uniform` is for denoting "loop header" code that is run once. It's used for simulating GPU semantics on the CPU; ref: JuliaCon 2020 | How not to write CPU code -- KernelAbstractions.jl | Valentin Churavy (16:28).)

By the way, after staring at this table for a while, I wonder if it would have been cleaner if `@localmem` were called `@groupmem` and `@private` were called `@localmem`, so that you don't have to use "private" as terminology for "more local than local".
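The `@uniform` note above can be sketched in plain Julia (an illustration of the idea only, not the actual KernelAbstractions machinery; `run_group`, `setup`, and `body` are made-up names): the CPU backend turns the kernel body into a loop over work items, and "uniform" code is hoisted out of that loop so it runs once per workgroup.

```julia
# Illustration: how "uniform" (run-once) code relates to per-work-item code
# when a workgroup is executed as a plain loop on the CPU.
function run_group(setup, body, g)
    u = setup()              # the "@uniform" part: executed once per group
    for lane in 1:g          # the per-work-item part: executed g times
        body(u, lane)
    end
    return u
end

counter = Ref(0)
run_group(() -> (counter[] += 1; counter[]), (u, lane) -> nothing, 8)
counter[]   # -> 1: the setup ran once, not once per work item
```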