Pre-launch workgroupsize auto-tuning

JuliaGPU / KernelAbstractions.jl

Pre-launch workgroupsize auto-tuning #216

Open tkf opened 3 years ago

tkf commented 3 years ago

If the caller (host-side code) of a kernel needs to pre-allocate a buffer whose size depends on the workgroupsize, and the workgroupsize is not specified, the caller has to run the workgroupsize auto-tuning before launching the kernel. For example, I used this pattern to implement a "mapreduce" kernel in FoldsCUDA.jl. Can we have an API for invoking the workgroupsize auto-tuning before launching the kernel?
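To make the request concrete, here is a minimal sketch of the pattern using CUDA.jl directly rather than KernelAbstractions.jl, since the KernelAbstractions API being asked for does not exist yet. The kernel `partial_sum!` and the buffer `partials` are hypothetical names chosen for illustration, and this is not the FoldsCUDA.jl implementation. It assumes a CUDA-capable GPU and relies on `@cuda launch=false` plus `launch_configuration` to obtain a launch size before any buffer is allocated.

```julia
using CUDA

# Hypothetical kernel: every thread accumulates a strided slice of `xs`
# and stores its result in its own slot of `partials`.
function partial_sum!(partials, xs)
    gid = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    stride = gridDim().x * blockDim().x
    acc = zero(eltype(partials))
    i = gid
    while i <= length(xs)
        @inbounds acc += xs[i]
        i += stride
    end
    @inbounds partials[gid] = acc
    return nothing
end

xs = CUDA.rand(Float32, 1_000_000)

# Compile without launching. Only the argument types matter here, so an
# empty placeholder stands in for the buffer that cannot be sized yet.
kernel = @cuda launch=false partial_sum!(similar(xs, 0), xs)

# Ask the occupancy API for a suggested launch configuration.
config = launch_configuration(kernel.fun)
threads = min(length(xs), config.threads)
blocks = min(config.blocks, cld(length(xs), threads))

# Now the buffer that depends on the tuned size can be allocated.
partials = CUDA.zeros(Float32, threads * blocks)

# Launch with the configuration chosen above, then finish on the host side.
kernel(partials, xs; threads, blocks)
total = sum(partials)
```

The point of the sketch is the ordering: compile, query the configuration, allocate, then launch. An equivalent KernelAbstractions.jl entry point would need to expose the second step to the caller.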

tkf commented 3 years ago

Can this be supported with dynamic localmem #11?