bqqbarbhg / guppy

GPU compute abstraction
zlib License

Subgroup (wave) functions #3

Open ib00 opened 1 year ago

ib00 commented 1 year ago

What's the best way to add subgroup (wave) support to the kernel "language"? (e.g., https://github.com/shader-slang/slang/blob/master/docs/wave-intrinsics.md)

Should we just add functions for all backends (as no-ops on CPU), the same way as for atomics?

bqqbarbhg commented 1 year ago

Hmm, yeah, I suppose that would work. It's unfortunate that it can't really be made to work on CPU, at least not ergonomically, as far as I can think of. As you have probably noticed, guppy.h has the definitions for each compute language in #ifs near the top of the header, and that's where things like this could be added.
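A minimal sketch of what such per-backend #if blocks could look like. The GP_BACKEND_* macros and the gp_subgroup_* names are hypothetical (they are not guppy's actual API); the backend intrinsics are the standard ones from HLSL Shader Model 6.0, GL_KHR_shader_subgroup_arithmetic, and cl_khr_subgroups. The CPU fallback treats each invocation as a one-lane subgroup, so the operations reduce to identities instead of being true no-ops.

```c
/* Hypothetical per-backend subgroup wrappers in the style of guppy.h's #ifs.
 * GP_BACKEND_* and gp_subgroup_* are illustrative names only. */

#if defined(GP_BACKEND_HLSL)
    /* HLSL Shader Model 6.0 wave intrinsics */
    #define gp_subgroup_size()           WaveGetLaneCount()
    #define gp_subgroup_lane()           WaveGetLaneIndex()
    #define gp_subgroup_sum(x)           WaveActiveSum(x)
    #define gp_subgroup_exclusive_sum(x) WavePrefixSum(x)
#elif defined(GP_BACKEND_GLSL)
    /* GL_KHR_shader_subgroup_basic / GL_KHR_shader_subgroup_arithmetic */
    #define gp_subgroup_size()           gl_SubgroupSize
    #define gp_subgroup_lane()           gl_SubgroupInvocationID
    #define gp_subgroup_sum(x)           subgroupAdd(x)
    #define gp_subgroup_exclusive_sum(x) subgroupExclusiveAdd(x)
#elif defined(GP_BACKEND_OPENCL)
    /* OpenCL 2.0 / cl_khr_subgroups */
    #define gp_subgroup_size()           get_sub_group_size()
    #define gp_subgroup_lane()           get_sub_group_local_id()
    #define gp_subgroup_sum(x)           sub_group_reduce_add(x)
    #define gp_subgroup_exclusive_sum(x) sub_group_scan_exclusive_add(x)
#else
    /* CPU fallback: a subgroup of exactly one lane.
     * The sum over one lane is the value itself; the exclusive prefix sum
     * is the additive identity (0 * x gives a zero of x's type and
     * evaluates x once; note it yields NaN for infinite or NaN inputs). */
    #define gp_subgroup_size()           1u
    #define gp_subgroup_lane()           0u
    #define gp_subgroup_sum(x)           (x)
    #define gp_subgroup_exclusive_sum(x) (0 * (x))
#endif
```

Modeling the CPU as a one-lane subgroup keeps kernels that use these calls correct on every backend; they just get no speedup on CPU.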

ib00 commented 1 year ago

Yes, subgroups on CPU don't make much sense unless you vectorize the CPU kernels (so a "subgroup" would be 4 wide for SSE and 8 wide for AVX, with 32-bit lanes). It's more trouble than it's worth.
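To illustrate what a 4-wide SSE "subgroup" would mean, here is a sketch that treats the four float lanes of an __m128 as subgroup lanes and implements a reduction and an inclusive prefix sum with SSE2 intrinsics. The function names are hypothetical; this assumes an x86 target with SSE2 (always available on x86-64).

```c
/* Sketch: the four 32-bit lanes of an SSE register as a "subgroup".
 * Function names are illustrative; requires SSE2. */
#include <emmintrin.h>  /* SSE2 (also pulls in SSE) */

/* Reduction: every lane receives a+b+c+d (like WaveActiveSum / subgroupAdd). */
static inline __m128 sse_subgroup_sum(__m128 v)
{
    /* [a,b,c,d] + [b,a,d,c] = [a+b, a+b, c+d, c+d] */
    __m128 s = _mm_add_ps(v, _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 3, 0, 1)));
    /* swap the 64-bit halves and add, so every lane holds a+b+c+d */
    return _mm_add_ps(s, _mm_shuffle_ps(s, s, _MM_SHUFFLE(1, 0, 3, 2)));
}

/* Inclusive prefix sum: [a, a+b, a+b+c, a+b+c+d] (like subgroupInclusiveAdd). */
static inline __m128 sse_subgroup_inclusive_sum(__m128 v)
{
    /* add lanes shifted up by one (4-byte shift, lane 0 filled with zero) */
    v = _mm_add_ps(v, _mm_castsi128_ps(_mm_slli_si128(_mm_castps_si128(v), 4)));
    /* add lanes shifted up by two (8-byte shift) */
    v = _mm_add_ps(v, _mm_castsi128_ps(_mm_slli_si128(_mm_castps_si128(v), 8)));
    return v;
}
```

Each width (SSE, AVX, AVX-512) needs its own shuffle pattern, and the scalar CPU kernel path would have to be rewritten to process several invocations per call. That is the extra trouble mentioned above.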

But the main reason for subgroups on the GPU is to build efficient parallel primitives like prefix sum, radix sort, and reduction. Maybe you have already done that in some other way, though.
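As an example of how subgroups feed into such primitives, here is a sequential C emulation of a common way to build a workgroup-wide exclusive prefix sum from subgroup scans. Each subgroup scans its slice, one subgroup scans the per-subgroup totals, and every lane adds its subgroup's offset. SUBGROUP_SIZE, WORKGROUP_SIZE, and the function names are hypothetical. On a GPU, each inner lane loop would be a single subgroupExclusiveAdd (or WavePrefixSum) call, with a barrier between phases.

```c
/* Sequential emulation of a two-level workgroup exclusive scan built from
 * subgroup scans. Sizes and names are illustrative. */
#include <stddef.h>

#define SUBGROUP_SIZE  4
#define WORKGROUP_SIZE 16
#define NUM_SUBGROUPS  (WORKGROUP_SIZE / SUBGROUP_SIZE)

/* Phase 2 scans the totals with a single subgroup, so they must fit in one. */
_Static_assert(NUM_SUBGROUPS == SUBGROUP_SIZE, "totals must fill one subgroup");

/* Emulates subgroupExclusiveAdd over one subgroup's lanes and returns the
 * subgroup total (what subgroupAdd would give each lane). */
static unsigned emulate_subgroup_exclusive_add(const unsigned *in, unsigned *out)
{
    unsigned running = 0;
    for (size_t lane = 0; lane < SUBGROUP_SIZE; ++lane) {
        out[lane] = running;
        running += in[lane];
    }
    return running;
}

static void workgroup_exclusive_scan(const unsigned in[WORKGROUP_SIZE],
                                     unsigned out[WORKGROUP_SIZE])
{
    unsigned totals[NUM_SUBGROUPS];   /* on a GPU: shared/groupshared memory */
    unsigned offsets[NUM_SUBGROUPS];

    /* Phase 1: each subgroup scans its own slice and publishes its total. */
    for (size_t sg = 0; sg < NUM_SUBGROUPS; ++sg)
        totals[sg] = emulate_subgroup_exclusive_add(in + sg * SUBGROUP_SIZE,
                                                    out + sg * SUBGROUP_SIZE);

    /* Phase 2 (after a barrier): one subgroup scans the per-subgroup totals. */
    emulate_subgroup_exclusive_add(totals, offsets);

    /* Phase 3: every lane adds the offset of the subgroup it belongs to. */
    for (size_t i = 0; i < WORKGROUP_SIZE; ++i)
        out[i] += offsets[i / SUBGROUP_SIZE];
}
```

Without subgroup operations, phases 1 and 2 would typically need a shared-memory tree scan with a barrier per step. This pattern also underlies the digit-counting step of a radix sort.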