Closed Firestar99 closed 3 weeks ago
Is there a way with traits we can flip it? Like adding extension traits:
use spirv_std::arch::SubgroupOperations as _;
let float_result = float_val.subgroup_add();
let float_inclusive = float_val.inclusive_subgroup_add();
let float_exclusive = float_val.exclusive_subgroup_add();
or something like:
use spirv_std::arch::ops::SubgroupAdd;
let float_result = <f32 as SubgroupAdd>::add(float_val);
let float_inclusive = <f32 as SubgroupAdd>::inclusive_add(float_val);
let float_exclusive = <f32 as SubgroupAdd>::exclusive_add(float_val);
Not sure if those are more ergonomic, but I think they can be implemented?
I'll be honest, I'm not a fan of it ergonomic-wise. Also current atomics are named simiarly, eg. atomic_i_add
, though they are i_add
whereas this PR uses add_i
.
Here's also a small example of shader code using the renamed subgroup_ballot_exclusive_bit_count()
, unfortunately a ballot op as I have not yet used an arithmetic op anywhere. It's atomically allocating space in a buffer (atomic_counter
) to write elements in a compacted manner, so the resulting buffer has one element after the next without some holes in between.
pub fn subgroup_write_non_uniform(&mut self, t: T) {
let index = unsafe {
let ballot = subgroup_ballot(true);
let count = subgroup_ballot_bit_count(ballot);
let base_index = if subgroup_elect() {
atomic_i_add::<_, { Scope::Device as u32 }, { Semantics::NONE.bits() }>(self.atomic_counter, count)
} else {
0
};
let base_index = subgroup_broadcast_first(base_index);
let inv_index = subgroup_ballot_exclusive_bit_count(ballot);
base_index + inv_index
};
// self.buffer[index] = T;
}
I agree that is probably better, but bare methods still don't feel very idiomatic to me coming from rust with very little graphics background.
Can we do something like a Subgroup struct you can initialize and then call functions on it? I'm thinking how Mutex gives you a guard or how threads can be scoped or not scoped and then joined or how tokio gives you a handle you can later poke at. Subgroups are a stateful "thing", so it feels weird to me to call bare functions to manipulate the thing rather than having the handle to the thing.
Subroups are a stateful "thing"
Subgroup intrinsics are not stateful, they are just plain functions that do special stuff. And in theory come with a control barrier across the subgroup, though in practice that barrier just disappears cause invocations within a subgroup should always progress at the same rate (at least on AMD and most other archs, not sure what the new Nvidia cards do).
The only state there is is SubgroupMask
used in subgroup_ballot operations, where each bit represents an invocation in the subgroup (the ballot
variable in the example). And that's just a fancy wrapper of an UVec4 I added for convenience, as the spec only speaks of UVec4.
#[derive(Copy, Clone, Default, Eq, PartialEq)]
pub struct SubgroupMask(pub glam::UVec4);
Subgroups are stateful during a single shader invocation but lose this state once the invocation ends. So for the code running they are stateful, which is what I meant.
If we have stateful APIs and must_use
we can eliminate some correctness / UB issues and footguns:
"Subgroup operations depend on the correct state of invocations... incorrect mask values can result in undefined behavior or synchronization failure"
"Ignoring the result of subgroup operations can lead to invalid or non-deterministic outcomes, especially in communication primitives like broadcast."
I pop'ed off the last commit to make the subgroup functions match the atomic functions more closely (subgroup_inclusive_add_i
-> subgroup_inclusive_i_add
).
I don't think it's worth while spending a lot of time on perfecting this API, especially since not a lot of people will be using this extension. I'd rather prioritize getting a nice atomics API going and then adjusting subgroups to match that. And even then I think we got more important things to take care of first.
I'd like to get this PR merged in the state it is. But if you have any concrete ideas for a better API, that also doesn't alienate those familiar with the glsl API from the little information out there, feel free to make a new PR.
I wasn't a big fan of how I exposed subgroup ops that require a value from the
GroupOperation
enum. This PR changes all of those functions to be much more like glsl exposes them.These operations:
Got changed to this:
The new function names are almost exactly the same as glsl, except that they have the extra
i
suffix in their name to say that they operate on integers (orf
floats,u
unsigned,s
signed). If anyone got an idea how to unify that, please let me know :D (Note that the asm instruction must be a string literal, so eg. you can't use trait constants to implement it)