Rust-GPU / rust-gpu

🐉 Making Rust a first-class language and ecosystem for GPU shaders 🚧
https://rust-gpu.github.io
Apache License 2.0
980 stars 27 forks source link

make subgroup ops match glsl more closely #32

Closed Firestar99 closed 3 weeks ago

Firestar99 commented 1 month ago

I wasn't a big fan of how I exposed subgroup ops that require a value from the GroupOperation enum. This PR changes all of those functions to be much more like glsl exposes them.

These operations:

spirv_std::arch::subgroup_i_add::<{ GroupOperation::Reduce as u32 }, _>(value)
spirv_std::arch::subgroup_i_add::<{ GroupOperation::InclusiveScan as u32 }, _>(value)
spirv_std::arch::subgroup_i_add::<{ GroupOperation::ExclusiveScan as u32 }, _>(value)
spirv_std::arch::subgroup_i_add_clustered::<8, _>(value)

Got changed to this:

spirv_std::arch::subgroup_i_add(value)
spirv_std::arch::subgroup_inclusive_i_add(value)
spirv_std::arch::subgroup_exclusive_i_add(value)
spirv_std::arch::subgroup_clustered_i_add::<8, _>(value)

The new function names are almost exactly the same as glsl, except that they have the extra i suffix in their name to say that they operate on integers (or f floats, u unsigned, s signed). If anyone got an idea how to unify that, please let me know :D (Note that the asm instruction must be a string literal, so eg. you can't use trait constants to implement it)

LegNeato commented 1 month ago

Is there a way with traits we can flip it? Like adding extension traits:

    use spirv_std::arch::SubgroupOperations as _;

    let float_result = float_val.subgroup_add();
    let float_inclusive = float_val.inclusive_subgroup_add();
    let float_exclusive = float_val.exclusive_subgroup_add();

or something like:

    use spirv_std::arch::ops::SubgroupAdd;

    let float_result = <f32 as SubgroupAdd>::add(float_val);
    let float_inclusive = <f32 as SubgroupAdd>::inclusive_add(float_val);
    let float_exclusive = <f32 as SubgroupAdd>::exclusive_add(float_val);

Not sure if those are more ergonomic, but I think they can be implemented?

Firestar99 commented 1 month ago

I'll be honest, I'm not a fan of it ergonomic-wise. Also current atomics are named simiarly, eg. atomic_i_add, though they are i_add whereas this PR uses add_i.

Here's also a small example of shader code using the renamed subgroup_ballot_exclusive_bit_count(), unfortunately a ballot op as I have not yet used an arithmetic op anywhere. It's atomically allocating space in a buffer (atomic_counter) to write elements in a compacted manner, so the resulting buffer has one element after the next without some holes in between.

pub fn subgroup_write_non_uniform(&mut self, t: T) {
    let index = unsafe {
        let ballot = subgroup_ballot(true);
        let count = subgroup_ballot_bit_count(ballot);
        let base_index = if subgroup_elect() {
            atomic_i_add::<_, { Scope::Device as u32 }, { Semantics::NONE.bits() }>(self.atomic_counter, count)
        } else {
            0
        };
        let base_index = subgroup_broadcast_first(base_index);
        let inv_index = subgroup_ballot_exclusive_bit_count(ballot);
        base_index + inv_index
    };
    // self.buffer[index] = T;
}
LegNeato commented 1 month ago

I agree that is probably better, but bare methods still don't feel very idiomatic to me coming from rust with very little graphics background.

Can we do something like a Subgroup struct you can initialize and then call functions on it? I'm thinking how Mutex gives you a guard or how threads can be scoped or not scoped and then joined or how tokio gives you a handle you can later poke at. Subgroups are a stateful "thing", so it feels weird to me to call bare functions to manipulate the thing rather than having the handle to the thing.

Firestar99 commented 1 month ago

Subroups are a stateful "thing"

Subgroup intrinsics are not stateful, they are just plain functions that do special stuff. And in theory come with a control barrier across the subgroup, though in practice that barrier just disappears cause invocations within a subgroup should always progress at the same rate (at least on AMD and most other archs, not sure what the new Nvidia cards do).

The only state there is is SubgroupMask used in subgroup_ballot operations, where each bit represents an invocation in the subgroup (the ballot variable in the example). And that's just a fancy wrapper of an UVec4 I added for convenience, as the spec only speaks of UVec4.

#[derive(Copy, Clone, Default, Eq, PartialEq)]
pub struct SubgroupMask(pub glam::UVec4);
LegNeato commented 1 month ago

Subgroups are stateful during a single shader invocation but lose this state once the invocation ends. So for the code running they are stateful, which is what I meant.

LegNeato commented 1 month ago

If we have stateful APIs and must_use we can eliminate some correctness / UB issues and footguns:

"Subgroup operations depend on the correct state of invocations... incorrect mask values can result in undefined behavior or synchronization failure"

"Ignoring the result of subgroup operations can lead to invalid or non-deterministic outcomes, especially in communication primitives like broadcast."

Firestar99 commented 1 month ago

I pop'ed off the last commit to make the subgroup functions match the atomic functions more closely (subgroup_inclusive_add_i -> subgroup_inclusive_i_add).

I don't think it's worth while spending a lot of time on perfecting this API, especially since not a lot of people will be using this extension. I'd rather prioritize getting a nice atomics API going and then adjusting subgroups to match that. And even then I think we got more important things to take care of first.

I'd like to get this PR merged in the state it is. But if you have any concrete ideas for a better API, that also doesn't alienate those familiar with the glsl API from the little information out there, feel free to make a new PR.