Rust-GPU / Rust-CUDA

Ecosystem of libraries and tools for writing and executing fast GPU code fully in Rust.
Apache License 2.0
3.16k stars 120 forks source link

Add Cooperative Groups API integration #87

Open thedodd opened 2 years ago

thedodd commented 2 years ago

This works as follows:

todo


thedodd commented 2 years ago

@RDambrosio016 whenever you get some time (no rush), let me know what you think. I am testing this out as I go on a fairly large project of mine which brought about this need in the first place.

Overall, the bridging code is quite simple. I've given an outline of how I think this should be exposed overall. Let me know what you think, happy to modify things as I go.

Also, for this first pass, I would like to keep focused only on the grid-level components of the cooperative groups API, as well as the basic cooperative launch host-side function. We can add multi-device and the other cooperative group components later.

RDambrosio016 commented 2 years ago

This looks neat, but if im not mistaken, those functions map to single PTX intrinsics directly, wouldn't it be easier to use inline assembly? though i haven't actually looked into this so im not sure if they map to more than one PTX instruction

thedodd commented 2 years ago

wouldn't it be easier to use inline assembly?

I started down that path at first, and for a few of the pertinent functions the corresponding PTX was clear. I was using a base C++ program compiled down to PTX to verify in addition to cross-referencing with the PTX ISA spec. However, I will say, many of the interfaces were not as clear, and this seemed to be a potentially more reliable way to generate the needed code.

Perhaps we can replace some of the clear interfaces with some ASM instead. Happy to iterate on this in the future.