coreylowman / cudarc

Safe Rust wrapper around the CUDA toolkit
Apache License 2.0

In-Place Reduction for NCCL #259

Open cat-state opened 2 weeks ago

cat-state commented 2 weeks ago

NCCL supports in-place all-reduce. However, Comm::all_reduce takes a &CudaSlice to read from and a separate &mut CudaSlice to write into, so the same buffer cannot be passed as both input and output, which rules out in-place reduction.

coreylowman commented 1 week ago

Ah yeah, I see that (NCCL docs: https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/api/colls.html#c.ncclReduce)

I think in this case, due to Rust's borrow rules, it'd probably be easiest to just add a Comm::all_reduce_in_place that takes a single &mut CudaSlice. It's a fairly easy addition if anyone wants to contribute a PR for this! Otherwise I can add it later this week.
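
For anyone picking this up, here is a rough, CPU-only sketch of the shape being proposed. It does not use cudarc or NCCL: `DeviceBuf`, `from_host`, `to_host`, and `raw_all_reduce_sum` are hypothetical stand-ins invented for illustration, and the mock "reduction" simply pretends that every rank holds identical data. The one detail taken from NCCL is that `ncclAllReduce` performs the operation in place when the send and receive pointers are equal. The actual method in cudarc may well look different.

```rust
/// Hypothetical stand-in for a device buffer (NOT cudarc's `CudaSlice`).
pub struct DeviceBuf {
    data: Vec<f32>,
}

impl DeviceBuf {
    pub fn from_host(data: Vec<f32>) -> Self {
        Self { data }
    }

    pub fn to_host(&self) -> Vec<f32> {
        self.data.clone()
    }
}

/// Mock of a raw, pointer-based sum all-reduce. Like `ncclAllReduce`, it
/// accepts `send == recv` as an in-place request. To stay runnable without
/// GPUs, it pretends each of `world_size` ranks holds the same values, so
/// the "sum across ranks" is just `value * world_size`.
///
/// Safety: both pointers must be valid for `count` elements.
unsafe fn raw_all_reduce_sum(send: *const f32, recv: *mut f32, count: usize, world_size: usize) {
    for i in 0..count {
        unsafe {
            // Read before writing so aliasing pointers behave correctly.
            let v = *send.add(i);
            *recv.add(i) = v * world_size as f32;
        }
    }
}

/// Out-of-place variant, mirroring the existing two-buffer shape.
/// The borrow checker forbids passing the same buffer as both arguments.
pub fn all_reduce(send: &DeviceBuf, recv: &mut DeviceBuf, world_size: usize) {
    assert_eq!(send.data.len(), recv.data.len());
    let count = send.data.len();
    unsafe { raw_all_reduce_sum(send.data.as_ptr(), recv.data.as_mut_ptr(), count, world_size) }
}

/// In-place variant: one exclusive borrow, and the same pointer is handed
/// to the raw call as both source and destination.
pub fn all_reduce_in_place(buf: &mut DeviceBuf, world_size: usize) {
    let count = buf.data.len();
    let ptr = buf.data.as_mut_ptr();
    unsafe { raw_all_reduce_sum(ptr as *const f32, ptr, count, world_size) }
}
```

With this shape, a call like `all_reduce(&buf, &mut buf, n)` is rejected at compile time, while `all_reduce_in_place(&mut buf, n)` expresses the same intent with a single exclusive borrow.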