NVIDIA / nvtrust

Ancillary open source software to support confidential computing on NVIDIA GPUs
Apache License 2.0

Support NCCL for Nvidia Confidential Computing in Multi-GPU CVMs #57

Open Tan-YiFan opened 3 months ago

Tan-YiFan commented 3 months ago

NCCL's implementation calls the cudaHostRegister API at https://github.com/NVIDIA/nccl/blob/v2.21.5-1/src/misc/shmutils.cc#L102. This API is not supported in confidential computing mode.

I plan to explore the performance overhead of NVIDIA CC in a multi-GPU environment and optimize for it. However, without NCCL, many popular multi-GPU applications either cannot run or run only with degraded performance.

I am trying to replace the unsupported APIs while preserving the functionality. Once that is done, should I expect to run into further challenges?
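
To make the kind of replacement described above concrete, here is a minimal sketch of one possible approach: try to register the shared-memory region, and if registration is refused, fall back to a separately allocated pinned staging buffer. This is not NCCL's code and not a confirmed NVIDIA recommendation. PinBackend, PinnedView, pinOrStage, flushToShm and release are hypothetical names, and the CUDA calls are hidden behind injectable hooks so the sketch builds without the CUDA toolkit. Whether cudaMallocHost is an acceptable substitute in CC mode is an assumption that should be checked against NVIDIA's documentation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <functional>

// Hypothetical hooks (not NCCL or CUDA names) so the sketch builds without the
// CUDA toolkit. In a real build they would presumably wrap:
//   registerHost -> cudaHostRegister(ptr, size, cudaHostRegisterMapped) == cudaSuccess
//   allocPinned  -> cudaMallocHost(&p, size)   (assumed to be allowed in CC mode)
//   freePinned   -> cudaFreeHost(p)
struct PinBackend {
  std::function<bool(void*, std::size_t)> registerHost;
  std::function<void*(std::size_t)> allocPinned;
  std::function<void(void*)> freePinned;
};

// Describes which buffer the GPU-facing code should use for a given shm region.
struct PinnedView {
  void* gpuVisible;   // registered shm itself, or the staging copy
  void* shm;          // the original shared-memory region
  std::size_t size;   // region size in bytes
  bool staged;        // true when gpuVisible is a separate staging buffer
};

// Try in-place registration first; if it is refused, stage through a pinned
// buffer that starts out as a copy of the shm contents.
PinnedView pinOrStage(const PinBackend& be, void* shm, std::size_t size) {
  assert(shm != nullptr && size > 0);
  if (be.registerHost(shm, size)) return {shm, shm, size, false};
  void* staging = be.allocPinned(size);
  if (staging == nullptr) return {nullptr, shm, size, false};  // caller must check
  std::memcpy(staging, shm, size);
  return {staging, shm, size, true};
}

// Copy the staging buffer back so other processes mapping the shm see updates.
// A no-op when the region was registered in place.
void flushToShm(const PinnedView& v) {
  if (v.staged) std::memcpy(v.shm, v.gpuVisible, v.size);
}

// Free the staging buffer if one was used. Real code on the registered path
// would also need to undo the registration (e.g. cudaHostUnregister).
void release(const PinBackend& be, PinnedView& v) {
  if (v.staged) be.freePinned(v.gpuVisible);
  v.gpuVisible = nullptr;
  v.staged = false;
}
```

The staged path gives up the zero-copy property of a registered shared-memory segment: every exchange needs an explicit copy such as flushToShm, plus whatever synchronization the peers already rely on. That extra copy is one place where additional overhead could show up when measuring a modified NCCL under CC.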

Tan-YiFan commented 3 months ago

@thisiskarthikj @rnertney

rnertney commented 3 months ago

Hi YiFan,

Thanks for this feedback! For all our future Trusted Computing Solutions, we are certainly scrubbing for disallowed calls, and will ensure that mitigations are implemented where required.

While you're welcome to test, I would recommend waiting for our official announcements for the follow-on products, as they'll have gone through a stringent QA flow and have documented usage guides. :)