NVIDIA / nvtrust

Ancillary open source software to support confidential computing on NVIDIA GPUs
Apache License 2.0

Can we perform the GPU attestation in AWS P5 instance? #65

Open smilenow opened 1 month ago

smilenow commented 1 month ago

Hi, is anyone able to perform GPU attestation on an AWS P5 instance? Although it uses 3rd generation AMD EPYC processors, AWS doesn't enable the AMD SEV-SNP feature for it, which means a P5 instance is not a CVM. The instance types in AWS that support AMD SEV-SNP can be found here.

You can find the P5 instance spec at https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/p5-instances-started.html

On this non-CVM AWS P5 instance, cc_mode is disabled by default. After reading "Confidential Computing on NVIDIA H100 GPUs for Secure and Trustworthy AI", I have several questions:

  1. Is it possible to perform GPU attestation on an AWS P5 instance?
    1. Must cc_mode be enabled before performing GPU attestation? I think the answer is yes, right?
    2. Can we perform GPU attestation in a non-CVM environment? I also tried gpu-admin-tools to turn on cc_mode, but it doesn't work.
    3. Following on from the previous question: if we are convinced that our execution environment is trusted, even though it's not a CVM (Intel TDX or AMD SEV-SNP), is it possible to perform GPU attestation?

Tan-YiFan commented 1 month ago

gpu-admin-tools should be executed on the host; executing it in a VM would not help. Attestation can be done in a non-CVM, but only on a CC-enabled H100.
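
As a rough illustration of the two preconditions above, the sketch below checks whether the current machine looks like a VM and whether the GPU reports CC mode as on. It is a minimal sketch, not part of nvtrust: the function names are hypothetical, the VM check relies on the `hypervisor` CPU flag that Linux exposes in /proc/cpuinfo on x86 guests, and the CC check assumes that `nvidia-smi conf-compute -f` prints a line of the form `CC status: ON`, which may differ between driver versions.

```python
import re


def running_in_vm(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line of /proc/cpuinfo lists 'hypervisor'."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags" and "hypervisor" in value.split():
            return True
    return False


def cc_mode_is_on(conf_compute_output: str) -> bool:
    """Parse the (assumed) 'CC status: ON' line of `nvidia-smi conf-compute -f`."""
    match = re.search(r"CC status:\s*(\w+)", conf_compute_output)
    return bool(match) and match.group(1).upper() == "ON"
```

On a real system the inputs would come from reading /proc/cpuinfo and from capturing the output of `nvidia-smi conf-compute -f`, for example with `subprocess.run(..., capture_output=True, text=True).stdout`. If `running_in_vm` returns True, running gpu-admin-tools there is not expected to change cc_mode, as noted above.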

smilenow commented 1 month ago

Got it. If gpu-admin-tools is designed to be executed on the host, then I think the AWS P5 instance type doesn't fit this requirement, because it's a virtual server in AWS. I will try to contact AWS to see whether they can turn on cc_mode on the host backing the P5 instance.

A further question: if we turn on cc_mode in a non-CVM, how can we perform the attestation? Is it the same as the CVM process described in https://github.com/NVIDIA/nvtrust/tree/main/guest_tools/gpu_verifiers/local_gpu_verifier ?

Tan-YiFan commented 1 month ago

Yes, the procedure is the same. Since the code is open source, you can find where NVIDIA checks whether the VM is a CVM and hack it.

Azure has launched a preview of CVM + H100 in the cloud: https://aka.ms/cvm-h100-preview. Furthermore, NVIDIA offers free hands-on labs for H100 CC at https://www.nvidia.com/en-us/launchpad/ai/develop-confidential-vm-applications/. These may be helpful for your case.
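
For reference, the local verifier's README describes running attestation with `python3 -m verifier.cc_admin` once the package is installed. A minimal Python wrapper around that command might look like the sketch below. `build_verifier_command` and `run_attestation` are hypothetical helper names, and the sketch deliberately does not interpret the exit code or output, since their exact meaning should be taken from the verifier's own documentation.

```python
import subprocess
import sys


def build_verifier_command(extra_args=()):
    """Build the command line for the nvtrust local GPU verifier module."""
    return [sys.executable, "-m", "verifier.cc_admin", *extra_args]


def run_attestation(extra_args=()):
    """Run the verifier and return (exit code, combined stdout/stderr).

    Requires the local_gpu_verifier package and a CC-enabled GPU.
    """
    result = subprocess.run(
        build_verifier_command(extra_args),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    return result.returncode, result.stdout
```

Calling `run_attestation()` inside the guest would then give the raw verifier output to inspect, whether or not the guest is a CVM.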

smilenow commented 1 month ago

Thanks. Just to clarify: do you mean the only thing needed to hack the checks is to change the return value of the is_cc_enabled() function to always be true, and then perform the attestation with the guest tools? Do we need to hack anything else in the guest tools or the drivers?

Tan-YiFan commented 1 month ago

The driver checks whether the VM is a CVM (see the other issues in this repo). The attestation tool does not require modification.

smilenow commented 1 month ago

Do you mean https://github.com/NVIDIA/nvtrust/issues/61#issuecomment-2225811861 ? Is that the only change we need to make?

Tan-YiFan commented 1 month ago

I did not find any other places where NVIDIA checks for TDX/SEV.