Closed seungsoo-lee closed 3 months ago
Try: `nvidia-smi conf-compute -grs`.
If it returns "not ready", then run `nvidia-smi conf-compute -srs 1`. After that, `torch.cuda.is_available()` should not fail.
Otherwise, could you provide the output of `dmesg`?
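If you want to script this check, a small parser over the `-grs` output can drive the decision. This is only a sketch: the exact output wording (`Confidential Compute GPUs Ready state: ...`) is an assumption based on the state name quoted in this thread and may differ across driver versions.

```python
import shutil
import subprocess


def is_cc_ready(grs_output: str) -> bool:
    """Return True if `nvidia-smi conf-compute -grs` reports the GPUs as ready.

    Assumes the output ends in "...: ready" or "...: not ready"; adjust the
    match for your driver version.
    """
    # Look at the text after the last colon, e.g. "ready" / "not ready".
    state = grs_output.strip().rsplit(":", 1)[-1].strip().lower()
    return state == "ready"


def query_cc_ready() -> bool:
    """Run nvidia-smi (if present) and parse its ready-state report."""
    nvidia_smi = shutil.which("nvidia-smi")
    if nvidia_smi is None:
        raise RuntimeError("nvidia-smi not found on PATH")
    out = subprocess.run(
        [nvidia_smi, "conf-compute", "-grs"],
        capture_output=True, text=True, check=True,
    )
    return is_cc_ready(out.stdout)
```

On a CC-enabled guest you would call `query_cc_ready()` and, if it returns False, run `nvidia-smi conf-compute -srs 1` before importing torch.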
@Tan-YiFan
Thanks. After I ran `nvidia-smi conf-compute -srs 1`, `torch.cuda.is_available()` returns True.
Btw, is `nvidia-smi conf-compute -srs 1` the only way to get `Confidential Compute GPUs Ready state: ready`? How can I use the Attestation SDK to set it to the Ready state?
You can search for `set_gpu_ready_state` in this repo. That function does the same thing as `nvidia-smi conf-compute -srs [0/1]`.
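A typical flow, then, is to run attestation first and only set the ready state when the verdict is positive. The sketch below keeps that gating logic explicit; the concrete Attestation SDK client and `set_gpu_ready_state` call are injected as callables because their exact signatures are not shown in this thread (an assumption on my part).

```python
from typing import Callable


def ready_state_from_attestation(attested: bool) -> int:
    """Map an attestation verdict to the value passed to -srs / set_gpu_ready_state."""
    return 1 if attested else 0


def gate_gpu_on_attestation(
    run_attestation: Callable[[], bool],
    set_ready_state: Callable[[int], None],
) -> bool:
    """Attest the GPU, then set the CC ready state based on the verdict.

    `run_attestation` would wrap the Attestation SDK's verify call and
    `set_ready_state` would wrap set_gpu_ready_state (or shell out to
    `nvidia-smi conf-compute -srs`); both are hypothetical stand-ins here.
    """
    verdict = bool(run_attestation())
    set_ready_state(ready_state_from_attestation(verdict))
    return verdict
```

This mirrors the point above: the SDK does not make the GPU ready as a side effect of attestation; something still has to flip the ready-state flag after the verdict comes back.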
Oh, I thought the `Confidential Compute GPUs Ready state` could only become ready by successfully attesting the GPU with the Attestation SDK, not by setting it statically.
Also, did you manage to run k8s workloads successfully in your k8s cluster? #36
I have never run k8s workloads on H100 CC.
@Tan-YiFan
Then, in your case, how do you run confidential computing workloads in the guest VM? Could you let me know?
Users have root access to the guest VM, so containers are not used.
If you wish to use containers, please refer to this guide for now.
Machine Spec.

- CPU: Dual AMD EPYC 9224 16-Core Processor
- GPU: H100 10de:2331 (vbios: 96.00.5E.00.01, cuda: 12.2, nvidia driver: 535.86.10)
- Host OS: Ubuntu 22.04 with 5.19.0-rc6-snp-host-c4daeffce56e kernel
- Guest OS: Ubuntu 22.04.2 with 5.19.0-rc6-snp-guest-c4daeffce56e kernel
On the guest VM, CUDA, the NVIDIA driver, and PyTorch (`pip3 install torch torchvision torchaudio`) are installed. `nvidia-smi` (on the guest) reports as follows.
But when I tried to run `torch.cuda.is_available()`, it says:
What's the problem? Do you have any idea?
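For debugging a setup like this, it can help to collect the relevant facts in one place before filing an issue. The helper below is a sketch that guards every probe, so it runs even on a machine without torch, a GPU, or the NVIDIA driver; the `conf-compute -grs` probe assumes a CC-capable driver and simply records whatever text the tool prints.

```python
import importlib.util
import shutil
import subprocess


def collect_cuda_diagnostics() -> dict:
    """Gather basic facts that help debug torch.cuda.is_available() == False.

    Every probe is optional: missing torch or nvidia-smi just leaves the
    corresponding keys out or set to False, rather than raising.
    """
    info = {}

    # Is torch importable at all?
    info["torch_installed"] = importlib.util.find_spec("torch") is not None
    if info["torch_installed"]:
        import torch
        info["cuda_available"] = torch.cuda.is_available()
        # CUDA version the torch wheel was built against (None for CPU-only wheels).
        info["torch_cuda_version"] = torch.version.cuda

    # Is the NVIDIA driver userland present?
    nvidia_smi = shutil.which("nvidia-smi")
    info["nvidia_smi_found"] = nvidia_smi is not None
    if nvidia_smi:
        # On CC-capable drivers this prints the Confidential Compute
        # GPU Ready state; on others it errors, which we record as-is.
        out = subprocess.run(
            [nvidia_smi, "conf-compute", "-grs"],
            capture_output=True, text=True,
        )
        info["cc_ready_state"] = (out.stdout or out.stderr).strip()

    return info
```

Pasting the returned dict (plus `dmesg` output, as suggested above) into the issue gives responders most of what they need in one shot.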