NVIDIA / open-gpu-kernel-modules

NVIDIA Linux open GPU kernel module source

Failed to load NVIDIA driver within CVM (TDX) #531

Open herozyg opened 1 year ago

herozyg commented 1 year ago

NVIDIA Open GPU Kernel Modules Version

535.54.03

Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.

Operating System and Version

Ubuntu 22.04

Kernel Release

6.2

Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.

Hardware: GPU

A10

Describe the bug

Installed the latest driver in a TDVM, but running "nvidia-smi" fails. Log below:

image

Could you please give any advice? Thank you!

To Reproduce

GPU: A10
CPU: Intel CPU w/ TDX
Install the latest driver (535.54.03) in a TDVM.
Run "nvidia-smi".
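Since the log above was attached only as an image, a text capture of the same information may be easier to search and quote. The following is a minimal sketch, not an official NVIDIA tool. It assumes a Linux guest where a TDX guest is reported through a `tdx_guest` flag in `/proc/cpuinfo` and where the driver's kernel messages carry the `NVRM` prefix. The helper names `is_tdx_guest` and `nvidia_log_lines` are hypothetical and exist only for illustration.

```python
import subprocess
from pathlib import Path


def is_tdx_guest(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line lists tdx_guest (assumed flag name)."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            _, _, flags = line.partition(":")
            if "tdx_guest" in flags.split():
                return True
    return False


def nvidia_log_lines(kernel_log: str) -> list[str]:
    """Keep only kernel log lines that mention the NVIDIA driver."""
    return [
        line
        for line in kernel_log.splitlines()
        if "NVRM" in line or "nvidia" in line.lower()
    ]


def main() -> None:
    # Reading /proc/cpuinfo can fail on non-Linux systems; treat that as "no flag".
    try:
        cpuinfo = Path("/proc/cpuinfo").read_text()
    except OSError:
        cpuinfo = ""
    print("TDX guest flag present:", is_tdx_guest(cpuinfo))

    # dmesg may be missing or require root; an empty log is printed as nothing.
    try:
        result = subprocess.run(
            ["dmesg"], capture_output=True, text=True, check=False
        )
        kernel_log = result.stdout
    except OSError:
        kernel_log = ""
    for line in nvidia_log_lines(kernel_log):
        print(line)


if __name__ == "__main__":
    main()
```

Running this inside the TDVM right after the failed "nvidia-smi" call, typically as root so that `dmesg` is readable, prints the driver-related kernel messages as text that can be pasted into the issue.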

Bug Incidence

Always

nvidia-bug-report.log.gz

no.

More Info

No response

jrjatin commented 11 months ago

> @jrjatin It seems that the NVIDIA doc targets H100. @herozyg was attaching an A10 to a CVM. Would these steps work on A10? A10 does not have confidential computing support.

Missed that! Maybe give it a try to see if it helps. Also just ensure to install nvidia drivers with -m=kernel-open
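As a rough illustration of the installer invocation mentioned above, the sketch below assembles the command line for the `.run` installer with `-m=kernel-open`. It assumes the standard `NVIDIA-Linux-<arch>-<version>.run` file name in the current directory and that the install is run through `sudo`. The function name `open_module_install_command` is hypothetical, and the script only prints the command without executing it.

```python
import shlex


def open_module_install_command(version: str, arch: str = "x86_64") -> list[str]:
    """Build the argv for installing the open kernel modules (assumed file name)."""
    installer = f"./NVIDIA-Linux-{arch}-{version}.run"
    # -m=kernel-open selects the open kernel module flavor in the installer.
    return ["sudo", "sh", installer, "-m=kernel-open"]


command = open_module_install_command("535.54.03")
print(shlex.join(command))
# prints: sudo sh ./NVIDIA-Linux-x86_64-535.54.03.run -m=kernel-open
```

Whether this helps with an A10 inside a CVM is still open in this thread; the sketch only shows how the suggested flag is passed.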

Quillana commented 5 months ago

May I kindly inquire if this issue has been resolved? Just as mentioned by @wdsun1008, integrating CVM with a non-cc GPU could be advantageous, considering we possess alternative software-based methods for confidential computing.

wdsun1008 commented 5 months ago

> May I kindly inquire if this issue has been resolved? Just as mentioned by @wdsun1008, integrating CVM with a non-cc GPU could be advantageous, considering we possess alternative software-based methods for confidential computing.

Please refer to sev-snp-gpu to enable NVIDIA GPUs in SEV-SNP VMs.