NVIDIA / open-gpu-kernel-modules

NVIDIA Linux open GPU kernel module source

NVIDIA VGPU does not work #548

Closed: sdake closed this issue 1 year ago

sdake commented 1 year ago

NVIDIA Open GPU Kernel Modules Version

535.86.10

Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.

Operating System and Version

Description: Debian GNU/Linux 12 (bookworm)

Kernel Release

Linux wise-a40x1-1 6.1.0-11-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.38-4 (2023-08-08) x86_64 GNU/Linux

Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.

Hardware: GPU

NVIDIA A30 24G

Describe the bug

The NVIDIA VGPU software does not function with the cloud-hypervisor virtual machine monitor. An extensive analysis has been completed, and a summary has been produced.

To Reproduce

Try to use nvidia-vgpu with either a proprietary or open-source driver. In either case, the nvidia-vgpu-vgpu mdev control plane has odd expectations about the command line for the virtual machine monitor.

A very short summary is that nvidia-vgpu is hardcoded to QEMU. Most modern accelerated compute startups appear to want to use cloud-hypervisor as the technology has superior performance and quality.

Bug Incidence

Always

nvidia-bug-report.log.gz

We are all working towards the same goal. I don't have the nvidia-vgpu software at this time. I will ask the individual that filed the issue to attach nvidia-bug-report.sh output.

@dengxuehua would you be kind enough to follow this issue tracker as well as provide the results of nvidia-bug-report.sh?

More Info

No response

sdake commented 1 year ago

Issue within cloud-hypervisor: https://github.com/cloud-hypervisor/cloud-hypervisor/issues/5319.

Thank you, -steve

ttabi commented 1 year ago

"I confirm that this does not happen with the proprietary driver package."

"Try to use nvidia-vgpu with either a proprietary or open-source driver."

These two statements can't both be true.

sdake commented 1 year ago

@ttabi A simple thank you for spending 20 minutes of my life reporting a bug against your product offerings would be in order.

Thank you. -steve

aritger commented 1 year ago

Sorry for that, @sdake. Thanks for the report.

The open-gpu-kernel-modules do not yet support virtualization. We're currently working on it (it requires changes both in open-gpu-kernel-modules and in the GSP firmware); it may be a few releases before the support is added.

It arguably should be better called out, but the lack of virtualization support is mentioned, buried in the GPU driver README:

http://us.download.nvidia.com/XFree86/Linux-x86_64/535.98/README/kernel_open.html

Sorry for the inconvenience.

cybik commented 1 year ago

@aritger for the record, I know it's not the place, but GPU virtualization should REALLY not be locked out of consumer boards. At least let us vGPU one instance so some of us can virtualize a GPU-enabled Windows VM or something.

aritger commented 1 year ago

Thanks for your feedback. I will relay your message to the appropriate team.

sdake commented 1 year ago

@aritger Did you read my request? I will repeat it so we understand each other.

Your drivers, whether proprietary or open, lack vgpu support for any hypervisor other than QEMU. The broader accelerated computing community much prefers to use modern hypervisors, such as cloud-hypervisor. Unfortunately, the way vgpu is implemented, even in the paid offering, does not work with this hypervisor.

Thank you -steve

aaronp24 commented 1 year ago

I made sure the vgpu team is aware of your concern. Thank you for bringing this to our attention.

sdake commented 1 year ago

Cool, thanks Aaron! Super appreciate it. GitHub is a phenomenal tool for interacting with technology suppliers. Not sure how it looks from the other perspective.

Cheers, Steve
