Quadro M6000 Missing Profiles

k-romberg commented 2 years ago

Hey everyone. I think I have things mostly working. I am running Proxmox 7.1, kernel 5.13.19-4-pve, the NVIDIA driver NVIDIA-Linux-x86_64-470.82-vgpu-kvm.run (unmodified), and mdevctl 0.81, with a Quadro M6000 24GB card. When I run 'mdevctl types', I get options as if the card only had 8GB:

mdevctl types
0000:84:00.0
  nvidia-11
    Available instances: 0
    Device API: vfio-pci
    Name: GRID M60-0B
    Description: num_heads=2, frl_config=45, framebuffer=512M, max_resolution=2560x1600, max_instance=16
  nvidia-12
    Available instances: 0
    Device API: vfio-pci
    Name: GRID M60-0Q
    Description: num_heads=2, frl_config=60, framebuffer=512M, max_resolution=2560x1600, max_instance=16
  << SNIP >>
  nvidia-20
    Available instances: 0
    Device API: vfio-pci
    Name: GRID M60-4Q
    Description: num_heads=4, frl_config=60, framebuffer=4096M, max_resolution=5120x2880, max_instance=2
  nvidia-21
    Available instances: 0
    Device API: vfio-pci
    Name: GRID M60-8A
    Description: num_heads=1, frl_config=60, framebuffer=8192M, max_resolution=1280x1024, max_instance=1
  << SNIP >>
  nvidia-238
    Available instances: 0
    Device API: vfio-pci
    Name: GRID M60-1B4
    Description: num_heads=4, frl_config=45, framebuffer=1024M, max_resolution=5120x2880, max_instance=8

I am able to create two vGPU profiles and add each one to a VM, so I think things are working in general. But how do I enable or use the other 16GB of the card's memory?
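For anyone who wants to compare profiles programmatically rather than by eye, here is a minimal Python sketch that parses text in the layout `mdevctl types` prints. It assumes the indented one-field-per-line layout and type names of the form `nvidia-<number>`; the helper names `parse_mdev_types` and `description_fields` are made up for this example and are not part of mdevctl.

```python
import re

def parse_mdev_types(text):
    """Parse `mdevctl types` style text into a list of dicts, one per type."""
    profiles = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if re.fullmatch(r"nvidia-\d+", line):
            # A new mdev type starts; later "Key: value" lines belong to it.
            current = {"type": line}
            profiles.append(current)
        elif current is not None and ": " in line:
            key, value = line.split(": ", 1)
            current[key] = value
    return profiles

def description_fields(profile):
    """Split the 'Description' string into a key/value dict."""
    pairs = (item.split("=", 1) for item in profile["Description"].split(", "))
    return {key: value for key, value in pairs}

# Two entries copied from the output posted above.
SAMPLE = """\
0000:84:00.0
  nvidia-20
    Available instances: 0
    Device API: vfio-pci
    Name: GRID M60-4Q
    Description: num_heads=4, frl_config=60, framebuffer=4096M, max_resolution=5120x2880, max_instance=2
  nvidia-21
    Available instances: 0
    Device API: vfio-pci
    Name: GRID M60-8A
    Description: num_heads=1, frl_config=60, framebuffer=8192M, max_resolution=1280x1024, max_instance=1
"""

profiles = parse_mdev_types(SAMPLE)
for p in profiles:
    fields = description_fields(p)
    print(p["type"], p["Name"], fields["framebuffer"], fields["max_instance"])
```

On a live host the same function could be fed the captured output of `mdevctl types` instead of `SAMPLE`.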

k-romberg commented 2 years ago

Hmmm... I think I am close, but not there yet. When I boot the Windows 10 VM, it sees the card with the native Windows driver. When I try to install 472.39, either from inside the NVIDIA-GRID-xxxx.zip file or downloaded from the NVIDIA website, the installer says the hardware or OS is not supported. If I download the latest driver from NVIDIA, it installs, but after a reboot the driver errors out with -43.

Anyone have any ideas about what to do with the VM drivers?

k-romberg commented 2 years ago

Another thing I have noticed here is that for all the profiles, the available instances are all 0. What does that mean?
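For reference, the kernel exposes the same counter through sysfs: each entry under a device's `mdev_supported_types` directory has an `available_instances` file giving the number of devices of that type that can still be created. A rough Python sketch for reading those counters directly, assuming the standard `/sys/bus/pci/devices/<address>/mdev_supported_types` layout (the function name and the `root` parameter are only for illustration):

```python
from pathlib import Path

def available_instances(pci_address, root="/sys/bus/pci/devices"):
    """Return {type_name: available_instances} for one PCI device.

    Returns an empty dict if the device exposes no mdev_supported_types
    directory (for example, when the vGPU host driver is not loaded).
    """
    types_dir = Path(root) / pci_address / "mdev_supported_types"
    if not types_dir.is_dir():
        return {}
    counts = {}
    for type_dir in sorted(types_dir.iterdir()):
        counter = type_dir / "available_instances"
        if counter.is_file():
            counts[type_dir.name] = int(counter.read_text().strip())
    return counts

# Example call on the host, using the PCI address from the first post:
#   for name, count in available_instances("0000:84:00.0").items():
#       print(name, count)
```

This only reports the numbers; it does not explain why they are 0 on a given host.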

k-romberg commented 2 years ago

OK, I think I have figured out a couple of things, and the available profile counts now show up and work as expected. Still no luck on the Windows VM side. I have tried a Linux VM with not much more luck: the card shows up in lspci and the driver installs, but when I run nvidia-smi, it errors out saying no card/hardware detected. I get the same behavior whether I use the native PCI IDs or spoof different ones from the same GM200 chipset in the VM.

I guess the $42 question is, has anyone gotten a Quadro card to work on the host and successfully pass a part/profile to a client that it can use?

Dardrai commented 2 years ago

It's "the same issue" the M40 users have. Judging by the GRID M60-* names, your card is being offered the M60's profiles. The M60 is a dual-GPU card with 8GB of VRAM per GPU (16GB in total for both), which limits its profiles to a maximum of 8GB: a slice running on the first die cannot use VRAM from the second die. Because of that, and because the M6000 is not "officially supported", there was no need for NVIDIA to add profiles with more VRAM than that. As with the M40, if you use the vgpu_unlock-rs repo (the Rust version of this unlock), you can use profile overrides to enable more VRAM on a profile. Guides for that:
https://gitlab.com/polloloco/vgpu-proxmox
https://blog.zematoxic.com/06/03/2022/Tesla-M40-vGPU-Proxmox-7-1
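The 8GB ceiling is visible in the numbers from the first post: for every profile listed there, framebuffer multiplied by max_instance comes out the same. A quick Python check, using only the values quoted in that output (treating the "M" suffix as a plain megabyte count is an assumption):

```python
# (framebuffer in M, max_instance) as reported by `mdevctl types` above.
PROFILES = {
    "GRID M60-0B": (512, 16),
    "GRID M60-0Q": (512, 16),
    "GRID M60-4Q": (4096, 2),
    "GRID M60-8A": (8192, 1),
    "GRID M60-1B4": (1024, 8),
}

def total_framebuffer(framebuffer_m, max_instance):
    """Memory used if every instance of one profile is created."""
    return framebuffer_m * max_instance

totals = {name: total_framebuffer(*spec) for name, spec in PROFILES.items()}
for name, total in totals.items():
    print(f"{name}: {total}M")
```

Every listed profile tops out at 8192M, which matches the per-GPU limit of the M60 rather than the 24GB on the M6000.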

DualCoder / vgpu_unlock

Quadro M6000 Missing Profiles #92