Open BlaringIce opened 3 years ago
I have a Tesla M40 working well, using both this repo and vgpu_unlock_rs, with the 510.47.03 driver on a Proxmox host.
Like many people have done, my working config passes through to Windows guests an NVIDIA Quadro M6000. It works great. I do not experience any error 43 issues or problems with performance or drivers on Windows 10 or 11.
What brought me here was my attempt to get Linux guests to enjoy the same benefits.
After some tweaks, the only way I can get Linux working at all is to pass through a specific GRID device. An unchanged device with no PCI ID changes is passed through as an M60, which does not work with any proprietary NVIDIA drivers.
After changing the PCI IDs, the Linux guest works great until the official driver goes into limp mode (triggered at 20 minutes of uptime; it lowers the clock frequency and sets a 15 FPS cap). I observe the same behavior from the Windows driver going into limp mode when using the unlicensed official vGPU driver for Windows.
The PCI ID that works in Linux guests:

```toml
# PCI ID for GRID M60 0B
pci_id = 0x13F2114E
pci_device_id = 0x13F2
```
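For reference, the `pci_id` value appears to pack two 16-bit IDs into one 32-bit word: the device ID in the upper half and the subsystem device ID in the lower half (this interpretation of vgpu_unlock-rs's field is my reading of the values in this thread, not documented behavior). A quick split:

```shell
id=$((0x13F2114E))
# Upper 16 bits = PCI device ID, lower 16 bits = subsystem device ID (assumed layout)
printf 'device=0x%04X subsystem=0x%04X\n' $((id >> 16)) $((id & 0xFFFF))
# device=0x13F2 subsystem=0x114E
```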
It would appear that, unlike the Windows drivers, the proprietary Linux drivers for Quadro and Tesla/compute cards do not share the same handling of vGPU capabilities. I have tried a series of different PCI IDs and drivers with no joy.
I'd love to know what steps/process you followed. I've been beating my head against the wall for 2 days now on this project. I've got two M40s that I'm trying to use as vGPUs (this mod plus -rs). Things "look" right, but I always get Error 43. I'm using the same driver version, and Proxmox 7.2.
Can you share your VM config as well?
Where did you get the patches for the kernel versions?
+1
Make sure Secure Boot is disabled in the UEFI BIOS. The story that got me to this:
I originally followed this guide, https://wvthoog.nl/proxmox-7-vgpu-v2/, using the pre-patched driver. Everything worked except Error 43. I then swapped over to using the video guide from Craft Computing (https://www.youtube.com/watch?v=jTXPMcBqoi8&t=1626s).
I had all sorts of fun manually patching the 510 driver set for the 5.15 kernel, which maybe I didn't need to do.
I just about gave up and decided to try a Debian VM. I disabled the custom profiles (by renaming the toml file at /etc/vgpu_profiles), stopped spoofing to a Quadro M6000, and installed the GRID driver in Debian. That got me errors about not being able to load the drm module, which led me to disabling Secure Boot. I did the same in Windows (after having to expand my partition)... and magic: working with the GRID driver. I turned my custom profiles back on, uninstalled the GRID driver, and reinstalled the Quadro desktop drivers. Now I'm at Error 31. So, progress?
OK, now back to Error 43 with the Quadro drivers, but this is still progress. I was getting Error 43 with the GRID drivers previously as well.
@dulasau @angst911
I just want to point out again that I have the 24GB version of the Tesla M40. Earlier others indicated the problem may be related to the 12GB version only.
I can give more details if this isn't enough to get you going. Let me know how it goes.
Beyond that there are very few specific configurations needed for the VM.
Configuration changes to the VM config, add the line:

args: -uuid 00000000-0000-0000-0000-000000000XXX

where XXX = the VMID (zero-padded).
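As a small sketch, the UUID line can be generated from the VMID with a zero-padding printf (applying it via `qm set` is my assumption about workflow; editing the VM config file directly also works):

```shell
vmid=104
# Pad the VMID into the last 12-digit field of the UUID
uuid=$(printf '00000000-0000-0000-0000-%012d' "$vmid")
echo "$uuid"   # 00000000-0000-0000-0000-000000000104
# Hypothetical application step on a Proxmox host:
# qm set "$vmid" --args "-uuid $uuid"
```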
Add your hardware to the VM in the GUI. I used MDev Type nvidia-12, or use whichever type you wish, as reported by `mdevctl types` and having available instances.
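To check which types still have capacity, you can filter the `mdevctl types` output; the sample below is a hypothetical excerpt for illustration (real output varies by card and driver):

```shell
# Hypothetical excerpt of `mdevctl types` output:
sample='  nvidia-12
    Available instances: 4
    Device API: vfio-pci'
# Pull out the instance count line:
echo "$sample" | awk '/Available instances:/ {print $3}'   # 4
```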
I then made changes to the MDev Type by creating/editing /etc/vgpu_unlock/profile_override.toml:
```toml
[profile.nvidia-12]
num_displays = 1
display_width = 3840
display_height = 2160
max_pixels = 8294400
cuda_enabled = 1
frl_enabled = 144
framebuffer = 5905580032
pci_id = 0x17F011A0
pci_device_id = 0x17F0
```
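The override values hang together arithmetically: `max_pixels` is just width times height, and `framebuffer` is in bytes (5905580032 is exactly 5.5 GiB). A quick check:

```shell
echo $((3840 * 2160))               # 8294400 -> matches max_pixels
echo $((5905580032 / 1024 / 1024))  # 5632 MiB, i.e. 5.5 GiB of framebuffer
```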
This was enough to get my Tesla M40 vGPU profile working in Windows 10/11. The device is spoofed as a Quadro M6000, and I increased most of the MDev profile limits to test its capabilities (I currently game in 4K daily with this working profile).
- First, I installed the vgpu_unlock script onto my Proxmox host.
- Second, since vgpu_unlock-rs complements this repo nicely, I set up vgpu_unlock-rs on the Proxmox host as well.
@republicus What version of Proxmox, kernel, and NVIDIA driver are you on (both host and guest)? -- Note: I can see the 512.78 in the screenshot for the guest -- can you provide a link to that download? I wasn't able to find it on NVIDIA's site.
Which machine type and BIOS/UEFI did you use?
Did you 100% follow the vgpu_unlock instructions, or did you follow the modified instructions for using it with vgpu_unlock-rs?
I'm at the point where the GRID driver works, but I get Error 43 if I use the Quadro driver and spoof the device ID.
Proxmox 7.2, kernel 5.15, vgpu_unlock + vgpu_unlock-rs (driver patched to include the SRC and kbuild config line prior to running the NVIDIA installer).
Host driver: NVIDIA-Linux-x86_64-510.47.03-vgpu-kvm.run, manually integrating the kernel-related driver patches.
Guest driver: 511.65_grid_win10_win11_server2016_server2019_server2022_64bit_international
Working GRID vgpu_profile.toml:
```toml
[profile.nvidia-18]
num_displays = 1
display_width = 1920
display_height = 1080
max_pixels = 2073600
cuda_enabled = 1
frl_enabled = 60
framebuffer = 5905580032
```
and the profile that doesn't work when spoofing to an M6000:
```toml
[profile.nvidia-18]
num_displays = 1
display_width = 1920
display_height = 1080
max_pixels = 2073600
cuda_enabled = 1
frl_enabled = 60
framebuffer = 5905580032
pci_id = 0x17F011A0
pci_device_id = 0x17F0
```
I have both 12GB and 24GB versions, and the problem seems to be consistent across both of them.
I first installed and had it working on my PVE 7.1 node, but recently had a boot drive failure. I swapped in my backup drive, which is currently running PVE 6.4, kernel Linux 5.4.195-1-pve.
I'll work on updating the node back to PVE 7.2+
Host grid driver: 510.47.03
You can DM me on Discord if you wish:
@angst911 The NVIDIA Advanced Driver Search seems to be less "advanced" than the ordinary search - I'm seeing only old drivers listed (latest 473.81) using it.
Here is a direct link to that driver: NVIDIA RTX / QUADRO DESKTOP AND NOTEBOOK DRIVER RELEASE 510
It's working!!!!! Although not 100% sure exactly why :-D
I see hours of testing ahead, but here is what I have so far:
I was following this setup/config instruction https://gitlab.com/polloloco/vgpu-proxmox and profile config override from here https://drive.google.com/drive/folders/1KHf-vxzUCGqsWZWOW0bXCvMhXh5EJxQl (Jeff from Craft Computing).
Just in case here is profile override:
[profile.nvidia-18] num_displays = 1 display_width = 1920 display_height = 1080 max_pixels = 2073600 cuda_enabled = 1 frl_enabled = 60 framebuffer = 11811160064 pci_id = 0x17F011A0 pci_device_id = 0x17F0
VM config:
```
args: -uuid 00000000-0000-0000-0000-000000000104
balloon: 0
bios: ovmf
boot: order=ide0;ide2;net0
cores: 8
cpu: host
efidisk0: local-lvm:vm-104-disk-0,efitype=4m,pre-enrolled-keys=1,size=4M
hostpci0: 0000:81:00.0,mdev=nvidia-18,pcie=1
ide0: local-lvm:vm-104-disk-1,size=64G
ide2: NetworkBackup:iso/Win11_English_x64v1.iso,media=cdrom,size=5434622K
machine: pc-q35-7.0
memory: 12288
meta: creation-qemu=7.0.0,ctime=1662489026
name: Win11-3
net0: e1000=16:AB:A7:2D:FB:4B,bridge=vmbr0,firewall=1
numa: 0
ostype: win11
scsihw: virtio-scsi-pci
smbios1: uuid=b560b92f-f856-487e-bb00-a2e495665b59
sockets: 1
tpmstate0: local-lvm:vm-104-disk-2,size=4M,version=v2.0
vga: none
vmgenid: 1fa5368d-a7d0-403b-ac65-e033af2de62a
```
That's great! Hoping to hear good news about the Tesla M40 12GB.
Tesla M40 12GB works as well. I changed the profile override to ~6 GB and was able to start two VMs.
Alright, I tested the Tesla M40 12GB on my Ryzen-based "server" and now it's working! The only changes from my unsuccessful previous attempts are that I freshly installed Proxmox on it (though the same 7.2 version; I was rebuilding my homelab) and probably guest NVIDIA driver 512.78 (I don't remember which driver version I was using before).
First and primary: I'm coming from a setup where I was using a GTX 1060 with vgpu_unlock just fine, but I figured I'd step it up so that I could support more VMs, so I'm currently trying to use a Tesla M40. Being a Tesla card, you might expect it not to need vgpu_unlock, but this is one of the few Teslas that doesn't support vGPU natively. I'm trying to use nvidia-18 types from the M60 profiles with my VMs. I'm aware that I should be using a slightly older guest driver to match my host driver. However, I'm still getting a code 43 when I load my guest. I would provide some logs here, but I'm not sure what I can include, since the entries for the two vgpu services both seem to be fine, with no errors other than

nvidia-vgpu-mgr[2588]: notice: vmiop_log: display_init inst: 0 successful

at the end of initializing the mdev device when the VM starts up. Please let me know any other information that I can provide to help debug/troubleshoot.

Second: This is probably one of the few instances where this is a problem, since most GeForce/Quadro cards have less memory than their vGPU-capable counterparts. However, I have a Tesla M40 with 24 GB of VRAM (in two separate memory regions, I would guess, although this SKU isn't listed on the Nvidia graphics processing units Wikipedia page, so I'm not 100% sure). Compare this to the Tesla M60's 2x8GB configuration, of which only 8 GB is available for allocation in vGPU. I'm not sure whether the max_instance quantity, as seen in mdevctl types, is defined on the NVIDIA driver side or the vgpu_unlock side, or if it's a mix and the vgpu_unlock side might be able to do something about it. What I'm asking here, though, is whether this value can be redefined so that I can utilize all 24 GB of my available VRAM or, if not that, then at least the 12 GB that I presume is available in the GPU's primary memory region.
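As a rough sanity check on how far 24 GB goes, the number of startable instances is bounded by total VRAM divided by the per-VM framebuffer set in the override (the 11 GiB value below is the one from the working profile earlier in the thread; the calculation is only a back-of-envelope sketch and ignores any driver-side reserved memory):

```shell
total=$((24 * 1024 * 1024 * 1024))   # 24 GiB Tesla M40
per_vm=11811160064                   # 11 GiB framebuffer from the override above
echo $((total / per_vm))             # 2 instances fit
```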