DualCoder / vgpu_unlock

Unlock vGPU functionality for consumer grade GPUs.
MIT License

I hooked an RTX 2070 Super to vGPU; everything looks OK, but once the VM starts, it fails with an error #41

Closed 20170819 closed 3 years ago

20170819 commented 3 years ago

Hello, I have hooked an RTX 2070 Super to vGPU. Everything looks OK, but once the VM starts, it fails with an error.

My environment:
Linux localhost.localdomain 3.10.0-957.el7.x86_64
NVIDIA-Linux-x86_64-460.73.01.run

1) The QEMU error is as follows:
2021-04-21T09:42:33.814146Z qemu-kvm: -device vfio-pci,id=hostdev0,sysfsdev=/sys/bus/mdev/devices/4bca2ed2-bf47-4a06-af38-103c5c22d1c6,display=off,bus=pci.6,addr=0x0: vfio error: 4bca2ed2-bf47-4a06-af38-103c5c22d1c6: error getting device from group 14: Input/output error
Verify all devices in group 14 are bound to vfio- or pci-stub and not already in use

2) The nvidia-vgpu-mgr error is as follows:
17:42:33 localhost.localdomain nvidia-vgpu-mgr[8626]: notice: vmiop_env_log: (0x0): pluginconfig: vgpu_type_id=259
7:42:33 localhost.localdomain nvidia-vgpu-mgr[8626]: notice: vmiop_env_log: Successfully updated env symbols!
17:42:33 localhost.localdomain nvidia-vgpu-mgr[8626]: error: vmiop_log: (0x0): vGPU is supported only on VGX capable boards
17:42:33 localhost.localdomain nvidia-vgpu-mgr[8626]: error: vmiop_log: (0x0): init_device_instance failed for inst 0 with error 1 (vGPU validation of the GPU failed)
17:42:33 localhost.localdomain nvidia-vgpu-mgr[8626]: error: vmiop_log: (0x0): Initialization: init_device_instance failed error 1
17:42:33 localhost.localdomain nvidia-vgpu-mgr[8632]: vgpu_unlock loaded.
17:42:33 localhost.localdomain nvidia-vgpu-mgr[8630]: vgpu_unlock loaded.
17:42:33 localhost.localdomain nvidia-vgpu-mgr[8654]: vgpu_unlock loaded.
17:42:33 localhost.localdomain nvidia-vgpu-mgr[8655]: vgpu_unlock loaded.
17:42:33 localhost.localdomain nvidia-vgpu-mgr[8654]: error: vmiop_env_log: Failed to get VM UUID from QEMU command-line 0x57

3) The vGPU types look OK:
[root@localhost mdev_supported_types]# cat nvidia-*/name
GRID RTX6000-1Q
GRID RTX6000-2Q
GRID RTX6000-3Q
GRID RTX6000-4Q
GRID RTX6000-6Q
GRID RTX6000-8Q
GRID RTX6000-12Q
GRID RTX6000-24Q
GRID RTX6000-4C
GRID RTX6000-6C
GRID RTX6000-8C
GRID RTX6000-12C
GRID RTX6000-24C
GRID RTX6000-1B
GRID RTX6000-2B
GRID RTX6000-1A
GRID RTX6000-2A
GRID RTX6000-3A
GRID RTX6000-4A
GRID RTX6000-6A
GRID RTX6000-8A
GRID RTX6000-12A
GRID RTX6000-24A

What should I do? Please help me, thank you!
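QEMU's hint in the first post asks you to check that every device in IOMMU group 14 is bound to a suitable driver and not in use. A minimal, read-only sketch for inspecting that through sysfs, assuming the standard layout where /sys/bus/mdev/devices/<uuid>/iommu_group links to /sys/kernel/iommu_groups/<N> and each device carries a `driver` symlink when bound. `mdev_iommu_group` and `group_members` are hypothetical helpers, not part of vgpu_unlock.

```python
import os
from pathlib import Path


def mdev_iommu_group(uuid: str, sysfs_root: str = "/sys") -> str:
    """Return the IOMMU group number (e.g. "14") of the mdev with this UUID."""
    link = Path(sysfs_root) / "bus" / "mdev" / "devices" / uuid / "iommu_group"
    # iommu_group is a symlink to .../kernel/iommu_groups/<N>; its last component is N.
    return Path(os.readlink(link)).name


def group_members(group: str, sysfs_root: str = "/sys") -> dict:
    """Map each device in an IOMMU group to its bound driver name, or None if unbound."""
    devices_dir = Path(sysfs_root) / "kernel" / "iommu_groups" / group / "devices"
    members = {}
    for dev in sorted(devices_dir.iterdir()):
        driver_link = dev / "driver"
        # The "driver" symlink points at .../drivers/<name> and is absent when unbound.
        members[dev.name] = Path(os.readlink(driver_link)).name if driver_link.is_symlink() else None
    return members
```

For the device in the log, `group_members(mdev_iommu_group("4bca2ed2-bf47-4a06-af38-103c5c22d1c6"))` shows what sits in the group and which driver each member uses, which you can compare against the QEMU message.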

sigboe commented 3 years ago

Have you tried the newest commit, 1888236c75d8eac673695be8b000f0b065111c51? I am interested to know if it works for you. You may need to re-create your mdev with mdevctl when upgrading. I have the same issue now, but it worked before. Something has changed between when it worked and now, and I tried to recreate my mdev with a different profile. Now it just complains that I should verify all devices in the IOMMU group are bound to vfio drivers. I have tried the newest commit (pushed after you posted this issue), and someone says that the last commit on the 17th of April works.
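The advice above is to re-create the mdev after upgrading. mdevctl is a front end to the kernel's mediated-device sysfs interface, which exposes a `create` file in each type directory and a `remove` file in each mdev. A minimal sketch of a one-off (non-persistent) re-create through that interface, assuming the parent 0000:01:00.0 and the UUID from this thread and a hypothetical type directory `nvidia-259`; `recreate_mdev` is a hypothetical helper, it must run as root, and the VM using the device must be shut down first.

```python
from pathlib import Path


def recreate_mdev(parent: str, type_id: str, uuid: str, sysfs_root: str = "/sys") -> Path:
    """Remove the mdev `uuid` if it exists, then create it again with type `type_id`."""
    root = Path(sysfs_root)
    remove = root / "bus" / "mdev" / "devices" / uuid / "remove"
    if remove.exists():
        # Writing "1" to <mdev>/remove destroys the mediated device.
        remove.write_text("1")
    create = root / "class" / "mdev_bus" / parent / "mdev_supported_types" / type_id / "create"
    # Writing a UUID to <type>/create instantiates a new mdev of that type on the parent GPU.
    create.write_text(uuid)
    return create
```

Unlike a definition made with mdevctl, a device created this way does not survive a reboot.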

20170819 commented 3 years ago

Oh, thanks. My environment:
Linux localhost.localdomain 3.10.0-957.el7.x86_64
NVIDIA-Linux-x86_64-460.73.01.run

I restarted nvidia-vgpu-mgr, and now my issue is as follows:
PCI id 00:01:00.0 config params vgpu_type_id=263
Apr 22 17:54:55 localhost.localdomain nvidia-vgpu-mgr[1407]: notice: vmiop_env_log: (0x0): pluginconfig: vgpu_type_id=263
Apr 22 17:54:55 localhost.localdomain nvidia-vgpu-mgr[1407]: notice: vmiop_env_log: Successfully updated env symbols!
Apr 22 17:54:55 localhost.localdomain nvidia-vgpu-mgr[1407]: op_type: 0x20801322 failed.
Apr 22 17:54:55 localhost.localdomain nvidia-vgpu-mgr[1407]: op_type: 0x2080014b failed.
Apr 22 17:54:55 localhost.localdomain nvidia-vgpu-mgr[1407]: op_type: 0xa0820102 failed.
Apr 22 17:54:55 localhost.localdomain nvidia-vgpu-mgr[1407]: error: vmiop_log: NVOS status 0x56
Apr 22 17:54:55 localhost.localdomain nvidia-vgpu-mgr[1407]: error: vmiop_log: Assertion Failed at 0xdcb37183:293
Apr 22 17:54:55 localhost.localdomain nvidia-vgpu-mgr[1407]: error: vmiop_log: 10 frames returned by backtrace
Apr 22 17:54:55 localhost.localdomain nvidia-vgpu-mgr[1407]: error: vmiop_log: /lib64/libnvidia-vgpu.so(_nv004938vgpu+0x26) [0x7fdddcb876a6]
[root@localhost ~]# cd /sys/class/mdev_bus/0000:01:00.0/mdev_supported_types

What should I do? Thank you!
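Running `cat nvidia-*/name` in that directory prints the profile names but not which `nvidia-*` directory each belongs to, nor whether another instance can still be created. A minimal sketch that reads both the `name` and `available_instances` attributes of each type directory, assuming the standard mdev sysfs layout; `list_mdev_types` is a hypothetical helper.

```python
from pathlib import Path


def list_mdev_types(parent: str, sysfs_root: str = "/sys") -> list:
    """Return (type directory, profile name, available instances) for each nvidia-* type."""
    types_dir = Path(sysfs_root) / "class" / "mdev_bus" / parent / "mdev_supported_types"
    rows = []
    for type_dir in sorted(types_dir.glob("nvidia-*")):
        name = (type_dir / "name").read_text().strip()
        # available_instances: how many more mdevs of this type can still be created.
        available = int((type_dir / "available_instances").read_text().strip())
        rows.append((type_dir.name, name, available))
    return rows
```

For example, `for row in list_mdev_types("0000:01:00.0"): print(*row, sep="\t")`. The directory name in the first column is what identifies the type when creating an mdev.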

KrutavShah commented 3 years ago

The new commit made 2 days ago has fixed that issue. https://github.com/DualCoder/vgpu_unlock/commit/1888236c75d8eac673695be8b000f0b065111c51 I recommend trying again by downloading the new version and following the steps to apply the unlock all over again, including rebuilding DKMS.
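The last step of that advice is rebuilding DKMS so the patched NVIDIA kernel module is compiled again. A minimal sketch of only that step, assuming the driver was installed with DKMS under the module name `nvidia` and version 460.73.01 (the .run version from this thread; check `dkms status` for the real values) and that the earlier unlock steps from the repository's instructions have already been redone. `dkms_rebuild_commands` and `rebuild` are hypothetical helpers; by default they only print the commands.

```python
import subprocess


def dkms_rebuild_commands(version: str, module: str = "nvidia") -> list:
    """Commands that drop every built copy of the module, then build and install it again."""
    return [
        ["dkms", "remove", "-m", module, "-v", version, "--all"],
        ["dkms", "install", "-m", module, "-v", version],
    ]


def rebuild(version: str, dry_run: bool = True) -> list:
    cmds = dkms_rebuild_commands(version)
    for cmd in cmds:
        if dry_run:
            # Print instead of executing so the commands can be reviewed first.
            print(" ".join(cmd))
        else:
            # check=True stops at the first failing command.
            subprocess.run(cmd, check=True)
    return cmds
```

Calling `rebuild("460.73.01", dry_run=False)` as root would actually run them; a reboot afterwards ensures the new module is the one loaded.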

20170819 commented 3 years ago

Good, thanks very much! I have unlocked the RTX 2070 Super for vGPU successfully.