DualCoder / vgpu_unlock

Unlock vGPU functionality for consumer grade GPUs.
MIT License

Please help: vGPU not working on a 3070 Ti #93

Open kuangke9527 opened 2 years ago

kuangke9527 commented 2 years ago

This problem is giving me a real headache.

Driver version: 470.82. Physical GPU model: "GeForce RTX 3070 Ti Laptop GPU". Linux kernel: "Linux kuangke 5.14.0-1024-oem #26-Ubuntu SMP Thu Feb 17 14:35:50 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux"

virsh nodedev-list | grep mdev

 mdev_0208c5a9_7995_4f5b_be0e_e1808cd1f17b
 mdev_bf1a9105_3817_48e4_a02a_0dffbbf0dfe2

mdevctl list

 0208c5a9-7995-4f5b-be0e-e1808cd1f17b 0000:01:00.0 nvidia-527 (defined)
 bf1a9105-3817-48e4-a02a-0dffbbf0dfe2 0000:01:00.0 nvidia-527 (defined)

When I start the virtual machine, the vGPU does not work properly. The error log is below:

3月 06 20:13:38 kuangke systemd[1]: Starting NVIDIA vGPU Manager Daemon...
3月 06 20:13:38 kuangke systemd[1]: Started NVIDIA vGPU Manager Daemon.
3月 06 20:13:38 kuangke bash[992]: vgpu_unlock loaded.
3月 06 20:13:38 kuangke nvidia-vgpu-mgr[992]: vgpu_unlock loaded.
3月 06 20:13:38 kuangke nvidia-vgpu-mgr[1044]: vgpu_unlock loaded.
3月 06 20:13:38 kuangke nvidia-vgpu-mgr[1142]: vgpu_unlock loaded.
3月 06 20:13:39 kuangke modprobe[1142]: vgpu_unlock loaded.
3月 06 20:13:40 kuangke nvidia-vgpu-mgr[1044]: notice: vmiop_env_log: nvidia-vgpu-mgr daemon started
3月 06 20:14:15 kuangke nvidia-vgpu-mgr[2571]: vgpu_unlock loaded.
3月 06 20:14:15 kuangke nvidia-vgpu-mgr[2583]: vgpu_unlock loaded.
3月 06 20:14:15 kuangke nvidia-vgpu-mgr[2583]: notice: vmiop_env_log: vmiop-env: guest_max_gpfn:0x0
3月 06 20:14:15 kuangke nvidia-vgpu-mgr[2583]: notice: vmiop_env_log: (0x0): Received start call from nvidia-vgpu-vfio module: mdev uuid bf1a9105-3817-48e4-a02a-0dffbbf0dfe2 GPU PCI id 00:01:00.0 config params vgpu_type_id=527
3月 06 20:14:15 kuangke nvidia-vgpu-mgr[2583]: notice: vmiop_env_log: (0x0): pluginconfig: vgpu_type_id=527
3月 06 20:14:15 kuangke nvidia-vgpu-mgr[2583]: notice: vmiop_env_log: Successfully updated env symbols!
3月 06 20:14:15 kuangke nvidia-vgpu-mgr[2583]: op_type: 0x2080014b failed.
3月 06 20:14:15 kuangke nvidia-vgpu-mgr[2583]: error: vmiop_log: Assertion Failed at 0x35657507:11534
3月 06 20:14:15 kuangke nvidia-vgpu-mgr[2583]: error: vmiop_log: 11 frames returned by backtrace
3月 06 20:14:15 kuangke nvidia-vgpu-mgr[2583]: error: vmiop_log: /lib/x86_64-linux-gnu/libnvidia-vgpu.so(_nv005372vgpu+0x35) [0x7f1a35681535]
3月 06 20:14:15 kuangke nvidia-vgpu-mgr[2583]: error: vmiop_log: /lib/x86_64-linux-gnu/libnvidia-vgpu.so(+0x7dcbe) [0x7f1a35639cbe]
3月 06 20:14:15 kuangke nvidia-vgpu-mgr[2583]: error: vmiop_log: /lib/x86_64-linux-gnu/libnvidia-vgpu.so(+0x9b507) [0x7f1a35657507]
3月 06 20:14:15 kuangke nvidia-vgpu-mgr[2583]: error: vmiop_log: /lib/x86_64-linux-gnu/libnvidia-vgpu.so(+0x9e3a3) [0x7f1a3565a3a3]
3月 06 20:14:15 kuangke nvidia-vgpu-mgr[2583]: error: vmiop_log: vgpu(+0x139e9) [0x555c420139e9]
3月 06 20:14:15 kuangke nvidia-vgpu-mgr[2583]: error: vmiop_log: vgpu(+0x14ae9) [0x555c42014ae9]
3月 06 20:14:15 kuangke nvidia-vgpu-mgr[2583]: error: vmiop_log: vgpu(+0xe990) [0x555c4200e990]
3月 06 20:14:15 kuangke nvidia-vgpu-mgr[2583]: error: vmiop_log: vgpu(+0xc146) [0x555c4200c146]
3月 06 20:14:15 kuangke nvidia-vgpu-mgr[2583]: error: vmiop_log: vgpu(+0x3c1a) [0x555c42003c1a]
3月 06 20:14:15 kuangke nvidia-vgpu-mgr[2583]: error: vmiop_log: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7f1a35b590b3]
3月 06 20:14:15 kuangke nvidia-vgpu-mgr[2583]: error: vmiop_log: vgpu(+0x3c5d) [0x555c42003c5d]
3月 06 20:14:15 kuangke nvidia-vgpu-mgr[2583]: error: vmiop_log: Assertion Failed at 0x35657507:11534
3月 06 20:14:15 kuangke nvidia-vgpu-mgr[2583]: error: vmiop_log: 11 frames returned by backtrace
3月 06 20:14:15 kuangke nvidia-vgpu-mgr[2583]: error: vmiop_log: /lib/x86_64-linux-gnu/libnvidia-vgpu.so(_nv005372vgpu+0x35) [0x7f1a35681535]
3月 06 20:14:15 kuangke nvidia-vgpu-mgr[2583]: error: vmiop_log: /lib/x86_64-linux-gnu/libnvidia-vgpu.so(+0x7dcbe) [0x7f1a35639cbe]
3月 06 20:14:15 kuangke nvidia-vgpu-mgr[2583]: error: vmiop_log: /lib/x86_64-linux-gnu/libnvidia-vgpu.so(+0x9b507) [0x7f1a35657507]
3月 06 20:14:15 kuangke nvidia-vgpu-mgr[2583]: error: vmiop_log: /lib/x86_64-linux-gnu/libnvidia-vgpu.so(+0x9e3a3) [0x7f1a3565a3a3]
3月 06 20:14:15 kuangke nvidia-vgpu-mgr[2583]: error: vmiop_log: vgpu(+0x139e9) [0x555c420139e9]
3月 06 20:14:15 kuangke nvidia-vgpu-mgr[2583]: error: vmiop_log: vgpu(+0x14ae9) [0x555c42014ae9]
3月 06 20:14:15 kuangke nvidia-vgpu-mgr[2583]: error: vmiop_log: vgpu(+0xe990) [0x555c4200e990]
3月 06 20:14:15 kuangke nvidia-vgpu-mgr[2583]: error: vmiop_log: vgpu(+0xc146) [0x555c4200c146]
3月 06 20:14:15 kuangke nvidia-vgpu-mgr[2583]: error: vmiop_log: vgpu(+0x3c1a) [0x555c42003c1a]
3月 06 20:14:15 kuangke nvidia-vgpu-mgr[2583]: error: vmiop_log: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7f1a35b590b3]
3月 06 20:14:15 kuangke nvidia-vgpu-mgr[2583]: error: vmiop_log: vgpu(+0x3c5d) [0x555c42003c5d]
3月 06 20:14:15 kuangke nvidia-vgpu-mgr[2583]: notice: vmiop_log: (0x0): gpu-pci-id : 0x100
3月 06 20:14:15 kuangke nvidia-vgpu-mgr[2583]: notice: vmiop_log: (0x0): vgpu_type : Quadro
3月 06 20:14:15 kuangke nvidia-vgpu-mgr[2583]: notice: vmiop_log: (0x0): Framebuffer: 0xec000000
3月 06 20:14:15 kuangke nvidia-vgpu-mgr[2583]: notice: vmiop_log: (0x0): Virtual Device Id: 0x2230:0x14ff
3月 06 20:14:15 kuangke nvidia-vgpu-mgr[2583]: notice: vmiop_log: (0x0): FRL Value: 60 FPS
3月 06 20:14:15 kuangke nvidia-vgpu-mgr[2583]: notice: vmiop_log: ######## vGPU Manager Information: ########
3月 06 20:14:15 kuangke nvidia-vgpu-mgr[2583]: notice: vmiop_log: Driver Version: 470.82
3月 06 20:14:15 kuangke nvidia-vgpu-mgr[2583]: op_type: 0x2080012f failed.
3月 06 20:14:15 kuangke nvidia-vgpu-mgr[2583]: notice: vmiop_log: (0x0): Cannot query ECC status. vGPU ECC support will be disabled.
3月 06 20:14:15 kuangke nvidia-vgpu-mgr[2583]: op_type: 0x20801322 failed.
3月 06 20:14:15 kuangke nvidia-vgpu-mgr[2583]: error: vmiop_log: (0x0): Failed to get blacklisted pages:0x56
3月 06 20:14:15 kuangke nvidia-vgpu-mgr[2583]: notice: vmiop_log: (0x0): Init frame copy engine: syncing...
3月 06 20:14:15 kuangke nvidia-vgpu-mgr[2583]: notice: vmiop_log: (0x0): vGPU migration disabled
3月 06 20:14:15 kuangke nvidia-vgpu-mgr[2583]: notice: vmiop_log: display_init inst: 0 successful
3月 06 20:14:25 kuangke nvidia-vgpu-mgr[2583]: op_type: 0x20801322 failed.
3月 06 20:14:25 kuangke nvidia-vgpu-mgr[2583]: error: vmiop_log: (0x0): Failed to get blacklisted pages:0x56
3月 06 20:14:25 kuangke nvidia-vgpu-mgr[2583]: notice: vmiop_log: ######## Guest NVIDIA Driver Information: ########
3月 06 20:14:25 kuangke nvidia-vgpu-mgr[2583]: notice: vmiop_log: Driver Version: 472.39
3月 06 20:14:25 kuangke nvidia-vgpu-mgr[2583]: notice: vmiop_log: vGPU version: 0xb0001
3月 06 20:14:25 kuangke nvidia-vgpu-mgr[2583]: notice: vmiop_log: (0x0): vGPU license state: Unlicensed (Unrestricted)
3月 06 20:14:25 kuangke nvidia-vgpu-mgr[2583]: notice: vmiop_log: (0x0): Guest driver unloaded!
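
For anyone comparing logs like this one across GPUs, here is a minimal Python sketch that pulls out the `op_type: 0x... failed.` lines and counts each code. The function name `failed_op_types` is hypothetical, and the regular expression assumes the exact line format shown in the log above. It only tallies the codes and does not interpret what they mean.

```python
import re
from collections import Counter

# Matches journal lines such as
#   "... nvidia-vgpu-mgr[2583]: op_type: 0x2080014b failed."
# Assumption: the line format is exactly as it appears in the log above.
OP_FAILED = re.compile(r"op_type: (0x[0-9a-fA-F]+) failed\.")


def failed_op_types(log_text):
    """Count how often each op_type code is reported as failed."""
    counts = Counter()
    for line in log_text.splitlines():
        match = OP_FAILED.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Three lines copied from the log above, used as sample input.
sample = """\
3月 06 20:14:15 kuangke nvidia-vgpu-mgr[2583]: op_type: 0x2080014b failed.
3月 06 20:14:15 kuangke nvidia-vgpu-mgr[2583]: op_type: 0x20801322 failed.
3月 06 20:14:25 kuangke nvidia-vgpu-mgr[2583]: op_type: 0x20801322 failed.
"""

for code, count in failed_op_types(sample).most_common():
    print(code, count)
```

Run against the full log above, this would report three distinct codes: 0x20801322 twice, and 0x2080014b and 0x2080012f once each.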

My libvirt XML is below:

<domain type="kvm">
  <name>win10</name>
  <uuid>56c22340-b93a-45df-a5bf-6791fb8cd3dd</uuid>
  <metadata>
    <libosinfo:libosinfo xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0">
      <libosinfo:os id="http://microsoft.com/win/10"/>
    </libosinfo:libosinfo>
  </metadata>
  <memory unit="KiB">4194304</memory>
  <currentMemory unit="KiB">4194304</currentMemory>
  <vcpu placement="static">4</vcpu>
  <os>
    <type arch="x86_64" machine="pc-q35-4.2">hvm</type>
    <loader readonly="yes" type="rom">/usr/share/OVMF/OVMF_CODE.fd</loader>
    <nvram>/var/lib/libvirt/qemu/nvram/win10_VARS.fd</nvram>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state="on"/>
      <vapic state="on"/>
      <spinlocks state="on" retries="8191"/>
    </hyperv>
    <vmport state="off"/>
  </features>
  <cpu mode="host-model" check="none">
    <topology sockets="1" cores="2" threads="2"/>
  </cpu>
  <clock offset="localtime">
    <timer name="rtc" tickpolicy="catchup"/>
    <timer name="pit" tickpolicy="delay"/>
    <timer name="hpet" present="no"/>
    <timer name="hypervclock" present="yes"/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <pm>
    <suspend-to-mem enabled="no"/>
    <suspend-to-disk enabled="no"/>
  </pm>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type="file" device="disk">
      <driver name="qemu" type="qcow2"/>
      <source file="/home/kuangke/kvm/disk/win10.qcow2"/>
      <target dev="sda" bus="sata"/>
      <boot order="1"/>
      <address type="drive" controller="0" bus="0" target="0" unit="0"/>
    </disk>
    <disk type="file" device="cdrom">
      <driver name="qemu" type="raw"/>
      <source file="/home/kuangke/kvm/iso/win10_develop.iso"/>
      <target dev="sdb" bus="sata"/>
      <readonly/>
      <boot order="2"/>
      <address type="drive" controller="0" bus="0" target="0" unit="1"/>
    </disk>
    <controller type="usb" index="0" model="qemu-xhci" ports="15">
      <address type="pci" domain="0x0000" bus="0x02" slot="0x00" function="0x0"/>
    </controller>
    <controller type="sata" index="0">
      <address type="pci" domain="0x0000" bus="0x00" slot="0x1f" function="0x2"/>
    </controller>
    <controller type="pci" index="0" model="pcie-root"/>
    <controller type="pci" index="1" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="1" port="0x10"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x0" multifunction="on"/>
    </controller>
    <controller type="pci" index="2" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="2" port="0x11"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x1"/>
    </controller>
    <controller type="pci" index="3" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="3" port="0x12"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x2"/>
    </controller>
    <controller type="pci" index="4" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="4" port="0x13"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x3"/>
    </controller>
    <interface type="network">
      <mac address="52:54:00:a6:3a:04"/>
      <source network="default"/>
      <model type="e1000e"/>
      <address type="pci" domain="0x0000" bus="0x01" slot="0x00" function="0x0"/>
    </interface>
    <input type="mouse" bus="ps2"/>
    <input type="keyboard" bus="ps2"/>
    <graphics type="spice" autoport="yes">
      <listen type="address"/>
      <image compression="off"/>
    </graphics>
    <sound model="ich9">
      <address type="pci" domain="0x0000" bus="0x00" slot="0x1b" function="0x0"/>
    </sound>
    <video>
      <model type="qxl" ram="65536" vram="65536" vgamem="16384" heads="1" primary="yes"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x0"/>
    </video>
    <hostdev mode="subsystem" type="mdev" managed="no" model="vfio-pci" display="off">
      <source>
        <address uuid="bf1a9105-3817-48e4-a02a-0dffbbf0dfe2"/>
      </source>
      <address type="pci" domain="0x0000" bus="0x04" slot="0x00" function="0x0"/>
    </hostdev>
    <memballoon model="virtio">
      <address type="pci" domain="0x0000" bus="0x03" slot="0x00" function="0x0"/>
    </memballoon>
  </devices>
</domain>
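
One thing that can be ruled out mechanically is a mismatch between the mdev UUID in the domain XML and the devices reported by `mdevctl list`. The following is a minimal Python sketch of that consistency check, assuming the XML and the `mdevctl list` output have the shapes shown above. The function names are hypothetical. It only checks that the UUIDs line up and says nothing about whether the GPU architecture is supported.

```python
import xml.etree.ElementTree as ET


def mdev_uuids_from_domain(xml_text):
    """Return the UUIDs of all <hostdev type="mdev"> devices in a libvirt domain XML."""
    root = ET.fromstring(xml_text)
    uuids = []
    for hostdev in root.findall("./devices/hostdev[@type='mdev']"):
        address = hostdev.find("./source/address")
        if address is not None and address.get("uuid"):
            uuids.append(address.get("uuid"))
    return uuids


def uuids_from_mdevctl(list_output):
    """Take the first whitespace-separated field of each non-empty line.

    Assumption: every line looks like
    "<uuid> <parent PCI address> <type> (defined)", as in the output above.
    """
    return {line.split()[0] for line in list_output.splitlines() if line.strip()}


def undefined_mdevs(xml_text, list_output):
    """UUIDs referenced by the domain XML that mdevctl does not list."""
    known = uuids_from_mdevctl(list_output)
    return [u for u in mdev_uuids_from_domain(xml_text) if u not in known]
```

For the configuration in this post the result would be an empty list, since the `<hostdev>` UUID bf1a9105-3817-48e4-a02a-0dffbbf0dfe2 also appears in the `mdevctl list` output.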
kuangke9527 commented 2 years ago

I think this has something to do with Ampere not being supported. Is Ampere not supported yet?

Robbot-Zhao commented 2 years ago

I am facing the same problem with an RTX 3090 and am waiting for progress on Ampere support.

ssechao commented 2 years ago

I own an RTX 3090 and I am facing the same issue. If someone could explain the background of this issue, I could try to fix it. I just need to know where to look.