cyberus-technology / virtualbox-kvm

KVM Backend for VirtualBox. With our current development model, we cannot easily accept pull requests here. If you'd like to contribute, feel free to reach out to us, we are happy to find a solution.
GNU General Public License v3.0

SR-IOV graphics on Intel N100 does not work #21

Closed pengu1981 closed 7 months ago

pengu1981 commented 8 months ago

I followed the instructions to set up SR-IOV graphics virtualization, but now I can't start an up-to-date Windows 11 23H2 machine.

ls -fl /dev/vfio/
total 0
drwxr-xr-x  3 root root      100 Mar 20 14:36 .
drwxr-xr-x 20 root root    14760 Mar 20 14:34 ..
crw-rw-rw-  1 root root  10, 196 Mar 20 14:34 vfio
crw-------  1 root root 241,   0 Mar 20 14:36 15
drwxr-xr-x  2 root root       60 Mar 20 14:36 devices
00:00:01.594691 AssertLogRel /var/tmp/portage/portage/app-emulation/virtualbox-kvm-9999/work/VirtualBox-7.0.14/src/VBox/VMM/VMMR3/PDMDevice.cpp(234) int pdmR3DevInit(PVM): <NULL>
00:00:01.594751 Configuration error: Too many instances of VfioDev was configured: 3, max 1
00:00:01.763403 VMSetError: /var/tmp/portage/portage/app-emulation/virtualbox-kvm-9999/work/VirtualBox-7.0.14/src/VBox/VMM/VMMR3/VM.cpp(341) int VMR3Create(uint32_t, PCVMM2USERMETHODS, uint64_t, PFNVMATERROR, void*, PFNCFGMCONSTRUCTOR, void*, VM**, UVM**); rc=VERR_PDM_TOO_MANY_DEVICE_INSTANCES
00:00:01.763411 VMSetError: Too many instances of a device.
00:00:01.763945 ERROR [COM]: aRC=NS_ERROR_FAILURE (0x80004005) aIID={6ac83d89-6ee7-4e33-8ae6-b257b2e81be8} aComponent={ConsoleWrap} aText={Too many instances of a device. (VERR_PDM_TOO_MANY_DEVICE_INSTANCES)}, preserve=false aResultDetail=-2867
00:00:01.764016 Console: Machine state changed to 'PoweredOff'
00:00:01.768624 Power up failed (vrc=VERR_PDM_TOO_MANY_DEVICE_INSTANCES, hrc=NS_ERROR_FAILURE (0X80004005))
00:00:01.769727 GUI: UIMachineViewNormal::resendSizeHint: Restoring guest size-hint for screen 0 to 1366x768
00:00:01.769769 ERROR [COM]: aRC=E_ACCESSDENIED (0x80070005) aIID={4680b2de-8690-11e9-b83d-5719e53cf1de} aComponent={DisplayWrap} aText={The console is not powered up (setVideoModeHint)}, preserve=false aResultDetail=0
00:00:01.769811 GUI: Aborting startup due to power up progress issue detected...
00:00:01.774136 GUI: UICommon: Handling aboutToQuit request..
00:00:02.327953 GUI: UICommon: aboutToQuit request handled!

Distribution: Gentoo Linux with the official package; it seems to make no difference which one is used.

lsb-release -a
LSB Version:    n/a
Distributor ID: Gentoo
Description:    Gentoo Linux
Release:        2.15
Codename:       n/a

Kernel: the intel-lts one mentioned in the instructions

uname -r
6.6.15-x86_64-gecd5c86d2570

Modules


lsmod | grep vfio
vfio_pci               12288  0
vfio_pci_core          65536  1 vfio_pci
irqbypass              12288  2 vfio_pci_core,kvm
vfio_iommu_type1       36864  0
vfio                   45056  3 vfio_pci_core,vfio_iommu_type1,vfio_pci
iommufd                65536  1 vfio

lspci output

00:02.0 VGA compatible controller: Intel Corporation Alder Lake-N [UHD Graphics] (prog-if 00 [VGA controller])
        Subsystem: ASRock Incorporation Alder Lake-N [UHD Graphics]
        Flags: bus master, fast devsel, latency 0, IRQ 146, IOMMU group 0
        Memory at 6000000000 (64-bit, non-prefetchable) [size=16M]
        Memory at 4000000000 (64-bit, prefetchable) [size=256M]
        I/O ports at 4000 [size=64]
        Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
        Capabilities: [40] Vendor Specific Information: Len=0c <?>
        Capabilities: [70] Express Root Complex Integrated Endpoint, IntMsgNum 0
        Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable+ 64bit-
        Capabilities: [d0] Power Management version 2
        Capabilities: [100] Process Address Space ID (PASID)
        Capabilities: [200] Address Translation Service (ATS)
        Capabilities: [300] Page Request Interface (PRI)
        Capabilities: [320] Single Root I/O Virtualization (SR-IOV)
        Kernel driver in use: i915
        Kernel modules: i915

I tried the hints from this issue, but nothing changed.

Logs: vbox.log, kernel_partial.log

parthy commented 8 months ago

Hi!

It looks like you may have added the VF multiple times:

00:00:01.317814 [/Devices/VfioDev/0/Config/] (level 4)
00:00:01.317815   GuestPCIBusNo      <integer> = 0x0000000000000000 (0)
00:00:01.317816   GuestPCIDeviceNo   <integer> = 0x0000000000000010 (16)
00:00:01.317818   GuestPCIFunctionNo <integer> = 0x0000000000000000 (0)
00:00:01.317819   sysfsPath          <string>  = "/sys/bus/pci/devices/0000:00:02.1/" (cb=35)
00:00:01.317820 
00:00:01.317820 [/Devices/VfioDev/1/] (level 3)
00:00:01.317822   PCIBusNo      <integer> = 0x0000000000000000 (0)
00:00:01.317824   PCIDeviceNo   <integer> = 0x0000000000000011 (17)
00:00:01.317825   PCIFunctionNo <integer> = 0x0000000000000000 (0)
00:00:01.317826   Trusted       <integer> = 0x0000000000000001 (1)
00:00:01.317827 
00:00:01.317827 [/Devices/VfioDev/1/Config/] (level 4)
00:00:01.317829   GuestPCIBusNo      <integer> = 0x0000000000000000 (0)
00:00:01.317830   GuestPCIDeviceNo   <integer> = 0x0000000000000011 (17)
00:00:01.317831   GuestPCIFunctionNo <integer> = 0x0000000000000000 (0)
00:00:01.317832   sysfsPath          <string>  = "/sys/bus/pci/devices/0000:00:02.2/" (cb=35)
00:00:01.317833 
00:00:01.317833 [/Devices/VfioDev/2/] (level 3)
00:00:01.317835   PCIBusNo      <integer> = 0x0000000000000000 (0)
00:00:01.317836   PCIDeviceNo   <integer> = 0x0000000000000012 (18)
00:00:01.317837   PCIFunctionNo <integer> = 0x0000000000000000 (0)
00:00:01.317838   Trusted       <integer> = 0x0000000000000001 (1)
00:00:01.317839 
00:00:01.317839 [/Devices/VfioDev/2/Config/] (level 4)
00:00:01.317841   GuestPCIBusNo      <integer> = 0x0000000000000000 (0)
00:00:01.317842   GuestPCIDeviceNo   <integer> = 0x0000000000000012 (18)
00:00:01.317843   GuestPCIFunctionNo <integer> = 0x0000000000000000 (0)
00:00:01.317844   sysfsPath          <string>  = "/sys/bus/pci/devices/0000:00:02.1" (cb=34)

We currently only support a single VFIO device and that's also all that's needed for the SR-IOV graphics functionality. Can you try removing the superfluous entries (via --detachvfio or by manually editing the VirtualBox VM configuration file)?
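
A minimal sketch of how the duplicate entries could be spotted in a VBox.log, assuming the configuration dump always prints `sysfsPath` lines in the format shown above. The function names are hypothetical and not part of virtualbox-kvm; paths are compared with trailing slashes ignored.

```python
import re

# Matches configuration dump lines such as:
#   00:00:01.317819   sysfsPath   <string>  = "/sys/bus/pci/devices/0000:00:02.1/" (cb=35)
SYSFS_PATH_RE = re.compile(r'sysfsPath\s+<string>\s+=\s+"([^"]*)"')


def vfio_paths(log_text: str) -> list[str]:
    """Return every VfioDev sysfsPath found in the log, in order of appearance."""
    return SYSFS_PATH_RE.findall(log_text)


def duplicate_devices(paths: list[str]) -> set[str]:
    """Return the paths that occur more than once when trailing slashes are ignored."""
    seen, dupes = set(), set()
    for path in paths:
        key = path.rstrip("/")
        if key in seen:
            dupes.add(key)
        seen.add(key)
    return dupes


# Three lines copied from the log excerpt above.
sample = '''
00:00:01.317819   sysfsPath          <string>  = "/sys/bus/pci/devices/0000:00:02.1/" (cb=35)
00:00:01.317832   sysfsPath          <string>  = "/sys/bus/pci/devices/0000:00:02.2/" (cb=35)
00:00:01.317844   sysfsPath          <string>  = "/sys/bus/pci/devices/0000:00:02.1" (cb=34)
'''
paths = vfio_paths(sample)
print(len(paths), sorted(duplicate_devices(paths)))
```

With more than one path reported, the VM would hit the "max 1" instance limit from the error message, whether or not the paths are duplicates.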

pengu1981 commented 8 months ago

Thanks a lot, that's what I tried earlier, but I could not detach one of the devices:

VBoxManage: error: Unexpected exception: std::bad_alloc [St9bad_alloc]
VBoxManage: error: /var/tmp/portage/portage/app-emulation/virtualbox-kvm-9999/work/VirtualBox-7.0.14/out/linux.amd64/release/obj/VBoxAPIWrap/MachineWrap.cpp[7106] (virtual nsresult MachineWrap::DetachVFIODevice(CBSTR))
VBoxManage: error: Details: code NS_ERROR_FAILURE (0x80004005), component SessionMachine, interface IMachine, callee nsISupports
VBoxManage: error: Context: "DetachVFIODevice(Bstr(ValueUnion.psz).raw())" at line 3521 of file VBoxManageModifyVM.cpp

Is there a way to prevent this?

The vfio_pci module is loaded via /etc/modules-load.d and according to modinfo I didn't see any way to prevent this.

Using a smaller value of sriov_numvfs didn't solve it.

tpressure commented 8 months ago

@pengu1981 can you paste the exact command that you're using to detach the device? My guess would be that you have a trailing slash (/), which currently breaks attach/detach (see README.md).

tpressure commented 8 months ago

As a workaround, you can also remove all Vfio sections from /home/<user>/VirtualBox\ VMs/<vm_name>/<vm_name>.vbox with an editor of your choice.

pengu1981 commented 8 months ago

New Situation:

Kernel

uname -r
6.6.20-x86_64-g8ae68172331b

I loaded the vfio_pci module manually, not via /etc/modules-load.d/, and followed the hints in the other ticket as mentioned above.

Now there are only two attached devices:

00:00:01.223633 [/Devices/VfioDev/0/Config/] (level 4)
00:00:01.223633   GuestPCIBusNo      <integer> = 0x0000000000000000 (0)
00:00:01.223634   GuestPCIDeviceNo   <integer> = 0x0000000000000010 (16)
00:00:01.223635   GuestPCIFunctionNo <integer> = 0x0000000000000000 (0)
00:00:01.223635   sysfsPath          <string>  = "/sys/bus/pci/devices/0000:00:02.1/" (cb=35)
00:00:01.223636 
00:00:01.223636 [/Devices/VfioDev/1/] (level 3)
00:00:01.223637   PCIBusNo      <integer> = 0x0000000000000000 (0)
00:00:01.223641   PCIDeviceNo   <integer> = 0x0000000000000011 (17)
00:00:01.223641   PCIFunctionNo <integer> = 0x0000000000000000 (0)
00:00:01.223642   Trusted       <integer> = 0x0000000000000001 (1)
00:00:01.223642 
00:00:01.223642 [/Devices/VfioDev/1/Config/] (level 4)
00:00:01.223643   GuestPCIBusNo      <integer> = 0x0000000000000000 (0)
00:00:01.223644   GuestPCIDeviceNo   <integer> = 0x0000000000000011 (17)
00:00:01.223644   GuestPCIFunctionNo <integer> = 0x0000000000000000 (0)
00:00:01.223645   sysfsPath          <string>  = "/sys/bus/pci/devices/0000:00:02.2/" (cb=35)

When I try to remove the second one, I get the error mentioned earlier:

 VBoxManage modifyvm win11ntl --detachvfio /sys/bus/pci/devices/0000\:00\:02.2

VBoxManage: error: Unexpected exception: std::bad_alloc [St9bad_alloc]
VBoxManage: error: /var/tmp/portage/portage/app-emulation/virtualbox-kvm-9999/work/VirtualBox-7.0.14/out/linux.amd64/release/obj/VBoxAPIWrap/MachineWrap.cpp[7106] (virtual nsresult MachineWrap::DetachVFIODevice(CBSTR))
VBoxManage: error: Details: code NS_ERROR_FAILURE (0x80004005), component SessionMachine, interface IMachine, callee nsISupports
VBoxManage: error: Context: "DetachVFIODevice(Bstr(ValueUnion.psz).raw())" at line 3521 of file VBoxManageModifyVM.cpp

So I tried to remove it manually:

<Vfio>
        <Devices>
          <Device devicePath="/sys/bus/pci/devices/0000:00:02.1/"/>
          <Device devicePath="/sys/bus/pci/devices/0000:00:02.2/"/>
        </Devices>
      </Vfio>

VirtualBox doesn't overwrite the file, so there is still one device present. Now I get the same error as shown in the mentioned ticket:

00:00:01.788310 VFIO: Constructing VFIO PCI device with path /sys/bus/pci/devices/0000:00:02.1/ Guest BDF: 00:10.0
00:00:01.788362 VFIO: Detected VFIO Api Version 0
00:00:01.788398 AssertLogRel /var/tmp/portage/portage/app-emulation/virtualbox-kvm-9999/work/VirtualBox-7.0.14/src/VBox/Devices/Bus/VfioDevice.cpp(78) int VfioDevice::initializeVfio(PPDMDEVINS, std::filesystem::__cxx11::path): vfioGroupFd > 0
00:00:01.788401 VFIO: Could not open VFIO Container
00:00:01.788402 AssertLogRel /var/tmp/portage/portage/app-emulation/virtualbox-kvm-9999/work/VirtualBox-7.0.14/src/VBox/Devices/Bus/VfioDevice.cpp(701) int VfioDevice::init(PPDMDEVINS, std::filesystem::__cxx11::path): RT_SUCCESS(rc)
00:00:01.788404 AssertLogRel /var/tmp/portage/portage/app-emulation/virtualbox-kvm-9999/work/VirtualBox-7.0.14/src/VBox/Devices/Bus/DevVfio.cpp(79) int devVfioConstruct(PPDMDEVINS, int, PCFGMNODE): RT_SUCCESS(rc)
00:00:01.788406 PDM: Failed to construct 'VfioDev'/0! VERR_INVALID_PARAMETER (-2) - Invalid parameter.
00:00:01.788414 VMSetError: /var/tmp/portage/portage/app-emulation/virtualbox-kvm-9999/work/VirtualBox-7.0.14/src/VBox/VMM/VMMR3/PDMDevice.cpp(554) int pdmR3DevInit(PVM); rc=VERR_INVALID_PARAMETER
00:00:01.788415 VMSetError: Failed to construct device 'VfioDev' instance #0
00:00:10.433076 NAT: Zone(nm:mbuf_cluster, used:0)
00:00:10.433698 NAT: Zone(nm:mbuf_packet, used:0)
00:00:10.433716 NAT: Zone(nm:mbuf, used:0)
00:00:10.433810 NAT: Zone(nm:mbuf_jumbo_pagesize, used:0)
00:00:10.434365 NAT: Zone(nm:mbuf_jumbo_9k, used:0)
00:00:10.434786 NAT: Zone(nm:mbuf_jumbo_16k, used:0)
00:00:10.435009 NAT: Zone(nm:mbuf_ext_refcnt, used:0)
00:00:10.437838 AssertLogRel /var/tmp/portage/portage/app-emulation/virtualbox-kvm-9999/work/VirtualBox-7.0.14/src/VBox/Devices/Bus/VfioDevice.cpp(834) int VfioDevice::terminate(PPDMDEVINS): RT_SUCCESS(rc)
00:00:10.609416 ERROR [COM]: aRC=NS_ERROR_FAILURE (0x80004005) aIID={6ac83d89-6ee7-4e33-8ae6-b257b2e81be8} aComponent={ConsoleWrap} aText={Failed to construct device 'VfioDev' instance #0 (VERR_INVALID_PARAMETER)}, preserve=false aResultDetail=-2
00:00:10.609698 Console: Machine state changed to 'PoweredOff'
00:00:10.628975 Power up failed (vrc=VERR_INVALID_PARAMETER, hrc=NS_ERROR_FAILURE (0X80004005))
00:00:10.651437 GUI: UIMachineViewNormal::resendSizeHint: Restoring guest size-hint for screen 0 to 1366x768
00:00:10.651612 ERROR [COM]: aRC=E_ACCESSDENIED (0x80070005) aIID={4680b2de-8690-11e9-b83d-5719e53cf1de} aComponent={DisplayWrap} aText={The console is not powered up (setVideoModeHint)}, preserve=false aResultDetail=0
00:00:10.651757 GUI: Aborting startup due to power up progress issue detected...
00:00:10.671992 GUI: UICommon: Handling aboutToQuit request..
00:00:11.340409 GUI: UICommon: aboutToQuit request handled!

vbox_n.log

tpressure commented 8 months ago

00:00:01.788401 VFIO: Could not open VFIO Container

Seems like you didn't set the permissions via chmod 0666 /dev/vfio/*.
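
A small sketch for checking whether the `chmod 0666` actually took effect, assuming the device nodes live in `/dev/vfio` as in the listing at the top of this issue. The helper names are made up for illustration.

```python
import os
import stat


def world_read_write(mode: int) -> bool:
    """True if 'other' has both read and write permission (as with mode 0666)."""
    wanted = stat.S_IROTH | stat.S_IWOTH
    return (mode & wanted) == wanted


def report(directory: str = "/dev/vfio") -> list[tuple[str, bool]]:
    """List each character device in the directory with its world read/write status."""
    results = []
    for name in sorted(os.listdir(directory)):
        st = os.stat(os.path.join(directory, name))
        if stat.S_ISCHR(st.st_mode):  # skips sub-directories such as 'devices'
            results.append((name, world_read_write(st.st_mode)))
    return results


print(world_read_write(0o666), world_read_write(0o600))
```

Calling `report()` on the host lists the nodes; in the first listing of this issue, the group node `15` had mode `crw-------`, so it would be reported as `False`.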

pengu1981 commented 8 months ago

I did but it stays the same.

tpressure commented 8 months ago

Please upload the output from:

pengu1981 commented 8 months ago

parthy commented 8 months ago

Isn't there still a trailing slash in the path?

Constructing VFIO PCI device with path /sys/bus/pci/devices/0000:00:02.1/

pengu1981 commented 8 months ago

I don't think so?

      <Vfio>
        <Devices>
          <Device devicePath="/sys/bus/pci/devices/0000:00:02.1/"/>
        </Devices>
      </Vfio>
ls -lah  /sys/bus/pci/devices/0000:00:02.1/
total 0
drwxr-xr-x  5 root root    0 Mar 21 13:17 .
drwxr-xr-x 26 root root    0 Mar 21 13:17 ..
-r--r--r--  1 root root 4.0K Mar 21 13:17 ari_enabled
-r--r--r--  1 root root 4.0K Mar 21 13:17 boot_vga
-rw-r--r--  1 root root 4.0K Mar 21 13:17 broken_parity_status
-r--r--r--  1 root root 4.0K Mar 21 13:17 class
-rw-r--r--  1 root root 4.0K Mar 21 13:17 config
-r--r--r--  1 root root 4.0K Mar 21 13:17 consistent_dma_mask_bits
-r--r--r--  1 root root 4.0K Mar 21 13:17 current_link_speed
-r--r--r--  1 root root 4.0K Mar 21 13:17 current_link_width
-rw-r--r--  1 root root 4.0K Mar 21 13:17 d3cold_allowed
-r--r--r--  1 root root 4.0K Mar 21 13:17 device
-r--r--r--  1 root root 4.0K Mar 21 13:17 dma_mask_bits
lrwxrwxrwx  1 root root    0 Mar 21 11:17 driver -> ../../../bus/pci/drivers/vfio-pci
-rw-r--r--  1 root root 4.0K Mar 21 11:17 driver_override
-rw-r--r--  1 root root 4.0K Mar 21 13:17 enable
lrwxrwxrwx  1 root root    0 Mar 21 13:17 iommu -> ../../virtual/iommu/dmar0
lrwxrwxrwx  1 root root    0 Mar 21 11:18 iommu_group -> ../../../kernel/iommu_groups/15
-r--r--r--  1 root root 4.0K Mar 21 13:17 irq
drwxr-xr-x  2 root root    0 Mar 21 13:17 link
-r--r--r--  1 root root 4.0K Mar 21 13:17 local_cpulist
-r--r--r--  1 root root 4.0K Mar 21 13:17 local_cpus
-r--r--r--  1 root root 4.0K Mar 21 13:17 max_link_speed
-r--r--r--  1 root root 4.0K Mar 21 13:17 max_link_width
-r--r--r--  1 root root 4.0K Mar 21 13:17 modalias
-rw-r--r--  1 root root 4.0K Mar 21 13:17 msi_bus
-rw-r--r--  1 root root 4.0K Mar 21 13:17 numa_node
lrwxrwxrwx  1 root root    0 Mar 21 13:17 physfn -> ../0000:00:02.0
drwxr-xr-x  2 root root    0 Mar 21 13:17 power
-r--r--r--  1 root root 4.0K Mar 21 13:17 power_state
--w-------  1 root root 4.0K Mar 21 13:17 reset
-rw-r--r--  1 root root 4.0K Mar 21 13:17 reset_method
-r--r--r--  1 root root 4.0K Mar 21 13:17 resource
-rw-------  1 root root  16M Mar 21 13:17 resource0
-rw-------  1 root root 512M Mar 21 13:17 resource2
-rw-------  1 root root 512M Mar 21 13:17 resource2_wc
-r--r--r--  1 root root 4.0K Mar 21 13:17 revision
--w-------  1 root root 4.0K Mar 21 13:17 sriov_vf_msix_count
lrwxrwxrwx  1 root root    0 Mar 21 11:17 subsystem -> ../../../bus/pci
-r--r--r--  1 root root 4.0K Mar 21 13:17 subsystem_device
-r--r--r--  1 root root 4.0K Mar 21 13:17 subsystem_vendor
-rw-r--r--  1 root root 4.0K Mar 21 13:17 uevent
-r--r--r--  1 root root 4.0K Mar 21 13:17 vendor
drwxr-xr-x  3 root root    0 Mar 21 13:17 vfio-dev
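
For reference, the `iommu_group` symlink in this listing is what ties the VF to its `/dev/vfio/<group>` node. A short sketch of that lookup, with hypothetical helper names:

```python
import os


def group_from_link(target: str) -> str:
    """Extract the IOMMU group number from an iommu_group symlink target."""
    return os.path.basename(target.rstrip("/"))


def vfio_group_node(device_path: str) -> str:
    """Return the /dev/vfio/<group> node for a PCI device's sysfs directory."""
    link = os.path.join(device_path.rstrip("/"), "iommu_group")
    return "/dev/vfio/" + group_from_link(os.readlink(link))


# Symlink target copied from the listing above.
print(group_from_link("../../../kernel/iommu_groups/15"))
```

On the host, `vfio_group_node("/sys/bus/pci/devices/0000:00:02.1")` would point at the node whose permissions matter for opening the VFIO container.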

tpressure commented 8 months ago

You need to change

<Device devicePath="/sys/bus/pci/devices/0000:00:02.1/"/>

into

<Device devicePath="/sys/bus/pci/devices/0000:00:02.1"/>

so there is no trailing slash in the device path.
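
The same change can be sketched as a small text transformation, assuming the `.vbox` file stores the path in a `devicePath="..."` attribute as in the snippets above. This is an illustration rather than a supported tool; it is probably safest to edit the file only while VirtualBox is not running.

```python
import re

DEVICE_PATH_RE = re.compile(r'(devicePath=")([^"]*)(")')


def strip_trailing_slashes(vbox_xml: str) -> str:
    """Remove trailing slashes from every devicePath attribute value."""
    def fix(match: re.Match) -> str:
        # Keep a bare "/" intact instead of turning it into an empty path.
        path = match.group(2).rstrip("/") or "/"
        return match.group(1) + path + match.group(3)

    return DEVICE_PATH_RE.sub(fix, vbox_xml)


before = '<Device devicePath="/sys/bus/pci/devices/0000:00:02.1/"/>'
after = strip_trailing_slashes(before)
print(after)
```

Working on the raw text instead of re-serializing the XML leaves the rest of the file byte-for-byte unchanged.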

pengu1981 commented 8 months ago

Changed, but this led me to another issue:

Yes, I've changed /etc/security/limits.conf, so this should not happen.

00:00:01.711343 VFIO: Constructing VFIO PCI device with path /sys/bus/pci/devices/0000:00:02.1 Guest BDF: 00:10.0
00:00:01.711376 VFIO: Detected VFIO Api Version 0
00:00:01.917504 VFIO: Successfully opened VFIO Device: Group Status Flags: 0x3 Device Flags: 0x3, Num BARs: 9, Num IRQ's 5 
00:00:01.917882 VUSB: Attached 'HidMouse' to port 1 on RootHub#1 (FullSpeed)
00:00:01.917894 PGM: The CPU physical address width is 39 bits
00:00:01.917896 PGM: PGMR3InitFinalize: 4 MB PSE mask 0000007fffffffff -> VINF_SUCCESS
00:00:01.917904 TM: TMR3InitFinalize: fTSCModeSwitchAllowed=false
00:00:01.919891 ACPI: Enabling 64-bit prefetch root bus resource 0x00000003a5e00000..0x0000000fffffffff
00:00:01.924357 EM: Exit history optimizations: enabled=true enabled-r0=true enabled-r0-no-preemption=false
00:00:01.924369 APIC: fPostedIntrsEnabled=false fVirtApicRegsEnabled=false fSupportsTscDeadline=false
00:00:01.924373 TMR3UtcNow: nsNow=1 711 024 108 031 486 000 nsPrev=0 -> cNsDelta=1 711 024 108 031 486 000 (offLag=0 offVirtualSync=0 offVirtualSyncGivenUp=0, NowAgain=1 711 024 108 031 486 000)
00:00:01.959604 AssertLogRel /var/tmp/portage/portage/app-emulation/virtualbox-kvm-9999/work/VirtualBox-7.0.14/src/VBox/Devices/Bus/VfioDevice.cpp(747) VfioDevice::registerDmaRange(PVM, RTGCPHYS, RTGCPHYS)::<lambda(uintptr_t, RTGCPHYS, uint64_t, int)>: rc == 0
00:00:01.959629 VFIO: Could not acquire enough memory to map the Guest Physical address space. Adapt your ulimit
00:00:01.959630 AssertLogRel /var/tmp/portage/portage/app-emulation/virtualbox-kvm-9999/work/VirtualBox-7.0.14/src/VBox/Devices/Bus/VfioDevice.cpp(798) int VfioDevice::registerDmaRange(PVM, RTGCPHYS, RTGCPHYS): RT_SUCCESS(rc)
00:00:01.959631 AssertLogRel /var/tmp/portage/portage/app-emulation/virtualbox-kvm-9999/work/VirtualBox-7.0.14/src/VBox/Devices/Bus/VfioDevice.cpp(817) int VfioDevice::initializeDma(PPDMDEVINS): RT_SUCCESS(rc)

tpressure commented 8 months ago

Could not acquire enough memory to map the Guest Physical address space. Adapt your ulimit

You didn't adapt your ulimit. Please stick to the steps described in the SR-IOV tutorial: https://github.com/cyberus-technology/virtualbox-kvm/blob/dev/README.intel-sriov-graphics.md#preparing-the-linux-kernel-for-sr-iov-graphics

Edit /etc/security/limits.conf (required because VFIO needs more locked memory than configured as default)

add:

  • soft memlock unlimited
  • hard memlock unlimited

reboot

tpressure commented 8 months ago

yes I've changed /etc/security/limits.conf so this should not happen

Didn't see the first time I looked. Did you reboot your system after the change?

What is the output of ulimit -a?
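
The same value can also be read programmatically. A minimal sketch using Python's standard `resource` module (Unix only); `describe` is a hypothetical helper that mimics the `ulimit -l` presentation:

```python
import resource


def describe(limit: int) -> str:
    """Render an rlimit value like `ulimit -l`: kbytes, or 'unlimited'."""
    if limit == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{limit // 1024} kbytes"  # RLIMIT_MEMLOCK is reported in bytes


soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
print(f"max locked memory: soft={describe(soft)} hard={describe(hard)}")
```

The limit is per process, so it should be checked from the same login session that starts the VM.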

pengu1981 commented 8 months ago

ok it is working now.

The values got overridden by a file in /etc/security/limits.d.
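
A sketch of how such an override could be found, assuming the usual `<domain> <type> <item> <value>` line format of limits files. The `*` domain and the drop-in file name and contents below are made-up examples, not taken from this system.

```python
def memlock_entries(text: str, source: str) -> list[tuple[str, str, str, str]]:
    """Return (source, domain, type, value) for each memlock line in a limits file."""
    entries = []
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(fields) == 4 and fields[2] == "memlock":
            domain, limit_type, _item, value = fields
            entries.append((source, domain, limit_type, value))
    return entries


# Hypothetical file contents for illustration.
limits_conf = """
*  soft  memlock  unlimited
*  hard  memlock  unlimited
"""
drop_in = """
*  -  memlock  65536   # hypothetical drop-in that overrides the lines above
"""
found = (memlock_entries(limits_conf, "limits.conf")
         + memlock_entries(drop_in, "limits.d/99-example.conf"))
for entry in found:
    print(entry)
```

As far as I know, pam_limits reads `limits.conf` first and then the `limits.d/*.conf` files in sorted order, so a later matching entry wins; listing all memlock entries in that order shows which one is effective.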

ulimit -a
real-time non-blocking time  (microseconds, -R) unlimited
core file size              (blocks, -c) 0
data seg size               (kbytes, -d) unlimited
scheduling priority                 (-e) 0
file size                   (blocks, -f) unlimited
pending signals                     (-i) 126656
max locked memory           (kbytes, -l) unlimited
max memory size             (kbytes, -m) unlimited
open files                          (-n) 1024
pipe size                (512 bytes, -p) 8
POSIX message queues         (bytes, -q) 819200
real-time priority                  (-r) 0
stack size                  (kbytes, -s) 8192
cpu time                   (seconds, -t) unlimited
max user processes                  (-u) 126656
virtual memory              (kbytes, -v) unlimited
file locks                          (-x) unlimited

Now the machine starts and I got the driver installed. GPU-Z shows something, but the GPU gets detected wrong later on, so Unigine Heaven fails.

MSI Kombustor only sees the VirtualBox Graphics Card

tpressure commented 8 months ago

Did you also install the Intel display virtualization driver?

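And have you executed these three commands?

  • Configure ICH9 Chipset: VBoxManage modifyvm --chipset=ICH9
  • Attach the vGPU: VBoxManage modifyvm --attachvfio /sys/bus/pci/devices/0000:00:02.1 (no trailing slash)
  • Change display adapter: VBoxManage modifyvm --graphicscontroller vga-virtiogpu

Note that these commands have to be executed before installing the display virtualization driver.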

pengu1981 commented 8 months ago

Did you also install the Intel display virtualization driver?

And have you executed these three commands?

  • Configure ICH9 Chipset: VBoxManage modifyvm --chipset=ICH9
  • Attach the vGPU: VBoxManage modifyvm --attachvfio /sys/bus/pci/devices/0000:00:02.1 (no trailing slash)
  • Change display adapter: VBoxManage modifyvm --graphicscontroller vga-virtiogpu

Note that these commands have to be executed before installing the display virtualization driver.
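
A sketch that assembles the three commands quoted above for a given VM. The flags are taken from the quoted steps and the VM name from the earlier `--detachvfio` call in this thread; `sriov_commands` itself is a hypothetical helper that only builds the argument lists and does not run anything.

```python
def sriov_commands(vm_name: str, vf_path: str) -> list[list[str]]:
    """Build the three modifyvm invocations from the quoted steps."""
    vf_path = vf_path.rstrip("/")  # a trailing slash breaks attach/detach
    base = ["VBoxManage", "modifyvm", vm_name]
    return [
        base + ["--chipset=ICH9"],
        base + ["--attachvfio", vf_path],
        base + ["--graphicscontroller", "vga-virtiogpu"],
    ]


commands = sriov_commands("win11ntl", "/sys/bus/pci/devices/0000:00:02.1/")
for argv in commands:
    print(" ".join(argv))
```

Each list could be passed to `subprocess.run(argv, check=True)` on a host where virtualbox-kvm is installed.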

That's what I've done, but I see three cards in the Device Manager.

tpressure commented 8 months ago

That's what I've done, but I see three cards in the Device Manager.

This is expected. Are all three graphics cards running, and do they have their respective drivers attached? Does GPU0 show up in your Task Manager (Performance tab)? What's the error you see when you run Unigine Heaven?

pengu1981 commented 8 months ago

No: two basic display adapters and one Intel UHD Graphics device.

Yesterday I made a fresh install, which I'm using now. Perhaps I misunderstand something. It's the same with the fresh install, so I tried the following:

I changed the graphics controller to VirtioGPU (from VGAwithVirtioGPU). In the past this resulted in a blank screen; now that has changed. The UMD device is properly loaded, the Intel device is not (code 43).

tpressure commented 8 months ago

If you see basic display adapters you're missing the correct drivers.

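You should install:

  • VirtualBox guest additions
  • Intel display virtualization driver (please look at the tutorial and follow the steps exactly)

Also, please verify that you are using ICH9 as host-bridge in your VirtualBox settings.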

VGAwithVirtioGPU is the correct model that you should use.

pengu1981 commented 8 months ago

You should install:

  • VirtualBox guest additions

already installed on both installations

  • Intel display virtualization driver (please look at the tutorial and follow the steps exactly)

also installed on both installations with reboot as described

Also, please verify that you are using ICH9 as host-bridge in your VirtualBox settings.

Yes, ICH9 is the current host bridge.

VGAwithVirtioGPU is the correct model that you should use.

That's the one I use, but after the (manual) installation of DVServerUMD was unsuccessful, I decided to try VirtioGPU. With this in use, DVServerUMD gets properly resolved and installed. Only the UHD driver can't load, in all situations (code 43).

tpressure commented 8 months ago

Can you please upload your /home/<user>/VirtualBox\ VMs/<vm name>/<vm name>.vbox file and a VM log from a boot/shutdown sequence?

pengu1981 commented 8 months ago

I've created a new machine with Windows 10, AHCI storage, an Intel NIC, and the VGAwithVirtIO graphics controller. I installed the Intel Display Virtualization drivers via an admin PowerShell console, and the system rebooted. As on the two Windows 11 machines, I see three devices. I installed the DVServerUMD driver manually; it gets activated instantly after a flicker, but now I have to change the graphics controller because I no longer see any open applications.

After the reboot, I see that the UMD device is correctly installed; only the UHD device still shows code 43. The log and config were captured in this state.

vm.log vm.vbox.txt


tpressure commented 8 months ago

because I no longer see any open applications.

Normally, this should resolve itself after a couple (~10) of seconds. This might also be happening because your Intel driver does not load successfully before you install DVServer.

Which version of the Intel GPU driver are you using? Can you try installing the latest version from: https://www.intel.com/content/www/us/en/download/785597/intel-arc-iris-xe-graphics-windows.html ?

tpressure commented 8 months ago

And could you also please upload the complete output of dmesg?

pengu1981 commented 7 months ago

Which version of the Intel GPU driver are you using? Can you try installing the latest version from: https://www.intel.com/content/www/us/en/download/785597/intel-arc-iris-xe-graphics-windows.html ?

I let Windows do the job. Because this fails, I tried to install the newest driver, which I also use on the native Windows installation, but this also fails on several VMs. The installer starts, but it leads to a reboot of the VM and the driver doesn't get installed.

When trying this via device manager, it's the same. The machine reboots while installing the driver.

This behaviour can be triggered while searching for new devices. I'm seeing this on two machines using virtualbox-kvm. I didn't see this when using VirtualBox without KVM backend.

pengu1981 commented 7 months ago

And could you also please upload the complete output of dmesg?

yes ..

dmesg.txt

tpressure commented 7 months ago

Unfortunately, your dmesg output is incomplete. Please provide a complete kernel log. If dmesg is truncated, you can try journalctl -b instead.

pengu1981 commented 7 months ago

Unfortunately, your dmesg output is incomplete. Please provide a complete kernel log. If dmesg is truncated, you can try journalctl -b instead.

log.txt

Delayed because of a new issue that came up yesterday: VMs suddenly stopped when using VFIO on this machine. I've seen this several times over the last few years, and the solution was to change the power supply, so this is gone for now.

dmar_issue.log

I also changed the following:

/etc/modprobe.d/iommu_unsafe_interrupts.conf
options vfio_iommu_type1 allow_unsafe_interrupts=1

/etc/modprobe.d/kvm.conf
options kvm ignore_msrs=1

but I still see the error 43

tpressure commented 7 months ago

Very interesting, the logs still do not contain the necessary information :(

Can you try to collect a complete kernel log? I'm not sure why this works differently on your machine, but the log should start with something like:

[    0.000000] Linux version 6.7.9-200.fc39.x86_64 (mockbuild@c9040d5832f245329326c60b1688b627) (gcc (GCC) 13.2.1 20231205 (Red Hat 13.2.1-6), GNU ld version 2.40-14.fc39) #1 SMP PREEMPT_DYNAMIC Wed Mar  6 19:35:04 UTC 2024
[    0.000000] Command line: BOOT_IMAGE=(hd0,gpt2)/vmlinuz-6.7.9-200.fc39.x86_64 root=UUID=df8cef91-297e-4c18-9387-2585a0c9f25b ro rd.luks.uuid=luks-d851ce1b-f4c2-4be6-9182-bbd7344e4714 psmouse.synaptics_intertouch=1 i915.enable_gvt=1 rhgb quiet
[    0.000000] BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x0000000000000fff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000001000-0x0000000000001fff] usable
[    0.000000] BIOS-e820: [mem 0x0000000000002000-0x000000000000bfff] reserved
[    0.000000] BIOS-e820: [mem 0x000000000000c000-0x000000000005efff] usable
[    0.000000] BIOS-e820: [mem 0x000000000005f000-0x0000000000086fff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000087000-0x0000000000088fff] usable
[    0.000000] BIOS-e820: [mem 0x0000000000089000-0x00000000000fffff] reserved
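
A quick way to check a log before uploading it, sketched under the assumption that a complete log begins with the kernel's "Linux version" banner as in the excerpt above. The function name is hypothetical.

```python
def starts_at_boot(log_text: str) -> bool:
    """True if the first non-empty line carries the 'Linux version' banner."""
    for line in log_text.splitlines():
        if line.strip():
            return "Linux version " in line
    return False


complete = "[    0.000000] Linux version 6.7.9-200.fc39.x86_64 (mockbuild@...)\n"
truncated = "[    0.801656]     i915:enable_guc=3\n"
print(starts_at_boot(complete), starts_at_boot(truncated))
```

The check only looks at the first line, so it works for both dmesg and journal output, but it says nothing about gaps later in the log.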

From my Gentoo times (10 years ago) I remember that there was a complete kernel log in /var/log/messages. I'm sure it works differently now, but a full kernel log would be required to help you resolve these issues.

The DMAR errors are definitely not expected and most likely the cause of your Error 43. The full kernel log would really help here.

pengu1981 commented 7 months ago

Very interesting, the logs still do not contain the necessary information :(

I know; that's why I'm trying to reduce the displayed information.

The DMAR errors are definitely not expected and most likely the cause of your Error 43. The full kernel log would really help here.

ok and they were still there but no more VM crashes ...

i915.enable_gvt=1 oh ...

but .. according to modinfo and the log, this option is unknown ...

[    0.801656]     i915:enable_guc=3
[    0.801657]     i915:fastboot=1
[    0.801657]     i915:enable_dc=4
[    0.801658]     i915:enable_fbc=1
[    5.691573] i915: unknown parameter 'enable_gvt' ignored
[    5.691898] i915 0000:00:02.0: Running in SR-IOV PF mode
[    5.692741] i915 0000:00:02.0: [drm] VT-d active for gfx access
[    5.692772] i915 0000:00:02.0: vgaarb: deactivate vga console
uname -r

6.6.21-x86_64-ge33bb6810353

tpressure commented 7 months ago

I know; that's why I'm trying to reduce the displayed information.

I think the contrary is the case. By stripping information from the logs, you might strip what we need to look at in order to root-cause the issue.

i915.enable_gvt=1 oh ... but .. according to modinfo and the log, this option is unknown ...

GVT is old technology and not supported for Intel Xe graphics. SR-IOV graphics is the replacement for GVT.
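
Ignored module parameters like this one can be picked out of a kernel log automatically. A small sketch, assuming the message format from the excerpt quoted earlier in this thread; the function name is hypothetical.

```python
import re

# Matches lines such as: i915: unknown parameter 'enable_gvt' ignored
IGNORED_RE = re.compile(r"(\w+): unknown parameter '([^']+)' ignored")


def ignored_parameters(kernel_log: str) -> list[tuple[str, str]]:
    """Return (module, parameter) pairs the kernel reported as unknown."""
    return IGNORED_RE.findall(kernel_log)


# Two lines copied from the log excerpt in this thread.
sample = """\
[    5.691573] i915: unknown parameter 'enable_gvt' ignored
[    5.691898] i915 0000:00:02.0: Running in SR-IOV PF mode
"""
ignored = ignored_parameters(sample)
print(ignored)
print("SR-IOV PF mode" in sample)
```

In that excerpt the driver ignores `enable_gvt` and still reports SR-IOV PF mode, which fits the explanation that SR-IOV replaces GVT on this hardware.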

Is there any way you could provide a complete kernel log here? Otherwise, we can only guess what's going on.

pengu1981 commented 7 months ago

ok .. new situation

Mär 26 21:00:33 zombie kernel: Run /init as init process
Mär 26 21:00:33 zombie kernel:   with arguments:
Mär 26 21:00:33 zombie kernel:     /init
Mär 26 21:00:33 zombie kernel:   with environment:
Mär 26 21:00:33 zombie kernel:     HOME=/
Mär 26 21:00:33 zombie kernel:     TERM=linux
Mär 26 21:00:33 zombie kernel:     BOOT_IMAGE=/boot/vmlinuz-6.6.21-x86_64-ge33bb6810353
Mär 26 21:00:33 zombie kernel:     i915:enable_guc=3
Mär 26 21:00:33 zombie kernel:     i915:max_vfs=7
Mär 26 21:00:33 zombie kernel:     i915:enable_gvt=1
Mär 26 21:00:33 zombie kernel:     split_lock_detect=off
Mär 26 21:00:33 zombie kernel:     i915:fastboot=1
Mär 26 21:00:33 zombie kernel:     i915:enable_dc=4
Mär 26 21:00:33 zombie kernel:     i915:enable_fbc=1

GVT is old technology and not supported for Intel Xe graphics. SR-IOV graphics is the replacement for GVT.

GVT is now in use

[    0.793681]     i915:enable_guc=3
[    0.793681]     i915:max_vfs=7
[    0.793681]     i915:enable_gvt=1
[    0.793682]     i915:fastboot=1
[    0.793682]     i915:enable_dc=4
[    0.793683]     i915:enable_fbc=1
[    5.173854] i915 0000:00:02.0: [drm] VT-d active for gfx access
[    5.173887] i915 0000:00:02.0: vgaarb: deactivate vga console
[    5.173943] i915 0000:00:02.0: [drm] Using Transparent Hugepages
[    5.176181] i915 0000:00:02.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=io+mem:owns=io+mem
[    5.179010] i915 0000:00:02.0: [drm] Finished loading DMC firmware i915/adlp_dmc.bin (v2.20)
[    5.188734] mei_hdcp 0000:00:16.0-b638ab7e-94e2-4ea2-a552-d1c54b627f04: bound 0000:00:02.0 (ops i915_hdcp_ops [i915])
[    5.209304] i915 0000:00:02.0: [drm] GT0: GuC firmware i915/tgl_guc_70.bin version 70.20.0
[    5.209315] i915 0000:00:02.0: [drm] GT0: HuC firmware i915/tgl_huc.bin version 7.9.3
[    5.214885] i915 0000:00:02.0: [drm] GT0: HuC: authenticated for all workloads
[    5.215417] i915 0000:00:02.0: [drm] GT0: GUC: submission enabled
[    5.215423] i915 0000:00:02.0: [drm] GT0: GUC: SLPC enabled
[    5.215763] i915 0000:00:02.0: [drm] GT0: GUC: RC enabled
[    5.217316] mei_pxp 0000:00:16.0-fbf6fcf1-96cf-4e2e-a6a6-1bab8cbe36b1: bound 0000:00:02.0 (ops i915_pxp_tee_component_ops [i915])
[    5.217424] i915 0000:00:02.0: [drm] Protected Xe Path (PXP) protected content support initialized
[    5.277542] [drm] Initialized i915 1.6.0 20201103 for 0000:00:02.0 on minor 0
[    5.282086] snd_hda_intel 0000:00:1f.3: bound 0000:00:02.0 (ops i915_audio_component_bind_ops [i915])
[    5.282597] i915 display info: display version: 13
[    5.282604] i915 display info: cursor_needs_physical: no
[    5.282608] i915 display info: has_cdclk_crawl: yes
[    5.282611] i915 display info: has_cdclk_squash: no
[    5.282614] i915 display info: has_ddi: yes
[    5.282617] i915 display info: has_dp_mst: yes
[    5.282619] i915 display info: has_dsb: yes
[    5.282622] i915 display info: has_fpga_dbg: yes
[    5.282624] i915 display info: has_gmch: no
[    5.282627] i915 display info: has_hotplug: yes
[    5.282629] i915 display info: has_hti: no
[    5.282632] i915 display info: has_ipc: yes
[    5.282634] i915 display info: has_overlay: no
[    5.282637] i915 display info: has_psr: yes
[    5.282639] i915 display info: has_psr_hw_tracking: no
[    5.282641] i915 display info: overlay_needs_physical: no
[    5.282644] i915 display info: supports_tv: no
[    5.282646] i915 display info: has_hdcp: yes
[    5.282649] i915 display info: has_dmc: yes
[    5.282652] i915 display info: has_dsc: yes
[    5.362315] fbcon: i915drmfb (fb0) is primary device
[    5.364511] i915 0000:00:02.0: [drm] fb0: i915drmfb frame buffer device

btw until now, I cannot get the xe driver to work (on the adl-n and arc)

tpressure commented 7 months ago

You are giving me very little to work with here. It's simply impossible to use GVT with Intel Xe Graphics because the last generation that supports GVT-g is Comet Lake (10th Gen Intel Core) [1]. Anything newer than that needs SR-IOV graphics if you want to use graphics virtualization.

Is there any way you could provide a complete kernel log here? Otherwise, we can only guess what's going on.

Your CPU model/family/stepping indicates that you are working with non-commodity hardware. If you cannot provide full kernel logs here because you're working on something confidential, please consider contacting us directly via service@cyberus-technology.de.

[1] https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/i915/intel_gvt.c#L51

pengu1981 commented 7 months ago

It's simply impossible to use GVT with Intel Xe Graphics because the last generation that supports GVT-g is Comet Lake (10th Gen Intel Core)

This is OK because I want to use SR-IOV; that's why I'm using a Gen12 CPU / GPU. Sorry for the confusion, I changed the boot flags accordingly.

Is there any way you could provide a complete kernel log here? Otherwise, we can only guess what's going on.

Yes. CONFIG_LOG_BUF_SHIFT was set too low (12, sometimes 14), so the log mostly fits now.
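For reference, CONFIG_LOG_BUF_SHIFT is the base-2 logarithm of the kernel log buffer size in bytes, so the values 12 and 14 give very small buffers that a verbose boot can overflow. A minimal Python sketch of the arithmetic (the function name is made up for illustration, and 17 is included only as a larger comparison value):

```python
def log_buf_bytes(shift: int) -> int:
    """Return the kernel log buffer size in bytes for a CONFIG_LOG_BUF_SHIFT value."""
    # The buffer size is 2**shift bytes.
    return 1 << shift


if __name__ == "__main__":
    # 12 and 14 are the values mentioned in the comment; 17 is only a comparison.
    for shift in (12, 14, 17):
        print(f"CONFIG_LOG_BUF_SHIFT={shift}: {log_buf_bytes(shift) // 1024} KiB")
```

With a shift of 12 the buffer holds only 4 KiB, which would explain why earlier kernel logs were truncated.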

Your cpu model/family/stepping indicates that you are working with non-commodity hardware.

Does this matter regarding the GPU if there are no performance cores?

All other hardware I could use at the moment is either too old for SR-IOV (so I have to use GVT-g) or not usable (the Arc A380).

please consider contacting us directly

Perhaps this could eliminate misunderstandings that would be too much to clarify in this ticket, and it could also help others in the end.

dmesg.txt

cmdline.txt

The rest of the situation is almost the same, so this isn't power-supply related.

tpressure commented 7 months ago

I had a closer look at the system you are using and, unfortunately, the GPU of the Intel N100 is not Intel Xe graphics, but rather the old Intel UHD architecture. The product specification and your kernel log confirm that:

[    3.429744] i915 0000:00:02.0: [drm] GT0: GuC firmware i915/tgl_guc_70.bin version 70.20.0
[    3.429754] i915 0000:00:02.0: [drm] GT0: HuC firmware i915/tgl_huc.bin version 7.9.3

As we can see, the firmware that is loaded is for Tiger Lake graphics, which also uses the old UHD architecture.
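As a rough aid for this kind of check, the firmware names can be pulled out of a saved kernel log with a short script. This is only a sketch: the function name and the regular expression are assumptions based on the log lines quoted in this thread, and other kernel versions may format these messages differently.

```python
import re

# Matches lines such as "GuC firmware i915/tgl_guc_70.bin version 70.20.0"
# or "Finished loading DMC firmware i915/adlp_dmc.bin (v2.20)".
FIRMWARE_RE = re.compile(r"(GuC|HuC|DMC) firmware (i915/\S+)")


def loaded_i915_firmware(dmesg_text: str) -> dict:
    """Map firmware type (GuC, HuC, DMC) to the file name reported in the log."""
    found = {}
    for line in dmesg_text.splitlines():
        match = FIRMWARE_RE.search(line)
        if match:
            found[match.group(1)] = match.group(2)
    return found


SAMPLE = """\
[    3.429744] i915 0000:00:02.0: [drm] GT0: GuC firmware i915/tgl_guc_70.bin version 70.20.0
[    3.429754] i915 0000:00:02.0: [drm] GT0: HuC firmware i915/tgl_huc.bin version 7.9.3
"""

if __name__ == "__main__":
    for kind, path in loaded_i915_firmware(SAMPLE).items():
        print(f"{kind}: {path}")
```

The prefix of each file name (here `tgl_`) is what the argument above relies on.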

Even though this GPU has the SR-IOV capability, it is completely unsupported and won't work :disappointed:

pengu1981 commented 7 months ago

I had a closer look at the system you are using and, unfortunately, the GPU of the Intel N100 is not Intel Xe graphics, As we can see, the firmware that is loaded is for Tiger Lake graphics, which also uses the old UHD architecture.

This is correct and I never said anything else.

Even though this GPU has the SR-IOV capability, it is completely unsupported and won't work 😞

Yes, this is the case at the moment, but things are not as clear as they look at first glance.

As I understand it, SR-IOV virtualisation only works with Xe graphics. The first integrated Xe-based GPUs were found in Intel Gen11 CPUs. Nevertheless, Intel continues to call them "Intel UHD Graphics", even in 12th, 13th or 14th Gen CPUs.

If I'm wrong, no problem; perhaps this will be possible at some point on DG2.

tpressure commented 7 months ago

Intel dropped the term UHD graphics for everything newer than 11th gen graphics. For example:

The important fact here is: Intel® Iris® Xe Graphics eligible in the GPU specification section.

As already said, the Intel guest driver only supports SR-IOV for 12th gen graphics or newer. So you are out of luck with the N100 unfortunately.

pengu1981 commented 7 months ago

Intel dropped the term UHD graphics for everything newer than 11th gen graphics. For example:

That's what I mean.

The important fact here is: Intel® Iris® Xe Graphics eligible in the GPU specification section.

OK, and this is only available on certain CPUs.

As already said, the Intel guest driver only supports SR-IOV for 12th gen graphics or newer. So you are out of luck with the N100 unfortunately.

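OK so if I want to use this, I have two options:

  • Having such a CPU, I think an i5-2500H is enough
  • it's possible to use a dedicated graphics card later on (that didn't work atm)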

tpressure commented 7 months ago

OK so if I want to use this, I have two options:

  • Having such a CPU, I think an i5-2500H is enough
  • it's possible to use a dedicated graphics card later on (that didn't work atm)

The i5-2500H is a CPU that is expected to work with SR-IOV. Dedicated Intel GPUs don't have the SR-IOV capability, at least I haven't seen any in the wild.

pengu1981 commented 7 months ago

The i5-2500H is a CPU that is expected to work with SR-IOV. Dedicated Intel GPUs don't have the SR-IOV capability, at least I haven't seen any in the wild

and perhaps never will ;-( see here

tpressure commented 7 months ago

@pengu1981 is it ok for you if we close this issue now that we know that the N100 graphics is unsupported? I don't think there's anything more we can do.

pengu1981 commented 7 months ago

@pengu1981 is it ok for you if we close this issue now that we know that the N100 graphics is unsupported? I don't think there's anything more we can do.

Yes, that's how it is.