acidanthera / bugtracker

Panic in igfx_pm in conjunction with Intel GVT-g in a virtual machine #1914

Open scorpion81 opened 2 years ago

scorpion81 commented 2 years ago

Hi, I have successfully set up a virtual machine running macOS 12.1 in QEMU. I had heard of Intel's GVT-g and wanted to give it a shot, but a panic thrown by WhateverGreen has turned out to be a roadblock. My aim was to share the real GPU with the guest, not to do a full passthrough.

Apple crash report output:

panic(cpu 3 caller 0xffffff800c1f93ff): WhateverGreen   igfx_pm: @ ForceWake timeout for domain (unk), expected 0x8

The offending line of code: https://github.com/acidanthera/WhateverGreen/blob/master/WhateverGreen/kern_igfx_pm.cpp#L304

Full Panic report: apple_panic.txt

OpenCore config: config.plist.txt

Link to the full EFI folder (zip, 26 MB): https://drive.google.com/file/d/1nzCz11wrsMnElBFHCOeYKeFHo47eyC1U/view?usp=sharing

My idea was to temporarily disable the ForceWake workaround for Coffee Lake CPUs/iGPUs, since the call stack in the panic report mentions "SafeWake" somewhere, but there is no option to disable the ForceWake workaround for testing: https://github.com/acidanthera/WhateverGreen/blob/851f4b9110c971208b765a70bc5ff119340755ca/WhateverGreen/kern_igfx.cpp#L1031

That is, testing with 0 instead of 1 at the linked line.

Some supplemental information about my hardware and VM, in case it is needed.

Host hardware: Acer Aspire A515-52-5981 running Manjaro Linux (kernel 5.10.89). lspci output:

00:00.0 Host bridge: Intel Corporation Coffee Lake HOST and DRAM Controller (rev 0b)
00:02.0 VGA compatible controller: Intel Corporation WhiskeyLake-U GT2 [UHD Graphics 620]
00:08.0 System peripheral: Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th/8th Gen Core Processor Gaussian Mixture Model
00:12.0 Signal processing controller: Intel Corporation Cannon Point-LP Thermal Controller (rev 30)
00:14.0 USB controller: Intel Corporation Cannon Point-LP USB 3.1 xHCI Controller (rev 30)
00:14.2 RAM memory: Intel Corporation Cannon Point-LP Shared SRAM (rev 30)
00:14.3 Network controller: Intel Corporation Cannon Point-LP CNVi [Wireless-AC] (rev 30)
00:15.0 Serial bus controller [0c80]: Intel Corporation Cannon Point-LP Serial IO I2C Controller #0 (rev 30)
00:15.1 Serial bus controller [0c80]: Intel Corporation Cannon Point-LP Serial IO I2C Controller #1 (rev 30)
00:16.0 Communication controller: Intel Corporation Cannon Point-LP MEI Controller #1 (rev 30)
00:17.0 SATA controller: Intel Corporation Cannon Point-LP SATA Controller [AHCI Mode] (rev 30)
00:1d.0 PCI bridge: Intel Corporation Cannon Point-LP PCI Express Root Port #9 (rev f0)
00:1d.4 PCI bridge: Intel Corporation Cannon Point-LP PCI Express Root Port #13 (rev f0)
00:1f.0 ISA bridge: Intel Corporation Cannon Point-LP LPC Controller (rev 30)
00:1f.3 Audio device: Intel Corporation Cannon Point-LP High Definition Audio Controller (rev 30)
00:1f.4 SMBus: Intel Corporation Cannon Point-LP SMBus Controller (rev 30)
00:1f.5 Serial bus controller [0c80]: Intel Corporation Cannon Point-LP SPI Controller (rev 30)
01:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTL8411B PCI Express Card Reader (rev 01)
01:00.1 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 12)
02:00.0 Non-Volatile memory controller: Sandisk Corp WD Blue SN500 / PC SN520 NVMe SSD (rev 01)

QEMU launch script (adapted from Docker-OSX):

#!/bin/bash

exec qemu-system-x86_64 -m 4096 \
-cpu max,vendor=GenuineIntel,+invtsc,vmware-cpuid-freq=on,+ssse3,+sse4.2,+popcnt,+avx,+aes,+xsave,+xsaveopt,check \
-machine q35,accel=kvm:tcg \
-smp 4,cores=1 \
-device vfio-pci,sysfsdev=/sys/bus/mdev/devices/9085644c-8cf8-4df0-900e-0c1362deddea,bus=pcie.0,addr=0x2,driver=vfio-pci-nohotplug,display=on,ramfb=on \
-usb \
-device usb-kbd \
-device usb-tablet \
-device isa-applesmc,osk=ourhardworkbythesewordsguardedpleasedontsteal\(c\)AppleComputerInc \
-drive if=pflash,format=raw,readonly=on,file=/usr/share/edk2-ovmf/x64/OVMF_CODE.fd \
-drive if=pflash,format=raw,file=/home/xxx/Projekte/gpu_passthru/macOS_VARS.fd \
-smbios type=2 \
-audiodev pa,server=unix:/run/user/1000/pulse/native,id=hda \
-device ich9-intel-hda \
-device hda-duplex,audiodev=hda \
-device ich9-ahci,id=sata,bus=pcie.0,addr=0x7 \
-drive id=OpenCoreBoot,if=none,snapshot=on,format=qcow2,file=/mnt/CA8690DA8690C7F9/VM/Monterey/OpenCore.qcow2 \
-device ide-hd,bus=sata.2,drive=OpenCoreBoot \
-drive id=MacHDD,if=none,file=/mnt/CA8690DA8690C7F9/VM/Monterey/macos_hdd_monterey.img,format=qcow2 \
-device ide-hd,bus=sata.3,drive=MacHDD \
-netdev user,id=net0 \
-device vmxnet3,netdev=net0,id=net0,mac=52:54:00:09:49:17,bus=pcie.0,addr=0xA \
-monitor stdio \
-boot menu=on \
-display gtk,gl=on \
-vga none

QEMU error messages (the host locale is German; "Ungültige Adresse" means "Bad address"):

QEMU 6.2.0 monitor - type 'help' for more information
(qemu) qemu-system-x86_64: -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/9085644c-8cf8-4df0-900e-0c1362deddea,bus=pcie.0,addr=0x2,driver=vfio-pci-nohotplug,display=on,ramfb=on: IGD device 9085644c-8cf8-4df0-900e-0c1362deddea cannot support legacy mode due to existing devices at address 1f.0
qemu-system-x86_64: vfio_pci_write_config(9085644c-8cf8-4df0-900e-0c1362deddea, 0x4, 0x100407, 0x4) failed: Ungültige Adresse
qemu-system-x86_64: vfio_pci_write_config(9085644c-8cf8-4df0-900e-0c1362deddea, 0x4, 0x100407, 0x4) failed: Ungültige Adresse
qemu-system-x86_64: vfio_pci_write_config(9085644c-8cf8-4df0-900e-0c1362deddea, 0x4, 0x100407, 0x4) failed: Ungültige Adresse
scorpion81 commented 2 years ago

Not quite sure how it could be done, but would it be possible to write a custom kext for Intel iGPUs based on Linux's i915 driver? Something similar is being done with AirportItlwm.kext from the itlwm project. (I assume a GPU driver is harder to implement than a Wi-Fi driver, though.)

scorpion81 commented 2 years ago

Phew, NOW I finally realized why on earth I couldn't find the OC logs... the EFI actually exists twice! One copy is inside the OpenCore.qcow2 bootloader drive, and another seems to be part of the main macOS disk. The problem is that if you edit config.plist (and other files) from inside macOS, everything is stored on the main disk. And if you mount the bootloader disk from Linux and edit it there, it seems to be copied to the main disk internally when or before booting. I almost went crazy because edits made from inside macOS kept getting lost. The logs are also on the main disk's EFI; the bootloader disk does not seem to be writable from macOS (after mounting, it shows as empty)... Rather weird. I wonder why it is like this?

I will try to create a proper log, but I think the problem is that if I boot into macOS with my backup configuration (a QEMU-virtualized GPU, no GVT-g), the logs may be overwritten. And if I boot with my testing configuration, macOS does not fully boot, so I cannot reach the logs on the main disk... a chicken-and-egg problem. Or rather, I don't know yet which parameters OC needs to write the logs to the bootloader disk, i.e. the "correct" EFI, not the copy on the main disk. Viewing the logs from Linux could be easier. :slightly_smiling_face:

scorpion81 commented 2 years ago

I have also experimented with Hackintool and found its patch generator: when you select one of the supported framebuffers and a set of options, it produces a snippet to add to config.plist. No luck yet, though; I think I literally tried all combinations... The boot then hangs right before the macOS handover and stays in the text-mode boot output. GVT-g seems to have some special caveats... hmm.

vit9696 commented 2 years ago

Try commenting out both occurrences of modForceWakeWorkaround.enabled = true; in https://github.com/acidanthera/WhateverGreen/blob/master/WhateverGreen/kern_igfx.cpp. If it works, maybe we can add an option.

But I believe the issue is that Apple drivers see no output target, and the crash here is a consequence of that.
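If such an option were added, one plausible shape is a boot argument that keeps the workaround on by default and lets users opt out. The sketch below is a standalone model of that decision only: the flag name -igfxnoforcewake is hypothetical (WhateverGreen has no such boot-arg in the linked version), and the boot-args string is tokenized by hand rather than through Lilu's own helpers.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical opt-out flag; not an existing WhateverGreen boot-arg.
constexpr const char *kNoForceWakeArg = "-igfxnoforcewake";

// Returns true if `flag` appears as a whole whitespace-separated token in `bootArgs`.
bool hasBootFlag(const std::string &bootArgs, const std::string &flag) {
    assert(!flag.empty());
    std::istringstream tokens(bootArgs);
    std::string token;
    while (tokens >> token) {
        if (token == flag)
            return true;
    }
    return false;
}

// Models the enablement decision: the workaround stays on for affected
// platforms unless the user passes the hypothetical opt-out flag.
bool forceWakeWorkaroundEnabled(bool platformNeedsWorkaround, const std::string &bootArgs) {
    return platformNeedsWorkaround && !hasBootFlag(bootArgs, kNoForceWakeArg);
}
```

Matching whole tokens (rather than substrings) avoids accidentally triggering on a longer, unrelated argument that happens to start with the same text.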

scorpion81 commented 2 years ago

First off, how can I obtain those macOS boot logs in text form? I assume screenshots are not a good way to share them, but at the moment I don't know a good alternative.

Update: I have now managed to boot the VM with the GPU shared via GVT-g, BUT... if I fake the device-id as A53E0000 (UHD 620, Whiskey Lake), the Apple driver interacts with it [screenshot: Apple driver iGPU lines],

but after ~5 minutes the boot process hangs with messages about an inaccessible framebuffer [screenshot: Apple framebuffer being stuck].

At the same time, QEMU outputs these lines:

qemu-system-x86_64: vfio_pci_write_config(9085644c-8cf8-4df0-900e-0c1362deddea, 0x4, 0x100407, 0x4) failed: Ungültige Adresse
qemu-system-x86_64: vfio_pci_write_config(9085644c-8cf8-4df0-900e-0c1362deddea, 0x4, 0x100407, 0x4) failed: Ungültige Adresse
qemu-system-x86_64: vfio_region_write(9085644c-8cf8-4df0-900e-0c1362deddea:region0+0x2080, 0x402a4000,4) failed: Ungültige Adresse
qemu-system-x86_64: vfio_region_write(9085644c-8cf8-4df0-900e-0c1362deddea:region0+0x12080, 0x402a5000,4) failed: Ungültige Adresse
qemu-system-x86_64: vfio_region_write(9085644c-8cf8-4df0-900e-0c1362deddea:region0+0x22080, 0x402a6000,4) failed: Ungültige Adresse
qemu-system-x86_64: vfio_region_write(9085644c-8cf8-4df0-900e-0c1362deddea:region0+0x1a080, 0x402a7000,4) failed: Ungültige Adresse
qemu-system-x86_64: vfio_region_write(9085644c-8cf8-4df0-900e-0c1362deddea:region0+0x2230, 0x402a9119,4) failed: Ungültige Adresse

It looks as if the Apple driver writes 4 bytes to the PCI Command register instead of 2. In (0x4, 0x100407, 0x4), the first 0x4 is the offset, which indeed corresponds to the PCI Command register; 0x100407 seems to be the offending value; and the last 0x4 is the byte count (which I would expect to be 2 here). Hmmm, what on earth is the Apple driver attempting here...
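For reference, the standard PCI configuration header places the 16-bit Command register at offset 0x04 and the 16-bit Status register directly after it at 0x06, so a 4-byte (dword) access at offset 0x4 is a legal access that covers both registers at once. A small sketch splitting the logged value 0x100407 under that layout (the helper and constant names are illustrative, not from QEMU or VFIO):

```cpp
#include <cassert>
#include <cstdint>

// Standard PCI configuration header offsets and selected register bits.
constexpr uint32_t kPciCommandOffset = 0x04;  // 16-bit Command register
constexpr uint32_t kPciStatusOffset  = 0x06;  // 16-bit Status register
static_assert(kPciStatusOffset == kPciCommandOffset + 2, "Status directly follows Command");

constexpr uint16_t kCmdIoSpace     = 1u << 0;
constexpr uint16_t kCmdMemorySpace = 1u << 1;
constexpr uint16_t kCmdBusMaster   = 1u << 2;
constexpr uint16_t kCmdIntxDisable = 1u << 10;
constexpr uint16_t kStatusCapList  = 1u << 4;  // Capabilities List present

struct CommandStatus {
    uint16_t command;
    uint16_t status;
};

// Splits a little-endian dword written at config offset 0x04 into the
// Command (low half) and Status (high half) registers it covers.
CommandStatus splitDwordAtCommand(uint32_t offset, uint32_t value, uint32_t len) {
    assert(offset == kPciCommandOffset && len == 4);
    return {static_cast<uint16_t>(value & 0xFFFF), static_cast<uint16_t>(value >> 16)};
}
```

Read this way, the low half 0x0407 sets I/O space, memory space, bus master and INTx disable, while the high half 0x0010 is the Capabilities List bit in Status.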

dmesg on the Linux host shows the following: dmesg_linux_host.txt

I have been using this ROM to work around QEMU's issues with GVT-g and UEFI: https://github.com/patmagauran/i915ovmfPkg (I am also in contact with the author of this project).

If I do not fake the iGPU device ID, the boot succeeds with this ROM, but the Apple driver ignores the GPU ("cannot be registered with Framebuffer driver"): QE/CI (Quartz Extreme/Core Image) is not enabled, VRAM shows as 0 bytes, and so on.

So I assume the Apple driver must interact properly with the passed-through GPU for this to work. With the ForceWake workaround disabled, it printed a lot of RenderXXXX messages and incremented a counter (I will try to reproduce this).

I also looked into the i915 Linux source code to investigate those QEMU error messages further. I assume something goes wrong with the PCI configuration, and later on the wrong MOCS tables (or their base addresses) are used. The offsets seem correct, but the base does not.

vit9696 commented 2 years ago

> First off, how can I obtain those macOS boot logs in text form? I assume screenshots are not a good way to share them, but at the moment I don't know a good alternative.

Use serial emulation. Add debug=8 to boot-args, and perhaps serial=3. Then connect a serial device to QEMU; I often use -serial stdio, but it can also be a file or a TCP port.

You do not need force wake on the guest iGPU at all, by the way. https://patchwork.kernel.org/project/intel-gfx/patch/1485324296-14995-2-git-send-email-weinan.z.li@intel.com/

scorpion81 commented 2 years ago

Btw, here is the boot log in text form (I used A53E0000 as device-id here): bootlog.txt. The i915 message "efi_main(939)Driver starts!" and the subsequent lines originate from the i915ovmf.rom file, which I pass on the QEMU command line in my launch script as follows:

-device vfio-pci,sysfsdev=/sys/bus/mdev/devices/9085644c-8cf8-4df0-900e-0c1362deddea,bus=pcie.0,addr=0x2,driver=vfio-pci-nohotplug,display=on,romfile=/home/xxx/Projekte/gpu_passthru/i915ovmf.rom 
scorpion81 commented 2 years ago

Hello, some more investigation results:

If you use only vbios_gvt_uefi.rom, the boot crashes when trying to force-wake the GPU (according to the Apple crash log), and at the same time dmesg reports that GVT-g is not supported (because the GVT-g magic is not present at register 0x78000, a.k.a. PV_INFO).

The latter is set up by i915ovmfPkg. BUT... something is still missing. I assume some base address is still wrong. For example, the vfio_pci_write_config error at offset 0x4 with count 0x4 is interpreted as a write to the PCI Command register. But if it really were such an access, the count would have to be at most 2. So I think it is a memory access. Yet the value (0x100407) is indeed already present in the PCI Command register. Since it tries to WRITE it again, I assume it must have been read somewhere before and misinterpreted as a value from a memory access. I wonder what the Apple driver attempted there. Perhaps the mapping from the virtual GPU to the host GPU's MMIO base, memory ballooning, etc. is not yet set up correctly by i915ovmfPkg.

The vfio_region writes try to access the ring buffer base registers, but again they hit some non-privileged / non-whitelisted ones. I assume this is because some MMIO base is still incorrect, since 24d8 and 24dc are only offsets. The ring is RCS0. And there is more regarding the mm type, the guest context LRCA and such (I am still investigating how this is supposed to work) :slightly_smiling_face:

If you do not fake the device-id of the UHD 620 (for example as A53E0000), the Apple driver stays inactive and the VM boots successfully, but without hardware acceleration. If you fake it, the Apple driver initializes but, I guess, struggles with an "unexpected" virtual GPU setup. So some layer seems to be missing that makes the Apple driver believe everything is OK and that it is dealing with a physical GPU, while in truth this is a virtual one (with some address remapping).
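To see which registers the region0 offsets in the QEMU log land on, the sketch below matches them against the gen9 engine MMIO bases and the RING_HWS_PGA register (engine base + 0x80) as I read them from the Linux i915 register headers; treat the table as an assumption to double-check against the headers of your kernel version. Under it, 0x2080, 0x12080, 0x22080 and 0x1a080 are the hardware status page address registers of RCS0, VCS0, BCS0 and VECS0, while 0x2230 is not in this table.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Gen9 engine MMIO bases (assumed from i915's register headers).
struct Engine {
    const char *name;
    uint32_t base;
};

constexpr Engine kGen9Engines[] = {
    {"RCS0", 0x02000},
    {"VCS0", 0x12000},
    {"VECS0", 0x1a000},
    {"BCS0", 0x22000},
};

constexpr uint32_t kRingHwsPga = 0x80;  // RING_HWS_PGA(base) = base + 0x80

// Returns "<engine> RING_HWS_PGA" if `offset` is one of those registers, else "".
std::string describeHwsPga(uint32_t offset) {
    for (const Engine &e : kGen9Engines) {
        if (offset == e.base + kRingHwsPga)
            return std::string(e.name) + " RING_HWS_PGA";
    }
    return "";
}
```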
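On the device-id value itself: the device-id property is a 4-byte data value stored in little-endian byte order, so A53E0000 (bytes A5 3E 00 00) stands for PCI device ID 0x3EA5. A small sketch of the conversion in both directions (the function names are illustrative):

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Interprets 4 bytes of device-id property data as a little-endian 32-bit value.
uint32_t deviceIdFromPropertyBytes(const std::array<uint8_t, 4> &bytes) {
    return static_cast<uint32_t>(bytes[0]) |
           (static_cast<uint32_t>(bytes[1]) << 8) |
           (static_cast<uint32_t>(bytes[2]) << 16) |
           (static_cast<uint32_t>(bytes[3]) << 24);
}

// Inverse: encodes a device ID as property bytes, lowest byte first.
std::array<uint8_t, 4> propertyBytesFromDeviceId(uint32_t id) {
    return {static_cast<uint8_t>(id & 0xFF),
            static_cast<uint8_t>((id >> 8) & 0xFF),
            static_cast<uint8_t>((id >> 16) & 0xFF),
            static_cast<uint8_t>((id >> 24) & 0xFF)};
}
```

For example, deviceIdFromPropertyBytes({0xA5, 0x3E, 0x00, 0x00}) yields 0x3EA5.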

scorpion81 commented 2 years ago

So, I set out to see whether I could fix this with my own IGFX submodule that patches the framebuffer and graphics kexts. But I cannot even read an MMIO register to begin with.

void IGFX::GVTGAwareMaker::processFramebufferKext(KernelPatcher &patcher, size_t index, mach_vm_address_t address, size_t size) {

    uint64_t gvtg_magic = callbackIGFX->readRegister32(callbackIGFX->defaultController(), VGT_PVINFO_PAGE);

This is a snippet of an IGFX submodule I started working on. It seems I cannot call readRegister32 with defaultController() at this stage yet, and I don't quite understand why defaultController() is not initialized there. I get a panic pointing to this function call. VGT_PVINFO_PAGE is just 0x78000, the base of the virtual GPU's PV info page (AFAIK). I also enabled MMIO read access and global framebuffer access support with the corresponding flags. I can post more of my code on request.
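One detail that may matter for the snippet above: in Linux's i915_pvinfo.h, the PV info page starts with a 64-bit magic field whose expected value is VGT_MAGIC (0x4776544776544776, the bytes "vGTvGTvG"), so a single 32-bit read returns only its low half. A standalone sketch of the check, where Read32 is a hypothetical stand-in for whatever MMIO accessor the submodule ends up using; it does not address why defaultController() is unavailable at that stage.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

constexpr uint32_t kVgtPvinfoPage = 0x78000;           // VGT_PVINFO_PAGE in i915_pvinfo.h
constexpr uint64_t kVgtMagic = 0x4776544776544776ULL;  // VGT_MAGIC, "vGTvGTvG" in memory

// Hypothetical 32-bit MMIO reader: register offset -> register value.
using Read32 = std::function<uint32_t(uint32_t)>;

// Assembles the 64-bit magic from two 32-bit reads (low dword first,
// little-endian layout) and compares it with VGT_MAGIC.
bool isGvtgGuest(const Read32 &read32) {
    const uint64_t lo = read32(kVgtPvinfoPage);
    const uint64_t hi = read32(kVgtPvinfoPage + 4);
    return ((hi << 32) | lo) == kVgtMagic;
}
```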

tak2hu commented 2 years ago

@scorpion81 Hi, did you manage to boot OS X on KVM with GVT-g?

DocMAX commented 4 months ago

Same question