T-vK / MobilePassThrough

Make GPU passthrough on notebooks easy and accessible!

Asus GL552VW #22

Open rob-mur opened 3 years ago

rob-mur commented 3 years ago

Hi there

First - thanks so much for this repo, it's super useful to have all this information in one place! Below is a description of what happened as I tried to get this running; hopefully it will help! Secondly, apologies in advance for the long post. I've tried to include everything I ran into in order to find areas where the documentation could be improved.

I've had a go at using it on a clean install of Fedora 35 with an Asus GL552VW. That is an Intel i7-6700HQ and an Nvidia 960M. I'd recommend either merging the windows-unattended-install branch into master OR adding a big flag on the master readme that this is the branch to currently use, as the master branch currently doesn't work (as referenced in other issues).

In order to get my dGPU to appear in the check script I had to set kernel parameter intel_iommu=on. It would be handy to add this to the readme (both that it needs to be done, and how to set a kernel parameter for newcomers).
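For newcomers, a minimal sketch of how the parameter could be checked and persisted on Fedora. It assumes grubby (which Fedora ships) is used to edit the boot entries; has_kernel_param is a hypothetical helper name and not part of mbpt.sh.

```shell
# has_kernel_param PARAM [CMDLINE]
# Succeeds if PARAM appears as a whole word on the kernel command line.
# CMDLINE defaults to the running kernel's /proc/cmdline (assumption:
# passing it explicitly is only there to make the check easy to try out).
has_kernel_param() {
  local param="$1"
  local cmdline="${2-$(cat /proc/cmdline)}"
  case " $cmdline " in
    *" $param "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (needs root, takes effect after a reboot):
#   has_kernel_param intel_iommu=on || \
#     sudo grubby --update-kernel=ALL --args="intel_iommu=on"
```

After the reboot, running has_kernel_param intel_iommu=on again should succeed if the parameter was persisted.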

I then ran into an issue where spice would give a connection refused, leading to a system hang as the VM isn't started quickly enough. The default port in the config script had changed to 5904, but the docs in master still say 5900. I then realised that this is again just down to docs, as I still had the git page open on master (my fault).

I then ran "auto" as referenced in the windows-unattended-install branch readme. This failed because SERVICE_NAME was ambiguous, so the service to run on next boot wasn't set. This error occurred in /scripts/utils/manager-specific/service/systemd. I tracked it down to line 187 of setup.sh, where the command line arguments were in the wrong order, at least for my setup. I switched them around and reran, but the service is just set to run auto again, so nothing happened.

From here I attempted to run the stages individually to work out what was happening. The main issue was that the install was failing because the default vm-file names (such as the Windows 10 iso) don't match the file names as downloaded by default in the setup script. This means that the install script doesn't work, but still unloads your dGPU, leading to a crash. It was very helpful to have the dry-run parameter to work this out, so thanks for that! It would be helpful, however, if the script detected a failed install and didn't unload the GPU.

The first reason I could see for a failure of the install script was qemu-img not being found. I installed that package separately; perhaps an earlier step failed to grab it. The next error became “failed to connect to the hypervisor” from qemu. What seemed to work was installing libvirt from Discover and then, importantly, rebooting. This is where I've stopped for now, as I'm on the edge of what I know without significant further reading. The bottom line is that I can't create a vGPU due to a Linux error.
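The "don't unload the GPU if the install can't work" idea could look roughly like the sketch below. require_files is a hypothetical helper, and the file names in the usage comment are taken from the dry-run output in this issue rather than from the script's actual config.

```shell
# require_files FILE...
# Prints every missing file to stderr and fails if any is absent,
# so the caller can bail out before touching the GPU driver.
require_files() {
  local f missing=0
  for f in "$@"; do
    if [ ! -e "$f" ]; then
      printf 'missing: %s\n' "$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage, before any vfio-pci rebinding (VM_FILES_DIR as in the config):
#   require_files "$VM_FILES_DIR/windows10.iso" \
#                 "$VM_FILES_DIR/mobile-passthrough-helper.iso" || exit 1
```

The same pattern would work for required executables such as qemu-img, using command -v instead of the file test.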
The remaining errors that cause the install to fail can be found in the output of my install dry run below. Any pointers of next steps to troubleshoot would be appreciated! I will try and investigate this further when I get more time.

Action: install
> Start mode: qemu
> Using network mode bridged...
> Using MAC address: 52:54:BE:EF:C0:0E...
> Using 7 CPU cores...
> Using 5G of RAM...
> Using a virtual OS drive...
> Removing old virtual disk...
> Creating a virtual disk for the VM...
> Virtual OS drive has 40G of storage.
> Bumblebee is not available...
> Not using SMB share...
> Using dGPU passthrough... 
> dGPU is: 3D controller: NVIDIA Corporation GM107M [GeForce GTX 960M] (rev a2)
> Retrieving and parsing DGPU IDs...
> Not using DGPU vBIOS override...
modprobe: ERROR: could not insert 'kvmgt': No such device
modprobe: FATAL: Module vfio-mdev not found in directory /lib/modules/5.14.18-300.fc35.x86_64
> Creating a vGPU for mediated iGPU passthrough...
bash: line 1: /sys/bus/pci/devices/0000:00:02.0/mdev_supported_types/*/create: No such file or directory
> [Error] Failed creating a vGPU. (You can try again. If you still get this error, you have to reboot. This seems to be a bug in Linux.)
> Continuing without mediated iGPU passthrough...
> Loading display-mode-4 plugin...
> Using spice on port 5904...
> Not using QXL...
> Not using Looking Glass...
> Using fake battery...
> Creating fresh OVMF_VARS copy for this VM...
> Not using patched OVMF...
> Not using USB passthrough...
> Using virtual input method 'virtio' for keyboard/mouse input...
> Using RDP...
> Deleting VM if it already exists...
> [Background task] Starting RDP autoconnect...
> Generating qemu-system-x86_64 command (dry-run)...

sudo qemu-system-x86_64 \
  -name MBPT_WindowsVM \
  -machine type=q35,accel=kvm \
  -global ICH9-LPC.disable_s3=1 \
  -global ICH9-LPC.disable_s4=1 \
  -enable-kvm \
  -cpu host,kvm=off,hv_vapic,hv_relaxed,hv_spinlocks=0x1fff,hv_time,hv_vendor_id=12alphanum \
  -mem-prealloc \
  -rtc clock=host,base=localtime \
  -nographic \
  -serial none \
  -parallel none \
  -boot menu=on \
  -boot once=d \
  -k en-us \
  -device ich9-intel-hda \
  -device hda-output \
  -device pci-bridge,addr=12.0,chassis_nr=2,id=head.2 \
  -net nic,model=e1000,macaddr=52:54:BE:EF:C0:0E \
  -net bridge,br=virbr0 \
  -smp 7 \
  -m 5G \
  -drive file=/home/rmurphy/MobilePassThrough/vm-files/windows10.iso,index=1,media=cdrom \
  -drive file=/home/rmurphy/MobilePassThrough/vm-files/mobile-passthrough-helper.iso,index=2,media=cdrom \
  -drive id=disk0,if=virtio,cache.direct=on,if=virtio,aio=native,format=raw,file=/home/rmurphy/MobilePassThrough/vm-files/MBPT_WindowsVM.img \
  -device ioh3420,bus=pcie.0,addr=1c.0,multifunction=on,port=1,chassis=1,id=pci.1 \
  -device vfio-pci,host=01:00.0,bus=pci.1,addr=00.0,x-pci-sub-device-id=0x1c5d,x-pci-sub-vendor-id=0x1043,multifunction=on,rombar=0 \
  -spice port=5904,addr=127.0.0.1,disable-ticketing \
  -acpitable file=/home/rmurphy/MobilePassThrough/vm-files/fake-battery.aml \
  -drive if=pflash,format=raw,readonly=on,file=/usr/share/OVMF/OVMF_CODE.fd \
  -drive if=pflash,format=raw,file=/home/rmurphy/MobilePassThrough/vm-files/OVMF_VARS_VM.fd \
  -usb \
  -device virtio-keyboard-pci,bus=head.2,addr=03.0,display=video.2 \
  -device virtio-mouse-pci,bus=head.2,addr=04.0,display=video.2

Cleaning up...
> Unbinding device ' from the vfio-pci driver, then bind it back to its original driver...
driverctl: no such device: 0000:
> [Error] Seems like the installation failed...

rob-mur commented 3 years ago

Just to add to this, the first error here (kvmgt) was down to the kernel parameters not being persisted. My gut feeling, without digging into it, is that the script uses update-grub, which doesn't work on Fedora. There you need:

grub2-mkconfig -o "$(readlink -e /etc/grub2.conf)"

This still leaves the fatal vfio-mdev which I'll keep looking into.
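A rough sketch of how a script might pick the right command per distro. It assumes the ID field from /etc/os-release is used as the key; grub_regen_command is a hypothetical name, and only the two distro families discussed here are covered.

```shell
# grub_regen_command DISTRO_ID
# Prints the command that regenerates the GRUB config for a distro,
# keyed on the ID field from /etc/os-release.
grub_regen_command() {
  case "$1" in
    fedora)
      # The Fedora command quoted in this thread.
      printf '%s\n' 'grub2-mkconfig -o "$(readlink -e /etc/grub2.conf)"'
      ;;
    ubuntu|debian)
      printf '%s\n' 'update-grub'
      ;;
    *)
      printf 'unsupported distro: %s\n' "$1" >&2
      return 1
      ;;
  esac
}

# Usage (as root):
#   . /etc/os-release && eval "$(grub_regen_command "$ID")"
```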

rob-mur commented 3 years ago

After doing the above grub update with a reboot I then get a different set of compatibility information, see below:

[OK] The IOMMU kernel parameters are set.
[OK] VT-X / AMD-V virtualization is enabled in the UEFI.
[OK] VT-D / IOMMU is enabled in the UEFI.
[Success] GPU with ID '01:00.0' could be passed through to a virtual machine!
[Problem] Other devices have been found in the IOMMU group of the GPU with the ID '00:02.0'. Depending on the devices, this could make it impossible to pass this GPU through to a virtual machine!
The devices found in this GPU's IOMMU Group are:
IOMMU Group 10 00:1f.2 Memory controller [0580]: Intel Corporation 100 Series/C230 Series Chipset Family Power Management Controller [8086:a121] (rev 31)
IOMMU Group 12 03:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTL8411B PCI Express Card Reader [10ec:5287] (rev 01)
IOMMU Group 12 03:00.1 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 12)
IOMMU Group 4 00:14.2 Signal processing controller [1180]: Intel Corporation 100 Series/C230 Series Chipset Family Thermal Subsystem [8086:a131] (rev 31)
[Info] It might be possible to get it to work by putting the devices in different slots on the motherboard and/or by using the ACS override patch. Otherwise you'll probably have to get a different motherboard. If you're on a laptop, there is nothing you can do as far as I'm aware. Although it would theoretically be possible for ACS support for laptops to exist. TODO: Find a way to check if the current machine has support for that.
[Success] There are 1 GPU(s) in this system that could be passed through to a VM!

Is Compatible?  Name                       IOMMU_GROUP  PCI Address
No              HD Graphics 530            13
2               pci@0000:00:02.0
Yes             GM107M [GeForce GTX 960M]  1            pci@0000:01:00.0

[OK] You have GPUs that are not in the same IOMMU group. At least one of these could be passed through to a VM and at least one of the remaining ones could be used for the host system.
[Info] This system is probably MUX-less. (The connection between the GPU(s) and the [internal display]/[display outputs] is not multiplexed.)
If you found a notebook that appears to be GPU passthrough compatible, please open an issue on Github and let me know.
You may now proceed and run './mbpt.sh configure' if you haven't already.

I'm reading this as "only the dGPU can be passed to a VM, not the integrated one". So I changed the config to set iGPU passthrough to false, but unfortunately this doesn't improve the situation.
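To cross-check the script's grouping by hand, the IOMMU groups can be read straight from sysfs. A minimal sketch; list_iommu_groups is a hypothetical helper, and the optional ROOT argument exists only so it can be pointed at a test directory.

```shell
# list_iommu_groups [ROOT]
# Prints "group <n>: <pci address>" for every device, one per line.
# ROOT defaults to /sys/kernel/iommu_groups.
list_iommu_groups() {
  local root="${1:-/sys/kernel/iommu_groups}"
  local dev group
  for dev in "$root"/*/devices/*; do
    [ -e "$dev" ] || continue          # glob matched nothing
    group="${dev%/devices/*}"          # strip "/devices/<address>"
    group="${group##*/}"               # keep only the group number
    printf 'group %s: %s\n' "$group" "${dev##*/}"
  done
}

# Usage: list_iommu_groups | grep -e 00:02.0 -e 01:00.0
```

A GPU is usually only a clean passthrough candidate if nothing else that the host still needs shares its group.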

rob-mur commented 3 years ago

After some further digging, I think it's because the helper iso hasn't been generated. Looking at the documentation this should be with the iso argument but then looking at the actual mbpt.sh file I think you need to write helper-iso. Running with helper-iso generates the required file and clears this specific error.

rob-mur commented 3 years ago

A further test I've performed is removing the vfio-relevant commands from qemu and checking whether Windows would boot if given the chance, which it does (I can see it with spice).

This indicates that the issue is more that disabling the dGPU on the host bricks Fedora, rather than the GPU not actually being able to be passed through. Will investigate later.

rob-mur commented 3 years ago

After narrowing down the hang, it occurs on driverctl set-override. Running driverctl list-overrides returns:

driverctl: No overridable devices found. Kernel too old?

Unsure if this is just a Fedora 35 issue, but will keep investigating when I can.
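When debugging this kind of hang it can help to see which driver is bound to the dGPU before and after the override. A small sketch reading the sysfs driver symlink; pci_driver is a hypothetical helper and the SYSFS_ROOT argument is only there for testing.

```shell
# pci_driver ADDRESS [SYSFS_ROOT]
# Prints the name of the kernel driver bound to a PCI device,
# or "none" if nothing is bound. ADDRESS is the full form,
# e.g. 0000:01:00.0.
pci_driver() {
  local link="${2:-/sys}/bus/pci/devices/$1/driver"
  if [ -L "$link" ]; then
    basename "$(readlink "$link")"
  else
    echo none
  fi
}

# Usage: pci_driver 0000:01:00.0
```

On this laptop the expected answers would be nouveau or nvidia on the host, and vfio-pci once the override has taken effect.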

rob-mur commented 3 years ago

Got a little further by installing the proprietary Nvidia drivers but that just opens an extra can of worms with the system not booting, and bumblebee setup being dodgy (as you well mention in the guide!)

I think I'm done trying to get this to work on Fedora and chalk this up as a loss unless anyone has a good idea for a next troubleshooting step.

T-vK commented 3 years ago

Hey, thanks for the feedback. I agree that I really need to merge the windows-unattended-install branch into master. Considering that the master branch has so many issues for Ubuntu users, there isn't really a good reason not to do it anymore. intel_iommu=on should automatically be set by mbpt.sh during the setup (i.e. when running mbpt.sh setup). It does require a reboot though.

It's possible that some adjustments have to be made in order to support Fedora 35. I have only tested that branch on Fedora 34 so far. If that still doesn't work for you and you have a 64GB USB stick, it might be worth it to try the Live CD image which you can build by running ./mbpt.sh live build.

The helper ISO should be generated automatically whenever necessary. This happens during the setup (mbpt.sh setup) as well.

https://github.com/T-vK/MobilePassThrough/blob/4a1627a69b7e1c50aeda7a49823adf5376eca437/scripts/main/setup.sh#L164

It shouldn't be necessary to install a specific Nvidia driver. If you have either Nouveau or the proprietary one installed you should be fine. Bumblebee is completely untested in regards to the new branch, I advise against setting it up before getting it to work without it.

In regards to driverctl I'm not sure. Often times the only way to get a GPU (driver) to behave normally again is a reboot. There still are a lot of bugs in Linux or in the GPU drivers (this includes Intel, AMD, Nvidia and Nouveau) causing all sorts of weird behavior when binding or unbinding a driver.

rob-mur commented 3 years ago

Thanks for the response!

I have a feeling unbinding the driver just isn't happy on Fedora 35 yet. Under the nouveau driver I got a full hang, and under the proprietary driver I got a shell hang. Other people have experienced something similar (although not Fedora 35 specifically) https://forum.level1techs.com/t/problem-cant-use-driverctl-overrides-on-nvidia-driver/176777

I think what I will do is use the live environment and see if I can get that working. If so I think it would probably be a good idea for those to be the main focus of what's provided? That way it's always a clean install and all variables are controlled.

T-vK commented 3 years ago

The live version isn't really meant to be installed though. Its main purpose is to provide a fully automated fool-proof way of testing if GPU passthrough works on a given device or not. Theoretically it shouldn't make a difference if you use the live image or a fresh install of Fedora 34. The goal is to eventually support most distros or at least to make it easy to add support for new distros.

rob-mur commented 3 years ago

That makes sense. Assuming I get it working on Fedora 34 (both live and fresh), do you have a priority list in mind for which distros you'd like to support? I'm happy to just throw stuff at this laptop for testing.

T-vK commented 3 years ago

I think it would be Fedora 34, Fedora 35 and then Ubuntu 21. I can't give you a date on when all this will happen though.

rob-mur commented 3 years ago

Sounds good. Currently running into issues just building the live iso as there are python modules not installed that are required. Namely it seems to assume you have pip and then also the errors package. Potentially need to look into probing for this ahead of time/ensuring there's a venv setup for python, but in the first instance I'm just seeing if I can install the missing packages.

Full trace if you like

 [Skipped] Executable dependencies are already installed.
> Downlaoding and installing livecd-tools...
Cloning into 'livecd-tools'...
remote: Enumerating objects: 35, done.
remote: Counting objects: 100% (35/35), done.
remote: Compressing objects: 100% (33/33), done.
remote: Total 35 (delta 2), reused 17 (delta 1), pack-reused 0
Receiving objects: 100% (35/35), 170.84 KiB | 3.22 MiB/s, done.
Resolving deltas: 100% (2/2), done.
<string>:1: DeprecationWarning: The distutils package is deprecated and slated for removal in Python 3.12. Use setuptools or check PEP 632 for potential alternatives
<string>:1: DeprecationWarning: The distutils.sysconfig module is deprecated, use sysconfig instead
pod2man --section=8 --release="livecd-tools 28.3" --center "LiveCD Tools" docs/livecd-creator.pod > docs/livecd-creator.8
pod2man --section=8 --release="livecd-tools 28.3" --center "LiveCD Tools" docs/livecd-iso-to-disk.pod > docs/livecd-iso-to-disk.8
/usr/bin/install -c -D tools/livecd-creator /usr/bin/livecd-creator
ln -sf livecd-creator /usr/bin/image-creator
/usr/bin/install -c -D tools/liveimage-mount /usr/bin/liveimage-mount
/usr/bin/install -c -D tools/livecd-iso-to-disk.sh /usr/bin/livecd-iso-to-disk
/usr/bin/install -c -D tools/livecd-iso-to-pxeboot.sh /usr/bin/livecd-iso-to-pxeboot
/usr/bin/install -c -D tools/editliveos /usr/bin/editliveos
/usr/bin/install -c -D tools/mkbiarch /usr/bin/mkbiarch
/usr/bin/install -c -m 644 -D AUTHORS /usr/share/doc/livecd-tools/AUTHORS
/usr/bin/install -c -m 644 -D COPYING /usr/share/doc/livecd-tools/COPYING
/usr/bin/install -c -m 644 -D README /usr/share/doc/livecd-tools/README
/usr/bin/install -c -m 644 -D HACKING /usr/share/doc/livecd-tools/HACKING
mkdir -p /usr/share/livecd-tools/
mkdir -p //usr/lib/python3.10/site-packages/imgcreate
/usr/bin/install -c -m 644 -D imgcreate/*.py //usr/lib/python3.10/site-packages/imgcreate/
/usr/bin/python -c "import compileall as c; c.compile_dir('//usr/lib/python3.10/site-packages/imgcreate', force=1)"
Listing '//usr/lib/python3.10/site-packages/imgcreate'...
Compiling '//usr/lib/python3.10/site-packages/imgcreate/__init__.py'...
Compiling '//usr/lib/python3.10/site-packages/imgcreate/creator.py'...
Compiling '//usr/lib/python3.10/site-packages/imgcreate/debug.py'...
Compiling '//usr/lib/python3.10/site-packages/imgcreate/dnfinst.py'...
Compiling '//usr/lib/python3.10/site-packages/imgcreate/errors.py'...
Compiling '//usr/lib/python3.10/site-packages/imgcreate/fs.py'...
Compiling '//usr/lib/python3.10/site-packages/imgcreate/kickstart.py'...
Compiling '//usr/lib/python3.10/site-packages/imgcreate/live.py'...
Compiling '//usr/lib/python3.10/site-packages/imgcreate/util.py'...
/usr/bin/python -O -c "import compileall as c; c.compile_dir('//usr/lib/python3.10/site-packages/imgcreate', force=1)"
Listing '//usr/lib/python3.10/site-packages/imgcreate'...
Compiling '//usr/lib/python3.10/site-packages/imgcreate/__init__.py'...
Compiling '//usr/lib/python3.10/site-packages/imgcreate/creator.py'...
Compiling '//usr/lib/python3.10/site-packages/imgcreate/debug.py'...
Compiling '//usr/lib/python3.10/site-packages/imgcreate/dnfinst.py'...
Compiling '//usr/lib/python3.10/site-packages/imgcreate/errors.py'...
Compiling '//usr/lib/python3.10/site-packages/imgcreate/fs.py'...
Compiling '//usr/lib/python3.10/site-packages/imgcreate/kickstart.py'...
Compiling '//usr/lib/python3.10/site-packages/imgcreate/live.py'...
Compiling '//usr/lib/python3.10/site-packages/imgcreate/util.py'...
mkdir -p /usr/share/man/man8
/usr/bin/install -c -m 644 -D docs/*.8 /usr/share/man/man8
/usr/bin/sed -i "s:#!/usr/bin/python:#!/usr/bin/python:g" /usr/bin/livecd-creator
/usr/bin/sed -i "s:#!/usr/bin/python:#!/usr/bin/python:g" /usr/bin/liveimage-mount
/usr/bin/sed -i "s:#!/usr/bin/python:#!/usr/bin/python:g" /usr/bin/editliveos
/usr/bin/sed -i "s:#!/usr/bin/python:#!/usr/bin/python:g" /usr/bin/mkbiarch
sudo: pip3: command not found
> Downloading Fedora ISO...
--2021-11-24 14:48:41--  https://download.fedoraproject.org/pub/fedora/linux/releases/34/Workstation/x86_64/iso/Fedora-Workstation-Live-x86_64-34-1.2.iso
Resolving download.fedoraproject.org (download.fedoraproject.org)... 2a05:d01c:c6a:cc01:269:da52:9ae1:43e6, 2a05:d014:10:7803:f774:4d7c:e277:a457, 2001:4178:2:1269::fed2, ...
Connecting to download.fedoraproject.org (download.fedoraproject.org)|2a05:d01c:c6a:cc01:269:da52:9ae1:43e6|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://www.mirrorservice.org/sites/dl.fedoraproject.org/pub/fedora/linux/releases/34/Workstation/x86_64/iso/Fedora-Workstation-Live-x86_64-34-1.2.iso [following]
--2021-11-24 14:48:41--  https://www.mirrorservice.org/sites/dl.fedoraproject.org/pub/fedora/linux/releases/34/Workstation/x86_64/iso/Fedora-Workstation-Live-x86_64-34-1.2.iso
Resolving www.mirrorservice.org (www.mirrorservice.org)... 2001:630:341:12::184, 212.219.56.184
Connecting to www.mirrorservice.org (www.mirrorservice.org)|2001:630:341:12::184|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2007367680 (1.9G) [application/x-iso9660-image]
Saving to: ‘/home/rmurphy/MobilePassThrough/live-iso-files/Fedora-Workstation-Live-x86_64-34-1.2.iso.part’

/home/rmurphy/MobilePassThrough/live-iso-files/ 100%[=======================================================================================================>]   1.87G  8.14MB/s    in 4m 23s  

2021-11-24 14:53:04 (7.29 MB/s) - ‘/home/rmurphy/MobilePassThrough/live-iso-files/Fedora-Workstation-Live-x86_64-34-1.2.iso.part’ saved [2007367680/2007367680]

> Rebuilding the ISO adding kernel parameters and some files...
Traceback (most recent call last):
  File "/usr/bin/editliveos", line 74, in <module>
    from imgcreate.fs import *
  File "/usr/lib/python3.10/site-packages/imgcreate/__init__.py", line 19, in <module>
    from imgcreate.live import *
  File "/usr/lib/python3.10/site-packages/imgcreate/live.py", line 35, in <module>
    from imgcreate.fs import *
  File "/usr/lib/python3.10/site-packages/imgcreate/fs.py", line 35, in <module>
    from imgcreate.util import *
  File "/usr/lib/python3.10/site-packages/imgcreate/util.py", line 23, in <module>
    from errors import *
ModuleNotFoundError: No module named 'errors'
mv: cannot stat '/home/rmurphy/MobilePassThrough/live-iso-files/mbpt-*.iso': No such file or directory

(btw I'm doing this on a fresh install of 35 as I had the live usb ready to go, so my previous work isn't causing these errors)

rob-mur commented 3 years ago

If I install python-imgcreate https://yum-info.contradodigital.com/view-package/epel/python-imgcreate/ then the error simplifies a bit, but still unclear what to do.

As a next troubleshooting step I'm going to fresh install Fedora 34 to isolate 35 as a source of error. Final trace below:

> [Skipped] Executable dependencies are already installed.
> [Skipped] livecd-tools already installed.
> [Skipped] Fedora ISO already downloaded.
> Rebuilding the ISO adding kernel parameters and some files...

Source image at '/home/rmurphy/MobilePassThrough/live-iso-files/Fedora-Workstation-Live-x86_64-34-1.2.iso'

LiveOS edit has ended.
Process duration: 00:00:02
Traceback (most recent call last):
  File "/usr/bin/editliveos", line 2729, in <module>
    sys.exit(main())
  File "/usr/bin/editliveos", line 2601, in main
    editor._pre_mount(args.liveos, args.rootfsimg, args.overlay)
  File "/usr/bin/editliveos", line 745, in _pre_mount
    self._LoopImageCreator__fstype = losm.imgloop.fstype
AttributeError: 'LoopbackDisk' object has no attribute 'fstype'
mv: cannot stat '/home/rmurphy/MobilePassThrough/live-iso-files/mbpt-*.iso': No such file or directory

T-vK commented 3 years ago

Seems like pip3 is missing in the requirements.sh.

https://github.com/T-vK/MobilePassThrough/blob/1527df233238bb4fba229baef837f3d53fcf9164/requirements.sh#L64

I'm not exactly sure about that AttributeError. Seems like an issue with editliveos. Not sure why you're getting it. It might be worth trying to delete the livecd-tools directory from the thirdparty directory, so that the generate-live-iso.sh will try to reinstall it. Maybe the installation wasn't completed because of the missing pip3 and now it thinks the installation is complete just because it sees the livecd-tools directory.

rob-mur commented 3 years ago

So I'm now on a fresh Fedora 34 install, which runs into the same issue.

I then added pip3 as you instructed and deleted the livecd-tools directory. After re-running this does clear up the pip3 bug, but the module not found still remains. I haven't explicitly installed python-imgcreate this time but I imagine it would just do the same thing. Is that the correct dependency to use? If so I'll try it/dig again later on.

T-vK commented 3 years ago

pip3 should do the trick. I don't know what python-imgcreate is. I think it's possible that the latest master of the livecd-tools is buggy. (I think my script just does a git clone on it.) That could explain why it used to work and now it doesn't anymore. So switching to an older commit for livecd-tools might do the trick.

rob-mur commented 3 years ago

python-imgcreate is just something I found when trying random things that had the same command, we can ignore it.

I think you are right that the issue is with livecd-tools as if I simply run editliveos I get the same module error. I'll try with an older version as you say.

rob-mur commented 3 years ago

It was indeed that: the latest tagged release, livecd-tools-28.1, is now building the image without problems so far. I think we'd want to update the script so that LATEST_RELEASE=$(git describe --tags "$(git rev-list --tags --max-count=1)") is used to get the latest release, followed by git clone --depth=1 -b "$LATEST_RELEASE" https://github.com/livecd-tools/livecd-tools.git

I'll let you know how the live os does!

T-vK commented 3 years ago

I'd say it would probably be best to just hardcode the current release into the -b flag on the clone because who knows if their next release will actually be bug-free. I had many issues with older releases before. Glad to hear you managed to build it now.
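A sketch of what the hardcoded variant might look like. The tag and URL are the ones mentioned in this thread; the variable names, the clone_livecd_tools function and the DRY_RUN switch are assumptions for illustration, not what generate-live-iso.sh currently does.

```shell
# Tag and URL as mentioned in this thread; bump the tag deliberately
# after testing a newer release.
LIVECD_TOOLS_TAG="livecd-tools-28.1"
LIVECD_TOOLS_URL="https://github.com/livecd-tools/livecd-tools.git"

# clone_livecd_tools DEST
# Shallow-clones the pinned tag into DEST. With DRY_RUN=1 the git
# command is printed instead of executed.
clone_livecd_tools() {
  set -- git clone --depth=1 -b "$LIVECD_TOOLS_TAG" "$LIVECD_TOOLS_URL" "$1"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: clone_livecd_tools thirdparty/livecd-tools
```

Pinning trades automatic updates for reproducibility: a broken upstream master or a buggy new release can no longer break the live ISO build.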

rob-mur commented 3 years ago

Yeah that makes sense, doesn't really need to be updated.

Unfortunately, the live environment in f34 runs into similar problems as f35 did. When I next get time I'll look at doing the same fixes I did to get the f35 version almost working but in a full f34 install. Hopefully I can actually unbind the dGPU in f34.

rob-mur commented 3 years ago

I got it working on f34! Well, not really. I got the Windows install to actually fire with the GPU being passed through; however, it did run into error 43. My understanding from recent news was that Nvidia was going to lift this specific restriction though, so assuming that gets removed it should be good to go. Seamless via RDP and Looking Glass are things I'd like to look into, but I will skip them for now due to the 43.

To recap, the issues with the script (ignoring live os) are:
- The command to re-run auto after a restart is faulty (parameters in the wrong order).
- You need to explicitly save changes to grub after writing kernel parameters.
- I then needed to install virt-manager so there's a hypervisor.

All of the above sound super fixable and could be included in the script.

T-vK commented 3 years ago

Thanks for summarizing. I'll try to add those changes when I have some time.

Regarding error 43, the following options may help:

SHARE_IGPU="auto"
USE_FAKE_BATTERY="true"

And if you can get your hands on the DGPU ROM, the following may help as well:

DGPU_ROM="${VM_FILES_DIR}/vbios-roms/vbios.rom"
PATCH_OVMF_WITH_VROM="true"

Also, the Nvidia driver within the VM needs to be somewhat up to date. And if it's still not working maybe it would be worth a try to install the driver provided by Windows Update in case it does provide one.

rob-mur commented 3 years ago

Thanks!

I was using the default options which include those two so unfortunately they have already been tried. I didn't manage to get absolutely everything updated though before running out of space on the default 40G vdrive. Maybe if absolutely everything was up to date it would work so I'm trying again with 100G.

If not I will investigate trying to find the ROM as you suggest.

rob-mur commented 3 years ago

A further update. I actually think the hang I got when I unbound the dGPU on f35 was caused by X11, as I can recreate this now on f35. On Wayland there is no such issue, which is odd as usually Wayland is the difficult one!

Being fully updated ensures that the Intel iGPU gets passed through fully, but I still get a 43 on the dGPU. I will take a look into getting the ROM or any other 43 workarounds.

rob-mur commented 3 years ago

Had a go with a rom I found here tavk/ovmf-vbios-patch:edk2-stable201905

Couple things, for this section of the script to work the user must have done the initial docker setup, and pulled and built your patch repo as documented on docker hub.

Secondly, the path to the ROM cannot be in the root of the project directory or docker create will throw an error. Not much of a problem, but it should be documented that the ROMs need to go in another folder.

However upon running this the patcher fails to compile. I've attached the dump if it helps at all. I might keep investigating the news around 43 as I'm surprised to still see it to be honest.

debug.txt

T-vK commented 3 years ago

Is there a rom hardcoded into the Docker image or what do you mean?

In order to patch the OVMF with the rom, you just need to set PATCH_OVMF_WITH_VROM="true". The image should have been pulled during the setup stage using this script: https://github.com/T-vK/MobilePassThrough/blob/1527df233238bb4fba229baef837f3d53fcf9164/scripts/utils/common/setup/ovmf-vbios-patch-setup

It is then being used in the vm.sh script: https://github.com/T-vK/MobilePassThrough/blob/1527df233238bb4fba229baef837f3d53fcf9164/scripts/main/vm.sh#L655

The easiest way to get your hands on the vbios rom is to search for it here: https://www.techpowerup.com/vgabios/ You might also be able to acquire it by booting into Windows and dumping it using GPU-z. The Linux tools to dump the vbios roms have many issues from my experience. I've never had any success with them on notebooks.

Since I haven't been able to build the Docker image from source using the Live version (which I usually use to verify that the project in its current state works) due to complicated disk space issues, I haven't actually tested that in a while.

In your dump it says /edk2/OvmfPkg/AcpiPlatformDxe/QemuFwCfgAcpi.c:21:18: fatal error: vrom.h: No such file or directory. I think I've had that error before, but I can't remember what caused it. The source for the patch is here: https://github.com/T-vK/ovmf-with-vbios-patch/blob/master/files/QemuFwCfgAcpi.c.patch

Maybe this could be caused by a faulty vbios rom. I'm not sure to be honest.

rob-mur commented 3 years ago

So I think the reason the vbios setup docker pull didn't work is that the script assumes a) docker is installed, b) docker can be run without sudo and c) the user has logged into docker in advance so they can pull it from your hub. As I was on a fresh install this wasn't the case. Not really an issue, but perhaps something that should be flagged a bit louder, as I hadn't noticed it had even tried.

I got the rom from the same source (techpowerup) but it's untested because they don't have an official one for the 960M. The author of the unofficial one does reference another notebook though so I think the next move is for me to install windows and then use gpu-z, hoping that as you say it's just a faulty rom rather than something wrong in the code.

rob-mur commented 3 years ago

So unfortunately neither gpu-z nor nvflash work for this laptop, the GPU bios is too integrated into the system bios.

There may be some way to do it with DOS apparently but it doesn't seem like a great use of time.

I'll keep looking around on forums and such but otherwise I think he's dead Jim.

T-vK commented 3 years ago

Okay one more thing you can try is downloading a bios update for your laptop and then use the vbiosfinder (which should be installed into the thirdparty directory) to extract the roms from it. I just remembered that this was the only way I managed to get my hands on the vbios roms for a Dell laptop. At least after adding some code to decompress Dell's proprietary Bios update format. https://github.com/T-vK/DellBiosUnpackerPOC

Docker should have been installed during the setup https://github.com/T-vK/MobilePassThrough/blob/1527df233238bb4fba229baef837f3d53fcf9164/requirements.sh#L34

Also, the scripts run docker with sudo, so that shouldn't be the issue.

And last time I checked, you only had to log into docker hub if you wanted to push an image or if you reached their rate limit. I do however remember having a weird issue where a faulty docker installation forced me to log in in order to pull some images.

rob-mur commented 3 years ago

Yeah odd with docker then, not sure what caused the login to be required.

So I've had an initial look into this for this laptop, and Asus also uses a proprietary method for updating the bios.

(See here)

They don't provide an exe to update on their website, but a binary file which isn't the full bios. You then use their tool to complete the file when you want to flash.

For a slightly different Asus situation I saw a post about trying to use a hex editor to hack it together; however, I think I'd need to buy a physical tool to rip the current bios.

There is however a promising guide here which reads to me like the exact thing I want, I'll report back if it actually works.

rob-mur commented 3 years ago

With some skulduggery I now have a folder of about 1000 ROM files which have been extracted from my bios, one of which should hopefully be the file I need. Unfortunately I have no better way than to try them all so I'm going to try and write something that runs the patcher in a loop and just tries them.
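One way to narrow a folder like this down before running the patcher is to look at each file's header. The sketch below is not part of MobilePassThrough or vbiosfinder; it only assumes that a usable vbios starts with a standard PCI expansion ROM header (signature bytes 0x55 0xAA, with a pointer at offset 0x18 to a "PCIR" structure holding the vendor and device IDs). The function names and the folder layout are hypothetical.

```python
import struct
from pathlib import Path

NVIDIA_VENDOR_ID = 0x10DE  # PCI vendor ID assigned to Nvidia
INTEL_VENDOR_ID = 0x8086   # PCI vendor ID assigned to Intel


def read_pci_rom_ids(data):
    """Return (vendor_id, device_id) if data looks like a PCI option ROM, else None."""
    # A PCI expansion ROM image starts with the signature bytes 0x55 0xAA.
    if len(data) < 0x1A or data[0:2] != b"\x55\xaa":
        return None
    # Offset 0x18 holds a little-endian pointer to the PCI data structure.
    (pcir_offset,) = struct.unpack_from("<H", data, 0x18)
    if pcir_offset + 8 > len(data) or data[pcir_offset:pcir_offset + 4] != b"PCIR":
        return None
    # The "PCIR" signature is followed by the vendor ID and the device ID.
    vendor_id, device_id = struct.unpack_from("<HH", data, pcir_offset + 4)
    return vendor_id, device_id


def find_candidate_roms(rom_dir, vendor_id=NVIDIA_VENDOR_ID):
    """List (path, device_id) for every file in rom_dir whose header names vendor_id."""
    candidates = []
    for path in sorted(Path(rom_dir).iterdir()):
        if not path.is_file():
            continue
        ids = read_pci_rom_ids(path.read_bytes())
        if ids is not None and ids[0] == vendor_id:
            candidates.append((path, ids[1]))
    return candidates
```

Calling `find_candidate_roms("path/to/extracted/roms")` returns only the files that claim to be Nvidia option ROMs. The device ID of each hit can then be compared with the one `lspci -nn` reports for the dGPU on the host, which should leave far fewer files to try by hand.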

T-vK commented 3 years ago

There are two more ways I can think of to get your hands on the actual bios file that gets written to the SPI flash chip. Extracting the vbios roms from that file should work fine with vbiosfinder then.

The first and simplest method is to find one of the many ebay sellers that sell bios chips and kindly ask if they can sell you the raw bios file instead of flashing it on a chip and sending you that chip. One seller I can recommend for this is https://www.ebay.co.uk/usr/chips-of-tomorrow but I'm sure there are many more.

The second option is buying an SPI programmer with a clamp adapter. You can then open your notebook and attach the clamp to the SPI flash and then use the programmer connected to a second computer to dump the bios from the chip directly. I've done this before using this one: https://www.aliexpress.com/item/4001045543107.html

I've also used it in the past to flash a modified (and thus unsigned) bios image.

You can of course also combine the two methods. (Buy the chip from ebay and then dump the file using the programmer.) Or if you have a reflow station, some flux and solder (and ideally some experience) you can simply replace the SPI chip on that motherboard with the one bought from ebay.

Disclaimer: I can't guarantee that the programmer/clamp would actually fit the chip. And it might be possible for those chips to have a fuse that the manufacturer can blow to make them write-only.

One more thing: Are you booting in UEFI mode or legacy mode? I'm just asking because I've had many problems with GPU passthrough VMs when the host was booted in legacy mode.
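For anyone unsure how to check this on a Linux host: the kernel only exposes the /sys/firmware/efi directory when the system was booted through UEFI. A minimal sketch of that check follows; the function name is made up for illustration, and the path is a parameter only so the function can be tried against other directories.

```python
from pathlib import Path


def host_boot_mode(efi_dir="/sys/firmware/efi"):
    """Return "UEFI" if the Linux host exposes EFI firmware data, else "legacy"."""
    # The directory is only present when the kernel was started by UEFI firmware.
    return "UEFI" if Path(efi_dir).is_dir() else "legacy"


if __name__ == "__main__":
    print(host_boot_mode())
```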

rob-mur commented 3 years ago

Thanks for such an in-depth response! I've got the patcher churning through the roms produced by PhoenixTool for now so fingers crossed one of those will actually work.

I think I'd happily try the first two of those approaches but the third is outside of my expertise unfortunately. Time will tell how extreme I need to go! If my last ditch software solution doesn't work I'll give that ebay seller a message.

Out of curiosity, I think you said in another thread that you'd got this working with an xps-15. If so, would you mind linking me to the exact model? I've also got one which is what I'd ideally use longer term, the Asus was more of a test device for this methodology.

Oh and I'm in UEFI mode unfortunately!

T-vK commented 3 years ago

That XPS 15 was a Dell XPS 15 9575 2-in-1 notebook/tablet with one of these very rare Intel CPUs that have a built in AMD GPU. Intel i7-8705G with Intel HD Graphics 630 + Radeon RX Vega M GL (4GB)

At the time of writing I can not recommend that model because even though GPU passthrough works, you will find no information on how to work around error 43 with those weird Intel CPUs online. I think it wasn't even known that non-Nvidia GPUs could run into that error as well until I tried it with that device.

rob-mur commented 2 years ago

Interesting! The one I have has an Nvidia GPU so might be smoother, who knows.

I have both good and bad news. The good news is I actually managed to get vbiosfinder to spit out two roms (presumably one for iGPU, one for dGPU).

The bad news is that neither of them actually worked, still resulting in error 43. I'm not sure what's going on in this case because they look like valid files as the patcher actually runs successfully.

I have one further avenue before giving up though. I generated that vbios from the bios on Asus' website and it's actually a newer version than what's currently running on the host. Potentially upgrading the bios on the host could help? It does seem unlikely though.

T-vK commented 2 years ago

You are correct in that the two files are for the iGPU and for the dGPU. Vbiosfinder names them vendorid:modelid (if I recall correctly). Googling the vendor ids will tell you which one is Nvidia and which one is Intel. You should try to use both of them at the same time. I don't know much about how this vbios stuff works, but if you boot the host in legacy rather than UEFI mode, the VM will need different vbioses, so it seems like a reasonable assumption that the host uefi/bios might somehow need to match what you're handing over to the VM.

Updating the UEFI often fixes a lot of low level hardware issues as well. So it would generally seem like a good idea to update it.

If you're still getting error 43 then, I would try some older versions of the Nvidia driver and also google if anyone else had success with that GPU model.

rob-mur commented 2 years ago

Some updates of what I've tried.

Using both ROMs at the same time unfortunately didn't see any improvement.

Flashing to the updated bios so the host and guest are in line made a change, but didn't resolve the 43.

To unpack that a bit, with the new bios I get a KVM crash if I patch the vbios on a fresh install. However this crash goes away if I install windows without patching and then on a subsequent start include the patch. Still doesn't fix the 43 though.

The error is similar to this forum post, but I've not seen a solution as yet.