QubesOS / qubes-issues

The Qubes OS Project issue tracker
https://www.qubes-os.org/doc/issue-tracking/

(Re-)enable IOMMU for Intel GPU #2841

Open marmarek opened 7 years ago

marmarek commented 7 years ago

Drop iommu=no-igfx option. See https://github.com/QubesOS/qubes-issues/issues/2836#issuecomment-305780009

rustybird commented 7 years ago

Weird: Booting the 20170718 R4.0 prerelease on a T420 (latest proprietary BIOS) with iommu=no-igfx results in a Xen panic - I see "BIOS did not enable IGD for VT properly, crash Xen for security purpose" after adding console=vga. But iommu=force appears to work fine!

marmarek commented 7 years ago

On Wed, Jul 26, 2017 at 01:28:26PM -0700, Rusty Bird wrote:

Weird: Booting the 20170718 R4.0 prerelease on a T420 (latest proprietary BIOS) with iommu=no-igfx results in a Xen panic - I see "BIOS did not enable IGD for VT properly, crash Xen for security purpose" after adding console=vga. But iommu=force appears to work fine!

That's indeed interesting behaviour. Glad it works with IOMMU for you!

--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

rustybird commented 7 years ago

Seems to be this:

I'll submit a patch to xen-devel.

mex20 commented 6 years ago

This issue also occurs on Microsoft UEFI devices. I had a heck of a time figuring out why 4.0 RC1 would get stuck in a boot loop after displaying the loading Xen 4.8.1 text on the Surface boot screen. Changing “iommu=no-igfx” to “iommu=on” fixed the boot loop.

josephcsible commented 6 years ago

With Qubes OS 4.0RC4 in UEFI mode on Intel HD Graphics 4000, I needed to add iommu=no-igfx to avoid getting just a black screen. This wasn't a problem until I upgraded from RC3.

RooneyMcNibNug commented 6 years ago

I'm possibly having the same issue since 4.0 on a T420 with regular graphics. Installation went fine (tested with multiple installation media, including two different USBs and a CD), but bootup shows "squashed" text and then gets stuck at a blank screen where I should be seeing the LVM password field.

The "squashed" bootup I'm seeing is almost exactly the same as the one shown in this post: https://www.reddit.com/r/Qubes/comments/7wheqk/does_anyone_else_get_this_weird_screen_glitch/

tasket commented 6 years ago

I had to remove iommu=no-igfx from Xen options after configuring anti-evil-maid. Otherwise the AEM bootup resulted in unusable garbage screen and i915 "GPU hang" errors. This is on an Ivy Bridge laptop with integrated graphics.

Geblaat commented 5 years ago

I also had to remove iommu=no-igfx to get AEM working. However, it still booted fine otherwise; I only got an error from AEM (sorry, I forgot the exact error). The strange thing is that when I add iommu=no-igfx again, after booting from GRUB, as soon as the GUI should appear, I get a striped garbage screen. Changing to text mode shows an error about a crashed GPU. I'm able to continue booting and AEM also works, but the garbage screen reappears when the GUI comes back. This is on 4.0.1-RC2 (haven't tried 4.0) with a Sandy Bridge laptop with integrated graphics.

DemiMarie commented 5 years ago

I removed this option from xen.cfg and have no problems. Kernel version is 5.0.7.
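
For reference, dropping the option amounts to removing one token from the Xen command line. A minimal Python sketch, assuming the options form a whitespace-separated string; the `drop_xen_option` name and the sample option string are hypothetical and not taken from any Qubes tool:

```python
def drop_xen_option(cmdline: str, option: str = "iommu=no-igfx") -> str:
    """Return cmdline with every exact occurrence of `option` removed."""
    # Split on whitespace, keep tokens that do not match exactly, rejoin.
    return " ".join(tok for tok in cmdline.split() if tok != option)


if __name__ == "__main__":
    # Hypothetical sample; the real xen.cfg or GRUB line will differ.
    sample = "console=vga iommu=no-igfx other=value"
    print(drop_xen_option(sample))
```

Only exact tokens are removed, so a combined value such as a comma-separated iommu= list would be left untouched and need manual editing.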

donob4n commented 5 years ago

Hi, I'm testing with kernel 5.0.7 and I get tons of:

[  240.854440] [drm:gen8_de_irq_handler [i915]] *ERROR* Fault errors on pipe A: 0x00000080
[  240.871072] [drm:gen8_de_irq_handler [i915]] *ERROR* Fault errors on pipe A: 0x00000080
[  240.871098] [drm:gen8_de_irq_handler [i915]] *ERROR* Fault errors on pipe A: 0x00000080
[  240.887879] [drm:gen8_de_irq_handler [i915]] *ERROR* Fault errors on pipe A: 0x00000080
[  240.887911] [drm:gen8_de_irq_handler [i915]] *ERROR* Fault errors on pipe A: 0x00000080

It boots fine but freezes after a while.
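
To compare how often these errors show up across kernels or Xen options, the lines can be tallied from saved kernel log text. A minimal sketch, assuming the message format shown above; the `count_pipe_faults` name is hypothetical:

```python
import re
from collections import Counter

# Matches the i915 message quoted above, e.g.
# "... *ERROR* Fault errors on pipe A: 0x00000080"
FAULT_RE = re.compile(r"\*ERROR\* Fault errors on pipe (\w+): (0x[0-9a-fA-F]+)")


def count_pipe_faults(log_text: str) -> Counter:
    """Count (pipe, fault mask) pairs found in kernel log text."""
    return Counter(m.groups() for m in FAULT_RE.finditer(log_text))


if __name__ == "__main__":
    line = ("[  240.854440] [drm:gen8_de_irq_handler [i915]] "
            "*ERROR* Fault errors on pipe A: 0x00000080\n")
    for (pipe, mask), n in count_pipe_faults(line * 5).items():
        print(f"pipe {pipe} mask {mask}: {n} occurrences")
```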

icequbes1 commented 3 years ago

I'm getting a gradual slowdown of the whole machine after reboots on a T460, stock BIOS, with Intel i915, booting UEFI via grub2. I have not seen a GPU hang error yet, but perhaps I haven't waited long enough.

Qubes 4.1 dom0 fully up to date with current-testing, linux kernel 5.4.67-1 (also seen on 5.4.64, 5.4.61), Xen 4.14.0-4.

Did not appear to have an issue with gradual slowdown on the QA image from 20200914, which used Xen 4.13 and kernel 5.4.61. I will try to verify (if only I could easily downgrade from Xen 4.14 in current-testing to Xen 4.13...dnf/qubes-dom0-update laughs at me).

With no iommu Xen option, there are a lot of journal log entries like this (3-5 per second):

kernel: [drm:gen8_de_irq_handler.isra.0 [i915]] *ERROR* Fault errors on pipe A: 0x00000080

and these errors in hypervisor dmesg:

[VT-D]DMAR:[DMA Read] Request device [0000:00:02.0] fault addr 29375f2000, iommu reg = ffff82c0009f2000
[VT-D]DMAR: reason 06 - PTE Read access is not set
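
To see which PCI device the faults point at, the hypervisor lines can be parsed. A minimal sketch, assuming the two-line format shown above; the `parse_dmar_faults` name and the returned dict keys are hypothetical:

```python
import re

# Matches the first line of each fault report quoted above.
DMAR_RE = re.compile(
    r"\[VT-D\]DMAR:\[(?P<kind>[^\]]+)\] "
    r"Request device \[(?P<bdf>[0-9a-fA-F:.]+)\] "
    r"fault addr (?P<addr>[0-9a-fA-F]+)"
)


def parse_dmar_faults(text: str) -> list:
    """Extract request kind, PCI address and fault address from Xen log text."""
    return [
        {"kind": m["kind"], "device": m["bdf"], "addr": int(m["addr"], 16)}
        for m in DMAR_RE.finditer(text)
    ]


if __name__ == "__main__":
    sample = (
        "[VT-D]DMAR:[DMA Read] Request device [0000:00:02.0] "
        "fault addr 29375f2000, iommu reg = ffff82c0009f2000\n"
        "[VT-D]DMAR: reason 06 - PTE Read access is not set\n"
    )
    for fault in parse_dmar_faults(sample):
        print(fault["device"], fault["kind"], hex(fault["addr"]))
```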

With iommu=no-igfx, no journal logs like the above are shown, but the system continues to gradually slow down, to the point where the 'Q' Applications menu takes almost 4-5 seconds to render after clicking on it, and bootup of qubes takes at least a minute.

When rebooting, the GRUB graphical display is noticeably less responsive when choosing the boot option. After turning power fully off and powering on again, the GRUB graphical display regains responsiveness.

marmarek commented 3 years ago

Seems to be a very old, never-fixed i915 driver issue: https://gitlab.freedesktop.org/drm/intel/-/issues/22

icequbes1 commented 3 years ago

After further testing:

  1. My lag/slowdown issues do not appear to be (directly?) related to this issue. I recognized that performance was only poor when not charging the T460. If I disable Intel SpeedStep in the BIOS, there is no extremely laggy performance. This means starting sys-net takes 15 seconds compared to about 60.

  2. With SpeedStep disabled, I still encounter the fault error logs spewed by the i915 driver. I also continue to receive VT-D DMAR errors for the video PCI device in hypervisor log. Enabling iommu=no-igfx gets rid of those errors, and as of yet, there doesn't appear to be any impact on the system overall...though I guess this means no GUI domain for me in R4.1.

DemiMarie commented 3 years ago

I have turned off iommu=no-igfx and have had no problems.

icequbes1 commented 3 years ago

Same hardware @DemiMarie?

Just checked again by temporarily dropping iommu=no-igfx, I'm still getting:

These spam logs stop if I switch to a separate console tty.

R4.1, xen-4.14.0-7, kernel 5.8.16-1

As the slowdown I previously observed was due to a separate issue, I'm unsure if I have a problem aside from the log spam and systemd-journal sitting stably at 5% CPU utilization in dom0. I guess I can give sys-gui a try.

DemiMarie commented 3 years ago

@icequbes1 no, Lenovo ThinkPad P51

DemiMarie commented 3 years ago

Is there anyone here whose system is broken without iommu=no-igfx?

Given how hardware-specific this bug is, I wonder if the problem is actually in the firmware. The Intel GPUs should not be that different.

donob4n commented 3 years ago

Is there anyone here whose system is broken without iommu=no-igfx?

I removed it some weeks ago and no problems so far. (Running i5-8265U)

DemiMarie commented 3 years ago

I wonder if this is the cause of the hangs I keep having.

hbswn commented 3 years ago

The Xen 4.8 Hypervisor Command Line Options document already states:

If adding no-igfx fixes anything, you should file a bug reporting the problem.

Xen 4.8 is the version used by the stable Qubes 4.0.4.

I've removed it and it did not make any difference on my Intel NUC10i3FNK.

But I need a patch on 4.0.4 to prevent [Xen causes kernel panic during install - USB boot #5374](https://github.com/QubesOS/qubes-issues/issues/5374#issuecomment-813376040), so I'm not sure how relevant this observation is.

hbswn commented 3 years ago

Another Linux kernel option, at least for Intel hardware, is intel_iommu=on.

On my NUC10i3FNK it adds one log line to my dmesg output:

DMAR: IOMMU enabled

The Linux kernel documentation on IOMMU says:

DMAR = DMA remapping
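
To check for that log line without scanning dmesg by eye, here is a minimal sketch that works on dmesg output already captured as text; the `iommu_enabled_in_dmesg` name and the sample timestamp are hypothetical:

```python
def iommu_enabled_in_dmesg(dmesg_text: str) -> bool:
    """Return True if any line ends with the 'DMAR: IOMMU enabled' message."""
    return any(
        line.rstrip().endswith("DMAR: IOMMU enabled")
        for line in dmesg_text.splitlines()
    )


if __name__ == "__main__":
    # Hypothetical captured output; the timestamp is made up for illustration.
    captured = "[    0.000000] DMAR: IOMMU enabled\n"
    print(iommu_enabled_in_dmesg(captured))
```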