QubesOS / qubes-issues

The Qubes OS Project issue tracker
https://www.qubes-os.org/doc/issue-tracking/

Random Crashes Associated with i3wm and Dragging Floating Windows #7902

Open BawdyAnarchist opened 1 year ago

BawdyAnarchist commented 1 year ago

Qubes OS release

Qubes 4.1.1
Linux 5.15.76-1.fc32 i3wm

Purism Librem 14 Model: L14v1-01

Brief summary

I have been getting regular random crashes of my system ever since I bought the laptop and installed Qubes, back in April 2022. Particularly, it tends to happen when dragging a floating window to a new location; however, it also sometimes crashes randomly when switching workspaces.

I have tried to switch to a different TTY, but the system seems to be totally frozen, not just an Xorg crash.

I have also checked journalctl several times and found nothing that looked suspicious. For example, this last crash had something for qrexec-policy-daemon[2138]: qrexec:whonix.SdwdateStatus+: as the last message before freezing about 1 minute later.

Steps to reproduce

While in i3wm, hold the $mod key and use the mouse to drag floating windows to new locations. This usually triggers a freeze within 5-10 minutes of normal use.

Is there anything else I should check to try and pinpoint why this is happening? I have tried swapping out the laptop RAM. I'm not seeing any disk errors.

DemiMarie commented 1 year ago

Does the system respond to sysrq? If so, can you switch to another TTY (use SysRq-R first), then see if you can log in at the console? Also, a kernel-mode backtrace would be nice.

ghost commented 1 year ago

Have you tried switching to xorg's intel driver (like in this forum post) ? (just an idea - people mention video corruption, not crashes, but it might solve your issue).

dmoerner commented 1 year ago

Do you see this behavior on 5.10.112?

BawdyAnarchist commented 1 year ago

Does the system respond to sysrq? If so, can you switch to another TTY (use SysRq-R first), then see if you can log in at the console? Also, a kernel-mode backtrace would be nice.

I just got a freeze again while switching workspaces. I tried ALT-SysRq-r, and wasn't able to switch TTY. I also tried ALT-Fn-SysRq-r (since my SysRq shares a key with PrtScr, printed in the smaller letters below). Since that didn't work, I tried SysRq-B, hoping to get a reboot, but I got no response.

Reading up on sysrq (never heard of it before), I'm assuming that Qubes has sysrq already built into the kernel? Or do I have to enable it?

Also, I don't know how to do a kernel-mode backtrace. I'll try and look that up, unless you have a guide handy that you recommend.

BawdyAnarchist commented 1 year ago

Have you tried switching to xorg's intel driver (like in this forum post) ? (just an idea - people mention video corruption, not crashes, but it might solve your issue).

Yes I have it enabled, and also with: Option "TearFree" "true" . Thanks for the idea.

Do you see this behavior on 5.10.112?

I'm guessing yes? If 5.10.112 was around in April of this year, then yes.

BawdyAnarchist commented 1 year ago

@taradiddles , I was just reading through some of your recent posts about crashes ...

I'm running i3wm, and with the same intel graphics that you're saying causes you hard freezes with no ability to run a trace.

We're on very different hardware though, I believe. I'm on an i7-10710U.

DemiMarie commented 1 year ago

@BawdyAnarchist: you can enable SysRq until the next reboot with sudo sysctl kernel.sysrq=1. If you want to enable it permanently, add kernel.sysrq=1 to /etc/sysctl.conf.
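
When `kernel.sysrq` is set to something other than 0 or 1, the kernel treats it as a bitmask, so a given key combination can be silently ignored even though SysRq is "on". A minimal sketch for checking this, assuming the bit values from the kernel's sysrq documentation (4 for keyboard control such as unraw, 128 for reboot/poweroff); the helper name `sysrq_allows` is hypothetical:

```shell
#!/bin/sh
# Sketch: decide whether a kernel.sysrq value permits a given SysRq function.
# sysrq_allows is a hypothetical helper, not part of any Qubes or kernel tool.

sysrq_allows() {
    value=$1   # current kernel.sysrq setting
    bit=$2     # bit of the function to check (4 = unraw, 128 = reboot)
    if [ "$value" -eq 1 ]; then
        echo yes                          # 1 enables every SysRq function
    elif [ $(( value & bit )) -ne 0 ]; then
        echo yes                          # the function's bit is set in the mask
    else
        echo no
    fi
}

# Report the current setting if the proc file is readable on this system.
if [ -r /proc/sys/kernel/sysrq ]; then
    current=$(cat /proc/sys/kernel/sysrq)
    echo "kernel.sysrq=$current"
    echo "SysRq-R (unraw) allowed:  $(sysrq_allows "$current" 4)"
    echo "SysRq-B (reboot) allowed: $(sysrq_allows "$current" 128)"
fi
```

If either line reports `no`, setting `kernel.sysrq=1` as described above should enable that function.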

BawdyAnarchist commented 1 year ago

Thank you! I permanently enabled and rebooted. I'd really like to help track down the cause of this, but I might need a bit of hand holding. I'll report back if I can switch TTY after the next freeze.

ghost commented 1 year ago

I was just reading through some of your recent posts about crashes ...

I'm running i3wm, and with the same intel graphics that you're saying causes you hard freezes with no ability to run a trace.

There are actually quite a few issues/posts scattered around reporting crashes with intel/i915 that in hindsight seem closely related or even downright duplicates. Most people (me included) switch from fb to the intel driver because of artifacts/glitches with fb, and then experience random crashes.

As @DemiMarie suggested - then confirmed by @marmarek - artifacts/glitches/tearing/... happen with fb when there's no compositor running, for some reason that still has to be found. XFCE (and KDE?) has compositing enabled by default so most people don't experience that issue. i3wm depends on an external application for compositing - there's no compositing by default - so people get tearing then switch to intel, which fixes graphical issues but for unknown reasons triggers various oops/freezing/hard crashes.

So - until someone understands why intel/i915 is buggy - the solution is to revert to fb (the default AFAIK) and use compositing. i3's (old) faq mentions using compton to provide compositing, maybe that'd be the way to go for i3wm users. While this doesn't address your issue, it may be a way to get back to a normal "state" where you can work with your laptop rather than wonder when it'll crash next (though if you can, obtaining a trace could help the devs here).

BawdyAnarchist commented 1 year ago

(though if you can, obtaining a trace could help the devs here).

Since I can pretty reliably freeze the system by dragging floating windows, I can probably get a trace. I spent 15 minutes or so searching for things like "fedora backtrace" and "stack trace", but it's not entirely clear what I need to install, or what setup/configs I need before trying to induce a freeze.

I definitely would like to get this for the devs, but I need a bit of hand holding to know what to do.

BawdyAnarchist commented 1 year ago

Update. Got another freeze just now. I was unable to use SysRq-R or SysRq-B. To confirm that SysRq is enabled and functioning, after rebooting I ran SysRq-B to see if it would force a reboot on a normally functioning system. It did.

Still have no idea what I need to do to perform a trace.
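
One possible starting point, assuming dom0 keeps a persistent journal, is to read the previous boot's kernel log after rebooting and filter it for lines that usually accompany a crash. This is only a sketch: the helper name `kernel_trouble` and its pattern list are illustrative assumptions, and a hard freeze may leave nothing on disk at all.

```shell
#!/bin/sh
# Sketch: filter kernel log text (read from stdin) for crash-related lines.
# kernel_trouble and its pattern list are hypothetical, not an official list.

kernel_trouble() {
    grep -E -i 'BUG:|Oops|Call Trace|kernel panic|soft lockup|hard LOCKUP|GPU hang|i915.*(reset|hang)'
}

# Usage in dom0 after rebooting from a freeze:
#   sudo journalctl -k -b -1 --no-pager | kernel_trouble   # previous boot's kernel log
#   sudo xl dmesg | kernel_trouble                         # Xen hypervisor log
```

If the filter prints nothing, the unfiltered output of `journalctl -k -b -1` is still worth attaching, since the last lines before the freeze may matter even without an explicit trace.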

DemiMarie commented 1 year ago

I suspect the Intel driver and the kernel’s driver are fighting. Is it possible to use a compositing manager with i3?

BawdyAnarchist commented 1 year ago

I found this article, so it looks like yes.

https://faq.i3wm.org/question/3279/do-i-need-a-composite-manager-compton.1.html

And of course I can just switch at the login screen to Qubes default xfce.

ghost commented 1 year ago

Is it possible to use a compositing manager with i3?

Most likely - both compton and picom (a compton fork if I understand correctly) are packaged in f32 so they could be installed in dom0 as any other package. I'm not an i3wm user and so far nobody has tested if installing compton/picom fixes tearing (although while researching my tearing issue I've read forum posts of non-Qubes OS i3wm users mentioning compton fixed tearing issues they had).

Also: using UXA acceleration in intel - instead of the default SNA - seems to fix crashes/reboot for at least one person. I had that option enabled yesterday for the whole day and didn't encounter a crash - however my laptop then froze on suspend and I had to hard reset it. I'll update the intel gfx troubleshooting doc to reflect that.

DemiMarie commented 1 year ago

@BawdyAnarchist: if you switch to the modesetting driver (the default) instead of intel, does the problem go away? The intel driver is known to be buggy.
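
For comparing the driver setups discussed in this thread, here is a minimal sketch that prints an Xorg `Device` section for either driver. The function name and the `Identifier` string are hypothetical; `AccelMethod` and `TearFree` are options of the intel driver only, so they are omitted for modesetting. Nothing is written to disk.

```shell
#!/bin/sh
# Sketch: print an Xorg "Device" section for the chosen driver.
# xorg_device_section and the Identifier value are hypothetical names.

xorg_device_section() {
    driver=$1            # "intel" or "modesetting"
    accel=${2:-sna}      # intel only: "sna" (its default) or "uxa"
    printf 'Section "Device"\n'
    printf '    Identifier "Intel Graphics"\n'
    printf '    Driver "%s"\n' "$driver"
    if [ "$driver" = "intel" ]; then
        printf '    Option "AccelMethod" "%s"\n' "$accel"
        printf '    Option "TearFree" "true"\n'
    fi
    printf 'EndSection\n'
}

# Example: show the section that selects the modesetting driver.
xorg_device_section modesetting
```

To try a variant, the output could be saved as a file under `/etc/X11/xorg.conf.d/` in dom0 (the file name is up to you) before restarting X; `xorg_device_section intel uxa` gives the UXA variant mentioned above.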

auroraanon38 commented 1 year ago

Is it possible to use a compositing manager with i3?

Most likely - both compton and picom (a compton fork if I understand correctly) are packaged in f32 so they could be installed in dom0 as any other package. I'm not an i3wm user and so far nobody has tested if installing compton/picom fixes tearing (although while researching my tearing issue I've read forum posts of non-Qubes OS i3wm users mentioning compton fixed tearing issues they had).

Unfortunately I've tested it today. Whether I install & run picom in dom0 or in the qube playing video, tearing occurs anyway without using the Intel driver.

No idea if the crashes happen anyway as I didn't keep testing it for that long.

I also tried using picom --vsync in the qube and got an error about the driver not supporting it. Presumably the driver Qubes uses to display graphics from qubes does not implement that feature. I didn't test that option in dom0 though.

edit: I can now also confirm that picom --vsync in dom0 has no effect on the tearing I see with the modeset driver. Running picom in both dom0 and the qube playing video also has no effect. Switching to the Intel driver does instantly fix the tearing.

DemiMarie commented 1 year ago

Modesetting got a tearfree option recently. Time to backport the relevant patch from upstream @marmarek?