QubesOS / qubes-issues

The Qubes OS Project issue tracker
https://www.qubes-os.org/doc/issue-tracking/

AppVM GUI crash: U2MFN_GET_MFN_FOR_PAGE: get_user_pages failed #2617

Closed pdinoto closed 5 years ago

pdinoto commented 7 years ago

Qubes OS version (e.g., R3.2):

R3.2

Affected TemplateVMs (e.g., fedora-23, if applicable):

debian-9


Expected behavior:

Graphic applications work as usual.

Actual behavior:

AppVMs based on this template work fine until a particular graphical action triggers a crash of the GUI component in the AppVM, closing all windows.

Steps to reproduce the behavior:

Create a new AppVM based on a debian-9 template (template specs follow)

General notes:

At the time of the crash, many messages like the following are shown on the console:

U2MFN_GET_MFN_FOR_PAGE: get_user_pages failed, ret=0xfffffffffffffff2

Attempting to open a new window, for example by running gnome-terminal, results in a brief display of the new window and all the others that were open, and then all of them disappear at the same time.

Applications are still running, and I can shut down the AppVM from the console.

So far, graphical actions that seem to trigger the crash are:


Related issues:

Maybe this comment is related?

pdinoto commented 7 years ago

The debian-9 template was created by dist-upgrade-ing a plain and working debian-8 template.

Qubes repositories enabled are:

# Main qubes updates repository
deb [arch=amd64] http://deb.qubes-os.org/r3.2/vm stretch main
#deb-src http://deb.qubes-os.org/r3.2/vm stretch main

# Qubes updates candidates repository
deb [arch=amd64] http://deb.qubes-os.org/r3.2/vm stretch-testing main
#deb-src http://deb.qubes-os.org/r3.2/vm stretch-testing main

# Qubes security updates testing repository
deb [arch=amd64] http://deb.qubes-os.org/r3.2/vm stretch-securitytesting main
#deb-src http://deb.qubes-os.org/r3.2/vm stretch-securitytesting main

adrelanos commented 7 years ago

sys-whonix (stretch-based) crashes randomly when using konsole (with rather regular, non-fancy output).

emdete commented 7 years ago

I see this with my debian-9 based VMs as well, mostly on startup of programs. It happens, for example, on startup of rxvt and gvim. It seems to depend on how many and which other programs are already running in other VMs. It is currently my showstopper for using Qubes. #2455 mentions logfiles, but I don't see anything other than the given message. The VM does not crash and can still be accessed via virsh.

marmarek commented 7 years ago

@HW42 any idea?

marmarek commented 7 years ago

Copying my comment from #2455:

0xfffffffffffffff2 is EFAULT returned from get_user_pages call, which suggests that the window composition buffer is no longer in memory, or maybe even getting its address failed. Check logs from gui-agent (should be in journalctl inside of VM) and X server logs (~/.local/share/xorg/Xorg.0.log). If nothing specific there, try enabling debug mode in the VM settings.
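
As a quick check of the errno reading above, here is a minimal Python sketch that reinterprets the logged value as a signed 64-bit integer and looks up the errno name. The helper name `decode_kernel_ret` is hypothetical and is not part of any Qubes tool; the lookup assumes it runs on Linux, where errno 14 is EFAULT.

```python
import errno


def decode_kernel_ret(ret_hex):
    """Reinterpret a 64-bit hex value as signed and map it to an errno name.

    Hypothetical helper for illustration only; not part of u2mfn or Qubes.
    """
    raw = int(ret_hex, 16)
    # Values with the top bit set are negative in two's complement.
    signed = raw - (1 << 64) if raw >= (1 << 63) else raw
    # Kernel functions conventionally return -errno on failure.
    name = errno.errorcode.get(-signed, "unknown")
    return signed, name


signed, name = decode_kernel_ret("0xfffffffffffffff2")
print(signed, name)
```

The same arithmetic works for any other `ret=` value that shows up in the message, which can help tell a bad address apart from, say, an out-of-memory condition.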

pdinoto commented 7 years ago

My actual setup can be crashed this way quite predictably. Will try to catch logs.

This may not provide any insight, but once the VM's GUI had crashed, I tried logging out of dom0 and logging back in (which in my view would provide a new and clean X.org session): all VM windows reappear, and then all those from the crashed VM get closed.

jpouellet commented 7 years ago

I tried logging out of dom0 and logging back, and you can see all VM windows reappear, and then all those from the crashed VM gets closed.

FWIW, this is the behavior I always observe for all VMs with logging out/in of dom0, regardless of crashed state.

pdinoto commented 7 years ago

I tried logging out of dom0 and logging back, and you can see all VM windows reappear, and then all those from the crashed VM gets closed.

FWIW, this is the behavior I always observe for all VMs with logging out/in of dom0, regardless of crashed state.

Weird: I am used to fixing pulseaudio issues in dom0 (weird state after docking my notebook) by logging out/in, without losing any work being done in the VMs, as all windows come back once I log in; I just lose their screen positions, as they all come back in the same XFCE workspace.

This appear/disappear behavior only happens on these crashed VMs.

HW42 commented 7 years ago

I already tried to reproduce this a few days ago.

I tried it again today. But even when I carefully try to replicate the circumstances @emdete described on IRC, I'm not able to reproduce it.

My blind guess is

a) The provided address is invalid, but this does not affect a normal X server (for example, something might try to map a NULL pointer for a short moment).

b) It's a special address which u2mfn can't map to a page, for example something allocated via a device file.

Given the unreproducibility b) is not very likely. And of course it's quite likely something completely different.

HW42 commented 7 years ago

FWIW: I also tried other cases like what @jpouellet describes, unfortunately without success.

joyfulmantis commented 7 years ago

I am able to reproduce this issue very reliably. The prime culprits are emacs (25) and gnome-terminal, although I vaguely remember other applications causing the crash too. Both emacs and gnome-terminal size (and resize) their windows differently from other applications: if you slowly drag the window edge, a small bubble on top of the window tells you its width and height in lines. They also usually can't resize by pixels, so even if you slowly drag the window across your full screen, you may find that it fills most of the screen but leaves a strip smaller than a line's worth of pixels, and the window will not grow to fill that space. This is, however, not the case when they are maximized by clicking the maximize button or by dragging the window to the top of the screen on standard Linux desktops.

The prime way for me to reproduce it is to have one of those windows open, either emacs or gnome-terminal, and resize it, by accident or on purpose, by pulling it against the top bar (maximizing it) or against one of the sides (resizing it to half or a quarter of the screen). The GUI almost always crashes for me in this scenario.

pdinoto commented 7 years ago

Well, after experiencing this issue consistently but unable to capture any significant log, there are two things that may provide some pointers:

jpouellet commented 7 years ago

Could not copy the content the logs in that case, I am afraid.

Perhaps you already know this and it is not the issue, but if the data you want is indeed in the logs, just you cannot retrieve it because the GUI has crashed and the logs do not persist across VM reboots, note that you can still get a console in the VM with:

[user@dom0 ~]$ sudo xl console your-vm-name

and log in as root with no password, and use qvm-copy-to-vm or similar to extract the relevant log files.

HW42 commented 7 years ago

I finally found a reliable way to reproduce it :]

Will debug this further later today.

unman commented 7 years ago

On Wed, Mar 22, 2017 at 12:43:13AM -0700, HW42 wrote:

I finally found a reliable way to reproduce it :]

Will debug this further later today.

Don't be coy - how do you reproduce it?

HW42 commented 7 years ago

Don't be coy - how do you reproduce it?

I'm sorry, I didn't have the time yet to write it up, and I wanted to avoid duplicated work.

How I can reproduce it:

One time preparation:

Try to trigger the crash:

This triggers the crash for me most of the time in the first or second try ("worst" case 9 tries so far).

Given that the bug is timing-dependent (see below), I would not be surprised if this does not work for you.

It seems that when opening the pdf there are two configure events. One with the old size and very shortly after it one with the new window size. When processing the first configure event the pointer for the window memory sometimes points to no longer mapped memory. Therefore mlock/u2mfn returns an error. I do not know yet what's the cause and if it's a bug in qubes-gui-agent, Xorg, or gtk.

HW42 commented 7 years ago

This seems to be a bug in our code. Newer X servers have a separate thread for input processing. So when we access the window object to get the memory pages with the image data, we have a race condition with the main thread if the client changes the Pixmap.

https://github.com/QubesOS/qubes-gui-agent-linux/pull/12/commits/5ea68d2b5b3347b3f22b68722a2f56b4b9436e78 should fix this.
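
To illustrate the general class of bug described above, here is a minimal Python sketch. It is not the patch from the linked commit and does not use any Xorg or qubes-gui-agent API; `Window`, `resize`, `describe_buffer`, and `run_demo` are hypothetical names chosen for the example. One thread keeps replacing a window's buffer while another reads its geometry and buffer, and the reader only gets a consistent view because both sides take the same lock.

```python
import threading


class Window:
    """Hypothetical stand-in for a window whose backing buffer can be replaced."""

    def __init__(self, width, height):
        self.lock = threading.Lock()
        self.width = width
        self.height = height
        self.buffer = bytearray(width * height)

    def resize(self, width, height):
        # Writer side: the three fields are updated as one unit under the lock.
        with self.lock:
            self.width = width
            self.height = height
            self.buffer = bytearray(width * height)

    def describe_buffer(self):
        # Reader side: take a snapshot under the same lock, so the size
        # and the buffer always belong to the same "version" of the window.
        with self.lock:
            return self.width, self.height, len(self.buffer)


def run_demo(iterations=2000):
    """Resize in one thread, read in another; count inconsistent snapshots."""
    win = Window(80, 24)
    done = threading.Event()

    def resizer():
        for i in range(iterations):
            win.resize(80 + i % 7, 24 + i % 5)
        done.set()

    worker = threading.Thread(target=resizer)
    worker.start()
    inconsistent = 0
    while not done.is_set():
        width, height, length = win.describe_buffer()
        if width * height != length:
            inconsistent += 1
    worker.join()
    return inconsistent


print(run_demo())  # prints 0: with the lock, no torn snapshot is observed
```

Without the lock in `describe_buffer`, the reader could see a new width paired with an old buffer. That is loosely analogous to using a pointer to window memory that the other thread has already replaced.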

HW42 commented 7 years ago

Marek just uploaded a new version of the gui-agent (version 3.2.15, xserver-xorg-input-qubes in Debian and qubes-gui-vm in Fedora) which includes my patch to the testing repositories.

Please test whether a) this actually resolves the issue for you, and b) it doesn't cause any other new problems.

Thanks.

pdinoto commented 7 years ago

Thanks, @jpouellet. In that case, I was unable to transfer the logs because the issue makes the U2MFN_GET_MFN_FOR_PAGE error appear several times per second in the Xorg.0 log, which, if you are not fast enough, makes the VM unresponsive as /tmp fills up quickly.

Great, @HW42! I will check for the update and test it.

andrewdavidwong commented 7 years ago

Marek just uploaded a new version of the gui-agent (version 3.2.15, xserver-xorg-input-qubes in Debian and qubes-gui-vm in Fedora) which includes my patch to the testing repositories.

Please test if, a) this actually resolves this issue for you, b) don't cause any other new problems.

Thanks.

Possible new problem: https://github.com/QubesOS/updates-status/issues/18#issuecomment-289214634

marmarek commented 7 years ago

Updated package: https://github.com/QubesOS/updates-status/issues/20

andrewdavidwong commented 5 years ago

This issue is being closed because:

If anyone believes that this issue should be reopened, please let us know in a comment here.