NixOS / nixpkgs


QEMU segfault when a test VM is interactively started #69158

Closed nlewo closed 4 years ago

nlewo commented 5 years ago

Describe the bug: On the 19.09 branch, when I start a test interactively, the QEMU process segfaults. It works fine when run non-interactively.

To Reproduce

$ git checkout 4fd551ee2f1
$ nix-build nixos/tests/simple.nix -A driver
$ ./result/bin/nixos-run-vms 
starting VDE switch for network 1
running the VM test script
starting all VMs
machine: starting vm
machine: QEMU running (pid 18179)
(0.07 seconds)
waiting for all VMs to finish
machine: waiting for the VM to power off
(0.10 seconds)
(0.10 seconds)
(0.17 seconds)
collecting coverage data
(0.00 seconds)
syncing
(0.00 seconds)
test script finished in 0.17s
vde_switch: EOF on stdin, cleaning up and exiting
cleaning up
(0.00 seconds)

In the kernel logs, I can see:

[ 2110.889976] qemu-system-x86[18179]: segfault at 0 ip 00007fabf5618ae0 sp 00007ffca850dc68 error 6 in libc-2.27.so[7fabf54e8000+13d000]
[ 2110.889988] Code: fe 6f 06 c5 fe 6f 4e 20 c5 fe 6f 56 40 c5 fe 6f 5e 60 c5 fe 6f 64 16 e0 c5 fe 6f 6c 16 c0 c5 fe 6f 74 16 a0 c5 fe 6f 7c 16 80 <c5> fe 7f 07 c5 fe 7f 4f 20 c5 fe 7f 57 40 c5 fe 7f 5f 60 c5 fe 7f
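
As a reading aid for the kernel line above, here is a minimal sketch that derives the offset of the faulting instruction inside the libc mapping and decodes the low three bits of the x86 page-fault error code. The helper names `fault_offset` and `decode_error` are made up for this illustration; the numeric values are copied from the log.

```shell
#!/bin/sh
# Values copied from the kernel log line above.
ip=0x7fabf5618ae0      # faulting instruction pointer
base=0x7fabf54e8000    # start of the libc-2.27.so mapping
err=6                  # page-fault error code

# Offset of the faulting instruction inside the mapping (ip - base).
fault_offset() { printf '0x%x\n' $(( $1 - $2 )); }

# Decode the three low bits of an x86 page-fault error code:
# bit 0 = protection violation (else page not present),
# bit 1 = write access (else read), bit 2 = user mode (else kernel mode).
decode_error() {
    [ $(( $1 & 1 )) -ne 0 ] && p="protection violation" || p="page not present"
    [ $(( $1 & 2 )) -ne 0 ] && w="write" || w="read"
    [ $(( $1 & 4 )) -ne 0 ] && u="user mode" || u="kernel mode"
    echo "$p, $w, $u"
}

fault_offset "$ip" "$base"   # prints 0x130ae0
decode_error "$err"          # prints: page not present, write, user mode
```

Together with "segfault at 0", this reads as a user-space write to address 0 from code at offset 0x130ae0 in libc, which lies inside the reported 0x13d000-byte mapping.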

Additional context: The host kernel is 4.19.74 and the nixpkgs commit of the NixOS deployment is 51bc28fd29d689b6a1e8c663aa7113f9cb6a26ba (release 19.03).

It seems to be working fine for a friend of mine, so it may be related to the execution environment.

worldofpeace commented 5 years ago

I've been running VMs off unstable and 19.09 all week without issues. (host runs unstable) I believe it's because your host system is on 19.03 and the configuration you've built to run interactively is on 19.09. Though I wouldn't expect this to segfault, I've seen things like this happen before in a similar circumstance.

nlewo commented 5 years ago

I upgraded my host to 19.09 and I still run into this issue :( @worldofpeace are you running VMs interactively?

worldofpeace commented 5 years ago

Yes, I have, @nlewo. Perhaps hardware information would help this report; it could be an issue in QEMU.

nlewo commented 5 years ago

So, I just tried to run the VM as root and it works well :/

flokli commented 5 years ago

@nlewo if you're on a default 19.09, there should be a coredump, available via coredumpctl debug. Could you post the backtrace here - it might be helpful.

nlewo commented 5 years ago

Thanks to the core dump, I disabled QEMU graphics (virtualisation.graphics=false) in the test VM and the process no longer segfaults! So, the culprit seems to be GTK.

Here is the trace:

[nix-shell:~]$ coredumpctl debug
           PID: 24733 (qemu-system-x86)
           UID: 1000 (lewo)
           GID: 100 (users)
        Signal: 11 (SEGV)
     Timestamp: Fri 2019-10-11 17:21:27 CEST (48min ago)
  Command Line: /nix/store/48dgxdnfaj4yzp7m343q1zxxihksrnpw-qemu-host-cpu-only-for-vm-tests-4.0.0/bin/qemu-system-x86_64 -enable-kvm -cpu kvm64 -name machine -m 1024 -smp 1 -device virtio-rng-pci -net nic,netdev=user.0,model=virtio -netdev user,id=user.0 -virtfs local,path=/nix/store,security_model=none,mount_tag=store -virtfs local,path=/tmp/vm-state-machine/xchg,security_model=none,mount_tag=xchg -virtfs local,path=/tmp/xchg-shared,security_model=none,mount_tag=shared -drive index=0,id=drive1,file=/tmp/vm-state-machine/machine.qcow2,cache=writeback,werror=report,if=virtio -kernel /nix/store/4h603zbc7pg79bzrb8gm3wp4y4i416d7-nixos-system-machine-19.09.git.dbad7c7/kernel -initrd /nix/store/4h603zbc7pg79bzrb8gm3wp4y4i416d7-nixos-system-machine-19.09.git.dbad7c7/initrd -append console=ttyS0 panic=1 boot.panic_on_fail loglevel=7 net.ifnames=0 init=/nix/store/4h603zbc7pg79bzrb8gm3wp4y4i416d7-nixos-system-machine-19.09.git.dbad7c7/init regInfo=/nix/store/d4ck88244cd8v8kx602k95lvqdb4y3ki-closure-info/registration console=ttyS0  -device virtio-net-pci,netdev=vlan1,mac=52:54:00:12:01:01 -netdev vde,id=vlan1,sock=/mnt/data/home/lewo/repos/nixpkgs/vde1.ctl -vga std -usb -device usb-tablet,bus=usb-bus.0 -no-reboot -monitor unix:./monitor -chardev socket,id=shell,path=./shell -device virtio-serial -device virtconsole,chardev=shell -device virtio-rng-pci -serial stdio
    Executable: /nix/store/48dgxdnfaj4yzp7m343q1zxxihksrnpw-qemu-host-cpu-only-for-vm-tests-4.0.0/bin/qemu-system-x86_64
 Control Group: /user.slice/user-1000.slice/session-2.scope
          Unit: session-2.scope
         Slice: user-1000.slice
       Session: 2
     Owner UID: 1000 (lewo)
       Boot ID: c498fa9a280c452ba8c3d07838ff009d
    Machine ID: b0e3f361d1ab45b7a979e2bafb96a26b
      Hostname: tilia
       Storage: /var/lib/systemd/coredump/core.qemu-system-x86.1000.c498fa9a280c452ba8c3d07838ff009d.24733.1570807287000000000000.lz4
       Message: Process 24733 (qemu-system-x86) of user 1000 dumped core.

GNU gdb (GDB) 8.3
Copyright (C) 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /nix/store/48dgxdnfaj4yzp7m343q1zxxihksrnpw-qemu-host-cpu-only-for-vm-tests-4.0.0/bin/qemu-system-x86_64...
(No debugging symbols found in /nix/store/48dgxdnfaj4yzp7m343q1zxxihksrnpw-qemu-host-cpu-only-for-vm-tests-4.0.0/bin/qemu-system-x86_64)

warning: core file may not match specified executable file.
[New LWP 24733]
[New LWP 24761]
[New LWP 24763]
[New LWP 24762]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/nix/store/6yaj6n8l925xxfbcd65gzqx3dz7idrnn-glibc-2.27/lib/libthread_db.so.1".
Core was generated by `/nix/store/48dgxdnfaj4yzp7m343q1zxxihksrnpw-qemu-host-cpu-only-for-vm-tests-4.0'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fa8d5a42ae0 in __memmove_avx_unaligned_erms () from /nix/store/kksyrix1bpklvgkmvngcv0q9nh8hn2fl-glibc-2.27/lib/libc.so.6
[Current thread is 1 (Thread 0x7fa8d3d6b480 (LWP 24733))]
(gdb) bt
#0  0x00007fa8d5a42ae0 in __memmove_avx_unaligned_erms () from /nix/store/kksyrix1bpklvgkmvngcv0q9nh8hn2fl-glibc-2.27/lib/libc.so.6
#1  0x00007fa8d6262715 in png_combine_row () from /nix/store/dg952z6ksmpx52c93260vrhr1c91wqk7-libpng-apng-1.6.37/lib/libpng16.so.16
#2  0x00007fa8d6252673 in png_push_process_row () from /nix/store/dg952z6ksmpx52c93260vrhr1c91wqk7-libpng-apng-1.6.37/lib/libpng16.so.16
#3  0x00007fa8d6252b24 in png_process_IDAT_data () from /nix/store/dg952z6ksmpx52c93260vrhr1c91wqk7-libpng-apng-1.6.37/lib/libpng16.so.16
#4  0x00007fa8d6252e0b in png_push_read_IDAT () from /nix/store/dg952z6ksmpx52c93260vrhr1c91wqk7-libpng-apng-1.6.37/lib/libpng16.so.16
#5  0x00007fa8d6252feb in png_process_data () from /nix/store/dg952z6ksmpx52c93260vrhr1c91wqk7-libpng-apng-1.6.37/lib/libpng16.so.16
#6  0x00007fa8d07ef0e9 in gdk_pixbuf.png_image_load_increment () from /nix/store/9qqd7vzwdqhyb54immag1w7ss3cf5cb2-gdk-pixbuf-2.36.7/lib/gdk-pixbuf-2.0/2.10.0/loaders/libpixbufloader-png.so
#7  0x00007fa8d678200d in gdk_pixbuf_loader_load_module () from /nix/store/w09ms3hfk8ki25n5cx1i7rdaj37dic1w-gdk-pixbuf-2.38.1/lib/libgdk_pixbuf-2.0.so.0
#8  0x00007fa8d6782895 in gdk_pixbuf_loader_close () from /nix/store/w09ms3hfk8ki25n5cx1i7rdaj37dic1w-gdk-pixbuf-2.38.1/lib/libgdk_pixbuf-2.0.so.0
#9  0x00007fa8d677f33b in load_from_stream () from /nix/store/w09ms3hfk8ki25n5cx1i7rdaj37dic1w-gdk-pixbuf-2.38.1/lib/libgdk_pixbuf-2.0.so.0
#10 0x00007fa8d678012c in gdk_pixbuf_new_from_stream () from /nix/store/w09ms3hfk8ki25n5cx1i7rdaj37dic1w-gdk-pixbuf-2.38.1/lib/libgdk_pixbuf-2.0.so.0
#11 0x00007fa8d6c0b45f in icon_info_ensure_scale_and_pixbuf () from /nix/store/frw1x2zl4p9kf0nwjbn41sjnh032z525-gtk+3-3.24.10/lib/libgtk-3.so.0
#12 0x00007fa8d6c0e6d8 in gtk_icon_info_load_icon () from /nix/store/frw1x2zl4p9kf0nwjbn41sjnh032z525-gtk+3-3.24.10/lib/libgtk-3.so.0
#13 0x00007fa8d6c0e934 in gtk_icon_theme_load_icon_for_scale () from /nix/store/frw1x2zl4p9kf0nwjbn41sjnh032z525-gtk+3-3.24.10/lib/libgtk-3.so.0
#14 0x00007fa8d6d95753 in icon_list_from_theme () from /nix/store/frw1x2zl4p9kf0nwjbn41sjnh032z525-gtk+3-3.24.10/lib/libgtk-3.so.0
#15 0x00007fa8d6d96de1 in gtk_window_realize_icon () from /nix/store/frw1x2zl4p9kf0nwjbn41sjnh032z525-gtk+3-3.24.10/lib/libgtk-3.so.0
#16 0x00007fa8d6d9df0f in gtk_window_realize () from /nix/store/frw1x2zl4p9kf0nwjbn41sjnh032z525-gtk+3-3.24.10/lib/libgtk-3.so.0
#17 0x00007fa8d76eed5d in g_closure_invoke () from /nix/store/nbn77rv8cgxnyhzn2qvrccpk9ga5pwrl-glib-2.60.7/lib/libgobject-2.0.so.0
#18 0x00007fa8d7701ccc in signal_emit_unlocked_R () from /nix/store/nbn77rv8cgxnyhzn2qvrccpk9ga5pwrl-glib-2.60.7/lib/libgobject-2.0.so.0
#19 0x00007fa8d770aa5e in g_signal_emit_valist () from /nix/store/nbn77rv8cgxnyhzn2qvrccpk9ga5pwrl-glib-2.60.7/lib/libgobject-2.0.so.0
#20 0x00007fa8d770b11f in g_signal_emit () from /nix/store/nbn77rv8cgxnyhzn2qvrccpk9ga5pwrl-glib-2.60.7/lib/libgobject-2.0.so.0
#21 0x00007fa8d6d8eef6 in gtk_widget_realize () from /nix/store/frw1x2zl4p9kf0nwjbn41sjnh032z525-gtk+3-3.24.10/lib/libgtk-3.so.0
#22 0x00007fa8d6d9c42d in gtk_window_show () from /nix/store/frw1x2zl4p9kf0nwjbn41sjnh032z525-gtk+3-3.24.10/lib/libgtk-3.so.0
#23 0x00007fa8d76eed5d in g_closure_invoke () from /nix/store/nbn77rv8cgxnyhzn2qvrccpk9ga5pwrl-glib-2.60.7/lib/libgobject-2.0.so.0
#24 0x00007fa8d7701ccc in signal_emit_unlocked_R () from /nix/store/nbn77rv8cgxnyhzn2qvrccpk9ga5pwrl-glib-2.60.7/lib/libgobject-2.0.so.0
#25 0x00007fa8d770aa5e in g_signal_emit_valist () from /nix/store/nbn77rv8cgxnyhzn2qvrccpk9ga5pwrl-glib-2.60.7/lib/libgobject-2.0.so.0
#26 0x00007fa8d770b11f in g_signal_emit () from /nix/store/nbn77rv8cgxnyhzn2qvrccpk9ga5pwrl-glib-2.60.7/lib/libgobject-2.0.so.0
#27 0x00007fa8d6d88f76 in gtk_widget_show () from /nix/store/frw1x2zl4p9kf0nwjbn41sjnh032z525-gtk+3-3.24.10/lib/libgtk-3.so.0
#28 0x000055da27fa3f5e in ?? ()
#29 0x000055da27c54c8c in main ()
(gdb) quit

flokli commented 5 years ago

A segfault in __memmove_avx_unaligned_erms sounds scary - what CPU is this running on?

nlewo commented 5 years ago

Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz

nlewo commented 5 years ago

Ok, it works well when I unset GDK_PIXBUF_MODULE_FILE. It seems to be the same kind of issue as https://github.com/NixOS/nixpkgs/issues/54278.
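
In the backtrace above, frame #6 loads the PNG loader from gdk-pixbuf-2.36.7 while frames #7 to #10 run in gdk-pixbuf-2.38.1, which would be consistent with the session's GDK_PIXBUF_MODULE_FILE pointing at a loader cache from a different build. A minimal sketch of the workaround follows, assuming an `env` that supports `-u` (GNU coreutils does); the helper name `without_pixbuf_module_file` is made up for this illustration.

```shell
#!/bin/sh
# Run a command with GDK_PIXBUF_MODULE_FILE removed from the child's
# environment only; the calling shell keeps its own value.
without_pixbuf_module_file() {
    env -u GDK_PIXBUF_MODULE_FILE "$@"
}

# Show what the current session exports, if anything.
echo "current value: ${GDK_PIXBUF_MODULE_FILE:-<unset>}"

# Hypothetical usage, with the driver built as in the reproduction steps:
# without_pixbuf_module_file ./result/bin/nixos-run-vms
```

Scoping the change to a single command avoids affecting other GTK applications in the same session that may rely on the variable.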

flokli commented 5 years ago

cc @jtojnar @samueldr

worldofpeace commented 5 years ago

@flokli I guess it needs a wrapper. Pretty sure you don't run into https://github.com/NixOS/nixpkgs/issues/54278 ever if the executable is wrapped.
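
To illustrate the idea of a wrapper in general terms: it pins GDK_PIXBUF_MODULE_FILE to a loader cache matching the libraries the binary was built against, then hands over to the real executable. The sketch below is a hand-written illustration, not the wrapper nixpkgs generates; both paths and the name `run_wrapped` are hypothetical placeholders.

```shell
#!/bin/sh
# Hypothetical placeholder paths, for illustration only.
REAL_BIN="${REAL_BIN:-/path/to/qemu-system-x86_64}"
LOADERS_CACHE="${LOADERS_CACHE:-/path/to/matching/loaders.cache}"

# The body runs in a subshell, so the export does not leak into the caller.
run_wrapped() (
    # Override whatever the session exported with the matching cache.
    export GDK_PIXBUF_MODULE_FILE="$LOADERS_CACHE"
    # Replace the subshell with the real program, passing all arguments on.
    exec "$REAL_BIN" "$@"
)

# Hypothetical usage:
# run_wrapped -enable-kvm -m 1024
```

Because the wrapper sets the variable itself, the value inherited from the user's session no longer matters, which is why a wrapped executable would not be expected to hit this class of mismatch.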

stale[bot] commented 4 years ago

Thank you for your contributions. This has been automatically marked as stale because it has had no activity for 180 days. If this is still important to you, we ask that you leave a comment below. Your comment can be as simple as "still important to me". This lets people see that at least one person still cares about this. Someone will have to do this at most twice a year if there is no other activity. Here are suggestions that might help resolve this more quickly:

  1. Search for maintainers and people that previously touched the related code and @ mention them in a comment.
  2. Ask on the NixOS Discourse.
  3. Ask on the #nixos channel on irc.freenode.net.

flokli commented 4 years ago

It seems nixos-run-vms simply invokes qemu-kvm, and wrapping GTK binaries is as simple as https://github.com/NixOS/nixpkgs/pull/89328, so I opened this PR. @nlewo, I'd appreciate it if you could test this, as I wasn't able to reproduce this issue locally.

nlewo commented 4 years ago

Unfortunately, I'm no longer able to reproduce it :(