Open flokli opened 1 year ago
Instead of virtualisation.rosetta.enable = true; I tried boot.binfmt.emulatedSystems = [ "x86_64-linux" ];
I could get saleae-logic to run, but the others (mostly Electron apps) still segfault.
Chrome itself seems to also be very angry:
❯ google-chrome-stable --no-sandbox
[0105/230925.555828:WARNING:crashpad_client_linux.cc(362)] prctl: Invalid argument (22)
[13183:13183:0105/230926.830157:ERROR:nacl_fork_delegate_linux.cc(313)] Bad NaCl helper startup ack (0 bytes)
/nix/store/r17ihqafckhr6ykz4xjr1wz4nhi338ya-gvfs-1.50.2/lib/gio/modules/libgvfsdbus.so: cannot open shared object file: No such file or directory
Failed to load module: /nix/store/r17ihqafckhr6ykz4xjr1wz4nhi338ya-gvfs-1.50.2/lib/gio/modules/libgvfsdbus.so
(google-chrome:13148): Gtk-WARNING **: 23:09:29.610: Could not load a pixbuf from icon theme.
This may indicate that pixbuf loaders or the mime database could not be found.
[13148:13148:0105/230931.202880:ERROR:gpu_process_host.cc(984)] GPU process launch failed: error_code=1002
[13148:13148:0105/230931.431947:ERROR:gpu_process_host.cc(984)] GPU process launch failed: error_code=1002
[13148:13148:0105/230931.535923:ERROR:gpu_process_host.cc(984)] GPU process launch failed: error_code=1002
[13148:13148:0105/230931.592585:ERROR:gpu_process_host.cc(984)] GPU process launch failed: error_code=1002
[13148:13148:0105/230931.643566:ERROR:gpu_process_host.cc(984)] GPU process launch failed: error_code=1002
[13148:13148:0105/230931.666660:ERROR:gpu_process_host.cc(984)] GPU process launch failed: error_code=1002
[13148:13148:0105/230931.682175:ERROR:gpu_process_host.cc(984)] GPU process launch failed: error_code=1002
[13148:13148:0105/230931.902809:ERROR:gpu_process_host.cc(984)] GPU process launch failed: error_code=1002
[13148:13148:0105/230931.949431:ERROR:gpu_process_host.cc(984)] GPU process launch failed: error_code=1002
[13148:13148:0105/230931.949532:FATAL:gpu_data_manager_impl_private.cc(440)] GPU process isn't usable. Goodbye.
**
ERROR:../accel/tcg/cpu-exec.c:954:cpu_exec: assertion failed: (cpu == current_cpu)
Bail out! ERROR:../accel/tcg/cpu-exec.c:954:cpu_exec: assertion failed: (cpu == current_cpu)
[13239:13245:0105/230938.368949:ERROR:ssl_client_socket_impl.cc(982)] handshake failed; returned -1, SSL error code 1, net_error -3
[1] 13148 trace trap (core dumped) google-chrome-stable --no-sandbox
[13239:13245:0105/230938.374599:ERROR:ssl_client_socket_impl.cc(982)] handshake failed; returned -1, SSL error code 1, net_error -3
[13239:13245:0105/230938.376284:ERROR:ssl_client_socket_impl.cc(982)] handshake failed; returned -1, SSL error code 1, net_error -3
[13239:13245:0105/230938.376514:ERROR:ssl_client_socket_impl.cc(982)] handshake failed; returned -1, SSL error code 1, net_error -3
[13239:13245:0105/230938.376841:ERROR:ssl_client_socket_impl.cc(982)] handshake failed; returned -1, SSL error code 1, net_error -3
[13239:13245:0105/230938.377012:ERROR:ssl_client_socket_impl.cc(982)] handshake failed; returned -1, SSL error code 1, net_error -3
Okay, that crash seems to be a qemu bug: https://gitlab.com/qemu-project/qemu/-/issues/1147
@flokli I found this thread by googling '0x0000800000022800' 😄
I'm getting a very similar stack trace when doing this:
$ nix shell github:oxalica/rust-overlay#packages.x86_64-linux.rust
$ cargo --version
Segmentation fault (core dumped)
$ gdb cargo
(gdb) r
Starting program: /nix/store/qz8gvkxcyiidg4rrrlgif65ca9r8xka9-rust-default-1.67.0/bin/cargo
warning: Selected architecture i386:x86-64 is not compatible with reported target architecture aarch64
warning: Architecture rejected target-supplied description
Program received signal SIGSEGV, Segmentation fault.
0x0000800000022800 in ?? ()
(gdb) b
Breakpoint 1 at 0x800000022800
(gdb) bt
#0 0x0000800000022800 in ?? ()
#1 0x00008000000766bc in ?? ()
#2 0x0000ffffffffd440 in ?? ()
#3 0x3000702d2d720030 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Weirdly, this doesn't happen when I do nix run nixpkgs#legacyPackages.x86_64-linux.cargo -- --version. Also, if I run the program under valgrind (using nix shell nixpkgs#legacyPackages.x86_64-linux.valgrind and then valgrind -v cargo), it runs just fine...
I'm also using the rosetta nixos module.
My hypothesis is some sort of impurity that leads to an incorrect binary...
Discovered something interesting:
$ nix build nixpkgs#legacyPackages.x86_64-linux.rust.packages.prebuilt.cargo
$ /run/rosetta/rosetta $(patchelf --print-interpreter result/bin/.cargo-wrapped) result/bin/.cargo-wrapped --version
cargo 1.65.0 (4bc8f24d3 2022-10-20)
$ /run/rosetta/rosetta result/bin/.cargo-wrapped --version
Segmentation fault (core dumped)
It seems rosetta can't handle the interpreter being patched for dynamic libraries. Perhaps it doesn't use the PT_INTERP at all?
We could work around this by changing the binfmt. @flokli can you try the above commands for your programs and see if that resolves things?
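For reference, here is a minimal Python sketch of what "using PT_INTERP" involves: walk the ELF64 program headers, find the PT_INTERP entry, and read the interpreter path stored at its file offset. The helper names find_interp and workaround_argv are made up for illustration and are not part of rosetta, patchelf, or any NixOS tooling; only little-endian ELF64 files are handled.

```python
import struct

PT_INTERP = 3  # program header type that holds the interpreter path


def find_interp(data: bytes):
    """Return (file_offset, path) of PT_INTERP in a little-endian ELF64 image, or None."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("not a little-endian ELF64 file")
    # ELF64 header: e_phoff at byte 32, e_phentsize at 54, e_phnum at 56.
    (e_phoff,) = struct.unpack_from("<Q", data, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 54)
    for i in range(e_phnum):
        base = e_phoff + i * e_phentsize
        # ELF64 program header: p_type at +0, p_offset at +8, p_filesz at +32.
        (p_type,) = struct.unpack_from("<I", data, base)
        (p_offset,) = struct.unpack_from("<Q", data, base + 8)
        (p_filesz,) = struct.unpack_from("<Q", data, base + 32)
        if p_type == PT_INTERP:
            raw = data[p_offset:p_offset + p_filesz]
            return p_offset, raw.rstrip(b"\0").decode()
    return None


def workaround_argv(binary_path, args):
    """Build the command line that runs the interpreter explicitly (hypothetical helper)."""
    with open(binary_path, "rb") as f:
        found = find_interp(f.read())
    if found is None:
        return [binary_path, *args]  # statically linked: nothing to work around
    _, interp = found
    return [interp, binary_path, *args]
```

Calling workaround_argv("result/bin/.cargo-wrapped", ["--version"]) would produce the same kind of command line as the patchelf --print-interpreter invocation above, with the interpreter as the program and the binary as its first argument.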
@bouk what exactly should i try? I don't have a differently linked signal-desktop binary...
Try running this:
nix shell nixpkgs#patchelf # Or try installing patchelf into your systemPackages
$(patchelf --print-interpreter $(which spotify)) spotify
Ah, you mean manually invoking the interpreter from the interpreter field... Interesting, I'll try and report back.
Doing some stracing reveals more information:
strace ./cargo2
execve("./cargo2", ["./cargo2"], 0xffffec8f0eb0 /* 45 vars */) = 0
openat(AT_FDCWD, "/proc/self/exe", O_RDONLY) = 4
ioctl(4, _IOC(_IOC_READ, 0x61, 0x22, 0x45), 0xffffe95ee350) = 1
close(4) = 0
gettid() = 7323
getpid() = 7323
openat(AT_FDCWD, "/proc/self/maps", O_RDONLY) = 4
pread64(4, "800000000000-800000022000 r--p 0"..., 4170, 0) = 523
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xffff988b4000
pread64(4, "", 4170, 523) = 0
close(4) = 0
openat(AT_FDCWD, "/proc/sys/vm/mmap_min_addr", O_RDONLY) = 4
read(4, "4096\n", 1023) = 5
close(4) = 0
readlinkat(AT_FDCWD, "/proc/self/fd/3", "/home/nix/cargo2", 4095) = 16
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0\0C\0\0\0\0\0"..., 64) = 64
mmap(NULL, 792, PROT_READ, MAP_PRIVATE, 3, 0) = 0xffff988b3000
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0xffff9a13b000} ---
+++ killed by SIGSEGV (core dumped) +++
Segmentation fault (core dumped)
Only the first 792 bytes of the binary are mmapped, while the .interp section has been moved to the end of the file (output of running patchelf --debug):
patching ELF file 'cargo2'
replacing section '.interp' with size 28
this is a dynamic library
last page is 0xf85000
first page is 0x0
needed space is 6472
shifting new PT_LOAD segment by 9449472 bytes to work around a Linux kernel bug
rewriting section '.interp' from offset 0x2e0 (size 28) to offset 0x1888000 (size 28)
rewriting section '.note.ABI-tag' from offset 0x2fc (size 32) to offset 0x1888020 (size 32)
rewriting section '.dynsym' from offset 0x320 (size 6408) to offset 0x1888040 (size 6408)
rewriting symbol table section 36
rewriting symbol table section 41
writing cargo2
So it seems that rosetta tries to read .interp and fails because it hasn't memory-mapped that section. Notice that 0xffff9a13b000 - 0xffff988b3000 = 0x1888000. This gives us something to work with! I can file a bug with Apple.
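The arithmetic can be checked in a few lines of Python. All numbers are copied from the strace and patchelf output above; the function name fault_offset and the two ELF64 size constants are only there for illustration.

```python
# Values copied from the strace and patchelf --debug output in this thread.
MAPPED_LEN = 792            # length passed to mmap() on fd 3
INTERP_OFFSET = 0x1888000   # new .interp file offset reported by patchelf


def fault_offset(fault_addr: int, map_base: int) -> int:
    """Distance of the faulting address from the start of the file mapping."""
    return fault_addr - map_base


offset = fault_offset(0xffff9a13b000, 0xffff988b3000)
print(hex(offset))               # 0x1888000
print(offset == INTERP_OFFSET)   # True: the fault lands exactly on the moved .interp
print(offset >= MAPPED_LEN)      # True: far beyond the 792 bytes that were mapped

# Side observation (an assumption, not confirmed anywhere in the thread):
# 792 is the size of an ELF64 header plus 13 program headers.
ELF64_EHDR_SIZE = 64
ELF64_PHDR_SIZE = 56
print(divmod(MAPPED_LEN - ELF64_EHDR_SIZE, ELF64_PHDR_SIZE))  # (13, 0)
```

If the binary does have 13 program headers, the 792-byte mapping would cover exactly the ELF header and the program header table, which fits the idea that rosetta expects the .interp data to sit right behind them.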
I've submitted the following bug report to Apple under FB11984253:
Hello, I'm trying out Rosetta for Linux in NixOS using UTM.app. I'm running into a segmentation fault inside Rosetta when trying to execute a binary that has an .interp section that's not close to the beginning of the binary. To reproduce the exact binary I'm using, please do the following (I've also attached a copy):
- Download and unpack https://static.rust-lang.org/dist/rust-1.66.0-x86_64-unknown-linux-gnu.tar.gz
- cp rust-1.66.0-x86_64-unknown-linux-gnu/cargo/bin/cargo cargo2
- Execute https://github.com/NixOS/patchelf (I'm using version 0.17.2) as follows: patchelf --debug --set-interpreter /lib64/ld-linux-x86-64.so.2 cargo2
- rosetta ./cargo2
Here's what I get when I run strace -i ./cargo2 (note the instruction address is in the rosetta program space):
strace -i ./cargo2
[0000ffff93ff504c] execve("./cargo2", ["./cargo2"], 0xffffc8c8c658 /* 45 vars */) = 0
[000080000002306c] openat(AT_FDCWD, "/proc/self/exe", O_RDONLY) = 4
[0000800000022e04] ioctl(4, _IOC(_IOC_READ, 0x61, 0x22, 0x45), 0xfffff6244340) = 1
[0000800000022a80] close(4) = 0
[0000800000022d6c] gettid() = 8473
[0000800000023580] getpid() = 8473
[000080000002306c] openat(AT_FDCWD, "/proc/self/maps", O_RDONLY) = 4
[00008000000230f0] pread64(4, "800000000000-800000022000 r--p 0"..., 4170, 0) = 523
[0000800000022f64] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xffff82348000
[00008000000230f0] pread64(4, "", 4170, 523) = 0
[0000800000022a94] close(4) = 0
[000080000002306c] openat(AT_FDCWD, "/proc/sys/vm/mmap_min_addr", O_RDONLY) = 4
[00008000000231cc] read(4, "4096\n", 1023) = 5
[0000800000022a94] close(4) = 0
[00008000000231f8] readlinkat(AT_FDCWD, "/proc/self/fd/3", "/home/nix/cargo2", 4095) = 16
[00008000000231cc] read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0\0C\0\0\0\0\0"..., 64) = 64
[0000800000022f64] mmap(NULL, 792, PROT_READ, MAP_PRIVATE, 3, 0) = 0xffff82347000
[0000800000022878] --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0xffff83bcf000} ---
[????????????????] +++ killed by SIGSEGV (core dumped) +++
Segmentation fault (core dumped)
As you can see it segfaults because it tries to access a value 0x1888000 bytes into the binary while only 792 bytes have been mmapped. This makes sense when you look at the debug log of patchelf:
patching ELF file 'cargo2'
replacing section '.interp' with size 28
this is a dynamic library
last page is 0xf85000
first page is 0x0
needed space is 6472
shifting new PT_LOAD segment by 9449472 bytes to work around a Linux kernel bug
rewriting section '.interp' from offset 0x2e0 (size 28) to offset 0x1888000 (size 28)
rewriting section '.note.ABI-tag' from offset 0x2fc (size 32) to offset 0x1888020 (size 32)
rewriting section '.dynsym' from offset 0x320 (size 6408) to offset 0x1888040 (size 6408)
rewriting symbol table section 36
rewriting symbol table section 41
writing cargo2
Running readelf -e cargo2 also provides useful information about the structure of the binary. I've attached its output as cargo2.elf.txt.
This binary was produced using https://github.com/NixOS/patchelf which is a tool that NixOS uses to modify dynamically linked binaries. It moves the .interp section to the back of the binary to safely modify the sections.
Using UTM Version 4.1.5 (74)
Output of /run/rosetta/rosetta:
Usage: rosetta <x86_64 ELF to run>
Optional environment variables:
ROSETTA_DEBUGSERVER_PORT wait for a debugger connection on given port
version: Rosetta-289.7
uname -a
Linux nixos-builder 5.15.89 #1-NixOS SMP Wed Jan 18 10:48:59 UTC 2023 aarch64 GNU/Linux
Some discussion is also at the following GitHub issue: https://github.com/NixOS/nixpkgs/issues/209242
This issue has been mentioned on NixOS Discourse. There might be relevant details there:
https://discourse.nixos.org/t/running-nixos-on-macos-with-rosetta-segfaults/25351/1
@bouk - any progress with FB11984253 on Apple side?
Nope, haven't heard anything from Apple.
I gave it a try and made https://github.com/zhaofengli/rosetta-spice to patch Rosetta to fix the problem, and there is a NixOS module that will configure everything. It hooks sys_mmap to map enough of the binary until PT_INTERP. Hopefully this will all become obsolete soon - I want things to work now so I got my hands dirty 😛
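The idea of "mapping enough of the binary until PT_INTERP" comes down to simple arithmetic. The sketch below is not rosetta-spice's actual code. It only shows the length calculation that the description implies, with a 4096-byte page size assumed and the function names invented for illustration.

```python
PAGE_SIZE = 4096  # assumed page size of the guest kernel


def page_align_up(n: int, page: int = PAGE_SIZE) -> int:
    """Round n up to the next multiple of the page size."""
    return (n + page - 1) // page * page


def extended_map_length(requested_len: int, interp_offset: int, interp_size: int,
                        page: int = PAGE_SIZE) -> int:
    """Mapping length that covers the original request and the PT_INTERP data."""
    needed = max(requested_len, interp_offset + interp_size)
    return page_align_up(needed, page)


# Numbers from the cargo2 example earlier in the thread:
# mmap length 792, .interp moved to offset 0x1888000 with size 28.
print(hex(extended_map_length(792, 0x1888000, 28)))  # 0x1889000
```

For a binary whose .interp still sits near the start of the file, the result is just the original length rounded up to one page, so such binaries would be unaffected.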
As a bonus, it also allows you to use AOT without needing the host to configure it. This requires either macOS Sonoma or setting virtualisation.rosetta-spice.rosettaPkg to packages.aarch64-linux.rosetta from the flake. However, AOT appears to be buggy at the moment and complex programs either segfault when running or OOM during translation.
With AOT enabled:
- p7zip: Runs
- geekbench_5: Runs
- spotify: AOT header specified too many segments
- saleae-logic: Segfaults
- geekbench_6: Segfaults
- chromium: rosettad OOMs during translation
Looks like the segfault no longer occurs on Sonoma Beta 5 (23A5312d)! If you don't want to upgrade to the beta or want to try AOT, you can use rosetta-spice to get the version (the segfault fix no longer has an effect).
Can we confirm that this issue is indeed fixed in the released version of Sonoma, and close this issue?
I just set up a VM running on UTM with rosetta, and after installing ida-free it just works via X11 forwarding. Not sure how that affects it, but it seems to work just fine.
I set up an aarch64-linux graphical NixOS system (nixpkgs master) inside UTM.
Rosetta is enabled, and I can successfully run an x86_64-linux xclock.
Most of the system is already aarch64-linux, but some applications are available for x86_64-linux only (Electron apps mostly).
I created a "forced x86_64-linux overlay" in my overlay.nix: … and then referred to all x86_64-only applications via pkgsx86_64.$packageName.
Unfortunately, all these applications segfault :-/
gdb isn't very helpful obviously:
I'm somewhat suspecting some weird cross-arch graphics driver interactions, but am a bit lost. Anyone got some ideas?
cc @toonn @alyssais @sandydoo