docker-library / busybox

Docker Official Image packaging for Busybox
http://busybox.net

Segfault on `riscv64` #203

Open yosifkit opened 2 weeks ago

yosifkit commented 2 weeks ago

As discovered in https://github.com/docker-library/busybox/pull/202, busybox segfaults when running on real riscv64 hardware but works fine on QEMU 😭. Just opening this as a tracking issue.

+ gdb -core=rootfs/core -silent -ex bt full -ex quit busybox_unstripped
Reading symbols from busybox_unstripped...

warning: core file may not match specified executable file.
[New LWP 11]
Core was generated by `nslookup google.com'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  __GI_memset (dstpp=dstpp@entry=0x3ffc0d1530, c=c@entry=0, 
    len=<optimized out>, len@entry=16777216)
    at libc/string/generic/memset.c:50
50        ((op_t *) dstp)[0] = cccc;
#0  __GI_memset (dstpp=dstpp@entry=0x3ffc0d1530, c=c@entry=0, 
    len=<optimized out>, len@entry=16777216)
    at libc/string/generic/memset.c:50
        xlen = <optimized out>
        cccc = 0
        dstp = 274811655472
#1  0x000000000010c220 in __poll_nocancel (fds=fds@entry=0x3ffd0d1628, 
    nfds=nfds@entry=1, timeout=<optimized out>)
    at libc/sysdeps/linux/common/poll.c:70
        max_fd_size = 1073741816
        tv = {tv_sec = 16, tv_usec = 2}
        rset = 0x3ffc0d1530
        wset = 0x3ffb0d1530
        xset = <optimized out>
        f = <optimized out>
        ready = <optimized out>
        error_num = <optimized out>
        maxfd = 0
        bytes = 16777216
#2  0x000000000010c740 in __GI_poll (fds=fds@entry=0x3ffd0d1628, 
    nfds=nfds@entry=1, timeout=timeout@entry=2500)
    at libc/sysdeps/linux/common/poll.c:215
        oldtype = <optimized out>
        result = <optimized out>
#3  0x0000000000047d68 in send_queries (ns=0x19e80e70)
    at networking/nslookup.c:569
        qn = <optimized out>
        recvlen = <optimized out>
        reply = "X\004\023\000\000\000\000\000h\004\023\000\000\000\000\000X\004\023\000\000\000\000\000\374\375\347\031", '\000' <repeats 12 times>, "\001", '\000' <repeats 15 times>, "\001", '\000' <repeats 16 times>, "\027\r\375?\000\000\000`r\027\000\000\000\000\000\024\235\022", '\000' <repeats 13 times>, "t\004\023\000\000\000\000\000X\004\023\000\000\000\000\000h\004\023\000\000\000\000\000X\004\023\000\000\000\000\000\374\375\347\031", '\000' <repeats 28 times>, "\001", '\000' <repeats 16 times>, "\027\r\375?\000\000\000`r\027\000\000\000\000\000"...
        rcode = <optimized out>
        local_lsa = 0x19e80850
        pfd = {fd = 3, events = 1, revents = 0}
        servfail_retry = 4
        n_replies = 0
        retry_interval = <optimized out>
        timeout = 5000
        tstart = 97021078
        tsent = 97021078
        tcur = 97021078
#4  0x00000000000484fc in nslookup_main (argc=<optimized out>, 
    argv=<optimized out>, argv@entry=0x3ffd0d1c98)
    at networking/nslookup.c:984
        c = <optimized out>
        types = 0
        rc = 0
        err = <optimized out>
#5  0x0000000000010b58 in run_applet_no_and_exit (applet_no=<optimized out>, 
    name=name@entry=0x3ffd0d1ed1 "nslookup", argv=argv@entry=0x3ffd0d1c98)
    at libbb/appletlib.c:969
        argc = <optimized out>
#6  0x0000000000010f28 in run_applet_and_exit (name=0x3ffd0d1ed1 "nslookup", 
    argv=argv@entry=0x3ffd0d1c98) at libbb/appletlib.c:988
        applet = <optimized out>
#7  0x0000000000010fc0 in main (argc=<optimized out>, argv=0x3ffd0d1c98)
    at libbb/appletlib.c:1128
No locals.

Originally posted by @tianon in https://github.com/docker-library/busybox/issues/202#issuecomment-2163880467
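
The numbers in frames #0 and #1 can be cross-checked against each other, which may help narrow down where the fault comes from. The sketch below only uses values copied from the backtrace; the 64-bit word size (`NFDBITS`), the idea that `bytes` is a ceiling division of `max_fd_size`, and the reading that `fds` points at the `pfd` local of `send_queries` are assumptions, not something confirmed in this thread or checked against the uClibc source.

```python
# Values copied from the backtrace above (frames #0, #1 and #3).
MAX_FD_SIZE = 1073741816   # frame #1: max_fd_size
BYTES = 16777216           # frame #1: bytes (also memset's len in frame #0)
RSET = 0x3FFC0D1530        # frame #1: rset
WSET = 0x3FFB0D1530        # frame #1: wset
FDS = 0x3FFD0D1628         # frame #1: fds (presumably &pfd in send_queries)
DSTP = 274811655472        # frame #0: dstp, printed in decimal by gdb

NFDBITS = 64  # assumption: bits per fd-set word on riscv64


def howmany(x: int, y: int) -> int:
    """Ceiling division, in the spirit of the C howmany() macro."""
    return (x + y - 1) // y


def summarize() -> dict:
    """Relate the backtrace values to each other."""
    return {
        # Does ceil(max_fd_size / NFDBITS) reproduce the reported 'bytes'?
        "howmany_matches_bytes": howmany(MAX_FD_SIZE, NFDBITS) == BYTES,
        # Distance between the two visible fd sets.
        "set_spacing": RSET - WSET,
        # How far below the caller's pollfd the first set starts.
        "rset_below_fds": FDS - RSET,
        # Is the faulting memset destination the start of rset?
        "memset_hits_rset": DSTP == RSET,
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(f"{key}: {value}")
```

If these relations hold, the picture is of several 16 MiB buffers placed just below the caller's stack frame, with the very first write into `rset` faulting. That would be consistent with the buffers not fitting in the available stack, though the thread does not confirm that reading.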

tianon commented 2 weeks ago

Options:

tianon commented 6 days ago

Good news! It segfaults in system emulation too!!

tianon commented 6 days ago

Hmm, package/uclibc (which is where our uClibc version comes from: https://github.com/buildroot/buildroot/blob/2024.02.3/package/uclibc/uclibc.hash) didn't change between 2024.02.2 and 2024.02.3, so while the segfault clearly manifests somewhere in uClibc, it's probably not caused by a change in uClibc itself.

tianon commented 6 days ago

This seems like a more likely candidate: (https://github.com/buildroot/buildroot/commit/5d9c54de0c48be3a616db5e3c2f6c7112ff635bf)

diff --git a/package/gcc/gcc.hash b/package/gcc/gcc.hash
index 5061a603bc..964fbc97df 100644
--- a/package/gcc/gcc.hash
+++ b/package/gcc/gcc.hash
@@ -6,8 +6,8 @@ sha512  440c08ca746da450d9a1b35e8fd2305cb27e7e6987cd9d0f7d375f3b1fc9e4b0bd7acb3c
 sha512  a5018bf1f1fa25ddf33f46e720675d261987763db48e7a5fdf4c26d3150a8abcb82fdc413402df1c32f2e6b057d9bae6bdfa026defc4030e10144a8532e60f14  gcc-11.4.0.tar.xz
 # From https://gcc.gnu.org/pub/gcc/releases/gcc-12.3.0/sha512.sum
 sha512  8fb799dfa2e5de5284edf8f821e3d40c2781e4c570f5adfdb1ca0671fcae3fb7f794ea783e80f01ec7bfbf912ca508e478bd749b2755c2c14e4055648146c204  gcc-12.3.0.tar.xz
-# From https://gcc.gnu.org/pub/gcc/releases/gcc-13.2.0/sha512.sum
-sha512  d99e4826a70db04504467e349e9fbaedaa5870766cda7c5cab50cdebedc4be755ebca5b789e1232a34a20be1a0b60097de9280efe47bdb71c73251e30b0862a2  gcc-13.2.0.tar.xz
+# From https://gcc.gnu.org/pub/gcc/releases/gcc-13.3.0/sha512.sum
+sha512  ed5f2f4c6ed2c796fcf2c93707159e9dbd3ddb1ba063d549804dd68cdabbb6d550985ae1c8465ae9a336cfe29274a6eb0f42e21924360574ebd8e5d5c7c9a801  gcc-13.3.0.tar.xz

 # Locally calculated (fetched from Github)
 sha512  4dca20f517a42bb027fec605965b09fb917a535eebf3fe3e811d93476b02b1962df5ad4665f117bd44c2ec8e8015d51a44c00591761fe5f259c201ac5c7d920f  gcc-arc-2023.09-release.tar.gz

tianon commented 5 days ago

Oh, that was a dead end -- the default is gcc-12, not gcc-13, so we get 12.3.0 with or without that patch. :facepalm:
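
A quick way to see which tarballs a `.hash` change actually touches is to compare the entries by filename. This is a minimal sketch, assuming the three-column `algorithm  digest  filename` layout visible in the diff above; the function names are hypothetical and the digests are short placeholders, not the real sha512 values.

```python
from __future__ import annotations


def parse_hash_file(text: str) -> dict[str, tuple[str, str]]:
    """Map filename -> (algorithm, digest), skipping blank and '#' lines."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        algo, digest, filename = line.split()
        entries[filename] = (algo, digest)
    return entries


def changed_files(old: str, new: str) -> set[str]:
    """Filenames that were added, removed, or re-hashed between two versions."""
    a, b = parse_hash_file(old), parse_hash_file(new)
    return {name for name in a.keys() | b.keys() if a.get(name) != b.get(name)}


# Placeholder digests for illustration only.
OLD = """\
# gcc 12
sha512  aaaa  gcc-12.3.0.tar.xz
# gcc 13
sha512  bbbb  gcc-13.2.0.tar.xz
"""
NEW = """\
# gcc 12
sha512  aaaa  gcc-12.3.0.tar.xz
# gcc 13
sha512  cccc  gcc-13.3.0.tar.xz
"""

if __name__ == "__main__":
    print(sorted(changed_files(OLD, NEW)))
```

With inputs shaped like the diff above, only the gcc-13 tarballs show up as changed, so a build that selects gcc-12 would pick up the same 12.3.0 source either way.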

tianon commented 5 days ago

In the pursuit of further narrowing things down, the upgrade from kernel headers 6.6.22 to 6.6.32 is also not the culprit.

tianon commented 4 days ago

https://github.com/buildroot/buildroot/commit/a27009724737e381414d235b7aba3f43cb1f7dd1 is another dead end (it was a long shot, but I'm running out of promising things in `git diff 2024.02.2..2024.02.3` :sob:)

tianon commented 3 days ago

Well, the reason I felt like buildroot was gaslighting me was because I was gaslighting myself. I can reproduce the segfault on 2024.02.2 as well. :sob:

tianon commented 3 days ago

Confirmed, the current published busybox:uclibc image segfaults too. :face_exhaling:

Edit: * on native riscv64 hardware

tianon commented 3 days ago

Also segfaulting (thanks to repo-info):

tianon commented 3 days ago

In better news, busybox:glibc and busybox:musl are both fine, so we could just disable our riscv64 builds of uclibc and call it good, but it should be supported. :sob:
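
For anyone wanting to repeat the comparison across the three variants, here is a rough sketch of a checker. The image tags and the `nslookup google.com` command come from this thread; the helper names are hypothetical, and the assumption that a crashed applet surfaces as exit status 128 + signal number (139 for SIGSEGV) follows the usual shell convention and has not been verified on the affected hardware.

```python
import signal
import subprocess

# Tags mentioned in this thread.
IMAGES = ["busybox:uclibc", "busybox:glibc", "busybox:musl"]


def docker_command(image: str, applet_args: list) -> list:
    """Build a 'docker run --rm IMAGE ARGS...' argument list."""
    return ["docker", "run", "--rm", image, *applet_args]


def describe_exit(code: int) -> str:
    """Turn an exit status into a short description.

    Assumes statuses above 128 mean 'killed by signal (code - 128)'.
    """
    if code == 0:
        return "ok"
    if code > 128:
        try:
            return "killed by " + signal.Signals(code - 128).name
        except ValueError:
            pass  # not a known signal number; fall through
    return f"exit status {code}"


if __name__ == "__main__":
    # Requires a working docker CLI; run on the riscv64 host under test.
    for image in IMAGES:
        cmd = docker_command(image, ["nslookup", "google.com"])
        status = subprocess.run(cmd).returncode
        print(f"{image}: {describe_exit(status)}")
```

Swapping the applet arguments for something that does not need the network, such as `["sh", "-c", "true"]`, might be a useful second data point.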

tianon commented 3 days ago

I should also clarify that this isn't just nslookup -- running the interactive shell segfaults reliably for me as well.