docker-library / busybox

Docker Official Image packaging for Busybox
http://busybox.net

Add missing `set -e` to our `nslookup` smoke test 😭 #202

Closed: tianon closed this 2 months ago

tianon commented 2 months ago

Apparently missing for ~5 years 🤦

This might cause some builds that previously passed to start failing, but presumably they would fail in ways we should investigate anyway. 🙈
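
To illustrate why the missing `set -e` matters, here is a minimal sketch. It is not the actual smoke test from this repository (which is not shown in this thread); `false` is a hypothetical stand-in for a failing `nslookup`, and the variable names are made up for the example.

```shell
#!/bin/sh
# Hypothetical stand-in for the smoke test: `false` plays the role of a
# failing `nslookup`. Nothing here is copied from the real test script.

# Without `set -e`, the failure is ignored: the script keeps going and its
# exit status is that of the last command (the successful `echo`).
without_e=0
sh -c 'false; echo "still running"' >/dev/null || without_e=$?

# With `set -e`, the shell exits as soon as `false` fails, so the `echo`
# never runs and the script reports a non-zero status.
with_e=0
sh -c 'set -e; false; echo "never reached"' >/dev/null || with_e=$?

echo "without set -e: exit status $without_e"
echo "with set -e:    exit status $with_e"
```

In the first case the caller sees success even though a command failed, which is how a broken `nslookup` could go unnoticed by the smoke test itself.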

tianon commented 2 months ago

Discovered because `nslookup` was failing on riscv64, but the failure was not caught until the line in `build.sh` that runs `nslookup` again 🤦

tianon commented 2 months ago

Lots of hacking and very very slow riscv64 rebuilding later, and I've got a backtrace for the segfault:

+ gdb -core=rootfs/core -silent busybox_unstripped
Reading symbols from busybox_unstripped...

warning: core file may not match specified executable file.
[New LWP 11]
Core was generated by `nslookup google.com'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  __GI_memset (dstpp=dstpp@entry=0x3ff3c55530, c=c@entry=0, 
    len=<optimized out>, len@entry=16777216)
    at libc/string/generic/memset.c:50
50        ((op_t *) dstp)[0] = cccc;

(interestingly, it only segfaults on real hardware and works fine on QEMU 😭)

tianon commented 2 months ago

https://github.com/wbx-github/uclibc-ng/blob/318858b4735dc38720be492cc30971ca1a1d55f8/libc/string/generic/memset.c#L50 nothing obvious here!

(this whole block of code was written 20 years ago, so it's definitely not a recent change in this code!)

tianon commented 2 months ago

A million years of compiling later, and here's a better backtrace:

+ gdb -core=rootfs/core -silent -ex bt full -ex quit busybox_unstripped
Reading symbols from busybox_unstripped...

warning: core file may not match specified executable file.
[New LWP 11]
Core was generated by `nslookup google.com'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  __GI_memset (dstpp=dstpp@entry=0x3ffc0d1530, c=c@entry=0, 
    len=<optimized out>, len@entry=16777216)
    at libc/string/generic/memset.c:50
50        ((op_t *) dstp)[0] = cccc;
#0  __GI_memset (dstpp=dstpp@entry=0x3ffc0d1530, c=c@entry=0, 
    len=<optimized out>, len@entry=16777216)
    at libc/string/generic/memset.c:50
        xlen = <optimized out>
        cccc = 0
        dstp = 274811655472
#1  0x000000000010c220 in __poll_nocancel (fds=fds@entry=0x3ffd0d1628, 
    nfds=nfds@entry=1, timeout=<optimized out>)
    at libc/sysdeps/linux/common/poll.c:70
        max_fd_size = 1073741816
        tv = {tv_sec = 16, tv_usec = 2}
        rset = 0x3ffc0d1530
        wset = 0x3ffb0d1530
        xset = <optimized out>
        f = <optimized out>
        ready = <optimized out>
        error_num = <optimized out>
        maxfd = 0
        bytes = 16777216
#2  0x000000000010c740 in __GI_poll (fds=fds@entry=0x3ffd0d1628, 
    nfds=nfds@entry=1, timeout=timeout@entry=2500)
    at libc/sysdeps/linux/common/poll.c:215
        oldtype = <optimized out>
        result = <optimized out>
#3  0x0000000000047d68 in send_queries (ns=0x19e80e70)
    at networking/nslookup.c:569
        qn = <optimized out>
        recvlen = <optimized out>
        reply = "X\004\023\000\000\000\000\000h\004\023\000\000\000\000\000X\004\023\000\000\000\000\000\374\375\347\031", '\000' <repeats 12 times>, "\001", '\000' <repeats 15 times>, "\001", '\000' <repeats 16 times>, "\027\r\375?\000\000\000`r\027\000\000\000\000\000\024\235\022", '\000' <repeats 13 times>, "t\004\023\000\000\000\000\000X\004\023\000\000\000\000\000h\004\023\000\000\000\000\000X\004\023\000\000\000\000\000\374\375\347\031", '\000' <repeats 28 times>, "\001", '\000' <repeats 16 times>, "\027\r\375?\000\000\000`r\027\000\000\000\000\000"...
        rcode = <optimized out>
        local_lsa = 0x19e80850
        pfd = {fd = 3, events = 1, revents = 0}
        servfail_retry = 4
        n_replies = 0
        retry_interval = <optimized out>
        timeout = 5000
        tstart = 97021078
        tsent = 97021078
        tcur = 97021078
#4  0x00000000000484fc in nslookup_main (argc=<optimized out>, 
    argv=<optimized out>, argv@entry=0x3ffd0d1c98)
    at networking/nslookup.c:984
        c = <optimized out>
        types = 0
        rc = 0
        err = <optimized out>
#5  0x0000000000010b58 in run_applet_no_and_exit (applet_no=<optimized out>, 
    name=name@entry=0x3ffd0d1ed1 "nslookup", argv=argv@entry=0x3ffd0d1c98)
    at libbb/appletlib.c:969
        argc = <optimized out>
#6  0x0000000000010f28 in run_applet_and_exit (name=0x3ffd0d1ed1 "nslookup", 
    argv=argv@entry=0x3ffd0d1c98) at libbb/appletlib.c:988
        applet = <optimized out>
#7  0x0000000000010fc0 in main (argc=<optimized out>, argv=0x3ffd0d1c98)
    at libbb/appletlib.c:1128
No locals.
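
The numbers printed for frame #1 can be cross-checked with a little arithmetic. The sketch below uses only values copied from the backtrace, plus one assumption that the thread does not state: that a single fd-set word holds 64 bits on riscv64 (the `nfdbits` variable is hypothetical). It is an observation about the printed values, not a diagnosis.

```shell
#!/bin/sh
# Values copied from frame #1 (__poll_nocancel) of the backtrace.
max_fd_size=1073741816
bytes=16777216
rset=$((0x3ffc0d1530))
wset=$((0x3ffb0d1530))

# ASSUMPTION (not stated in the thread): 64 bits per fd-set word.
nfdbits=64

# Ceiling division: how many 64-bit words are needed for max_fd_size bits.
words=$(( (max_fd_size + nfdbits - 1) / nfdbits ))

# Distance between the two buffers reported in the backtrace.
gap=$(( rset - wset ))

echo "words=$words bytes=$bytes gap=$gap"
# prints: words=16777216 bytes=16777216 gap=16777216
```

Under that assumption, `bytes` matches a ceiling division of `max_fd_size` by 64, and `rset` and `wset` sit exactly 16 MiB apart, the same size as the `len` passed to `memset`. The name `max_fd_size` suggests it follows the process's open-file limit, which could differ between the real hardware and QEMU environments, but nothing in this thread confirms that.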

tianon commented 2 months ago

Upgrading buildroot to the new release didn't help either:

+ gdb -core=rootfs/core -silent -ex bt full -ex quit busybox_unstripped
Reading symbols from busybox_unstripped...

warning: core file may not match specified executable file.
[New LWP 11]
Core was generated by `nslookup google.com'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  __GI_memset (dstpp=dstpp@entry=0x3fed186520, c=c@entry=0, 
    len=<optimized out>, len@entry=16777216)
    at libc/string/generic/memset.c:50
50        ((op_t *) dstp)[0] = cccc;
#0  __GI_memset (dstpp=dstpp@entry=0x3fed186520, c=c@entry=0, 
    len=<optimized out>, len@entry=16777216)
    at libc/string/generic/memset.c:50
        xlen = <optimized out>
        cccc = 0
        dstp = 274560738592
#1  0x0000000000109924 in __poll_nocancel (fds=fds@entry=0x3fee186608, 
    nfds=nfds@entry=1, timeout=timeout@entry=2500)
    at libc/sysdeps/linux/common/poll.c:70
        max_fd_size = 1073741816
        tv = {tv_sec = 16, tv_usec = 3}
        rset = 0x3fed186520
        wset = 0x3fec186520
        xset = <optimized out>
        f = <optimized out>
        ready = <optimized out>
        error_num = <optimized out>
        maxfd = 0
        bytes = 16777216
#2  0x0000000000109e44 in __GI_poll (fds=fds@entry=0x3fee186608, 
    nfds=nfds@entry=1, timeout=timeout@entry=2500)
    at libc/sysdeps/linux/common/poll.c:215
        oldtype = <optimized out>
        result = <optimized out>
#3  0x0000000000047628 in send_queries (ns=0x2581e90)
    at networking/nslookup.c:569
        qn = <optimized out>
        recvlen = <optimized out>
        reply = "\234\327\022\000\000\000\000\000\254\327\022\000\000\000\000\000\234\327\022\000\000\000\000\000\034\016X\002", '\000' <repeats 12 times>, "\001", '\000' <repeats 15 times>, "\001", '\000' <repeats 15 times>, "\340f\030\356?\000\000\000\310\261\027\000\000\000\000\000\260~\022", '\000' <repeats 13 times>, "\270\327\022\000\000\000\000\000\234\327\022\000\000\000\000\000\254\327\022\000\000\000\000\000\234\327\022\000\000\000\000\000\034\016X\002", '\000' <repeats 28 times>, "\001", '\000' <repeats 15 times>, "\340f\030\356?\000\000\000\310\261\027\000\000\000\000\000"...
        rcode = 0 '\000'
        local_lsa = 0x2581870
        pfd = {fd = 3, events = 1, revents = 0}
        servfail_retry = 4
        n_replies = 0
        retry_interval = <optimized out>
        timeout = 5000
        tstart = 183524293
        tsent = 183524293
        tcur = 183524293
#4  0x0000000000047d50 in nslookup_main (argc=<optimized out>, 
    argv=<optimized out>, argv@entry=0x3fee186c98)
    at networking/nslookup.c:984
        c = <optimized out>
        types = 0
        rc = 0
        err = <optimized out>
#5  0x0000000000010b14 in run_applet_no_and_exit (applet_no=<optimized out>, 
    name=name@entry=0x3fee186ed3 "nslookup", argv=argv@entry=0x3fee186c98)
    at libbb/appletlib.c:969
        argc = <optimized out>
#6  0x0000000000010ee4 in run_applet_and_exit (name=0x3fee186ed3 "nslookup", 
    argv=argv@entry=0x3fee186c98) at libbb/appletlib.c:988
        applet = <optimized out>
#7  0x0000000000010f7c in main (argc=<optimized out>, argv=0x3fee186c98)
    at libbb/appletlib.c:1128
No locals.

tianon commented 2 months ago

I'm not sure what to do about this segfault. It seems wrong to revert #201 completely just because it segfaults on one architecture (and only for the uclibc variant). It also only segfaults on our real hardware and not on emulated hardware, which is bizarre.