docker-library / busybox

Docker Official Image packaging for Busybox
http://busybox.net

busybox 1.36 sha256sum crashes with Illegal instruction (SIGILL) on amd64 #166

Closed actualben closed 1 year ago

actualben commented 1 year ago

I have been able to reproduce sha256sum crashes in all three of these images: busybox:1.36-{musl,uclibc,glibc}, but not in earlier versions (I tested back to 1.33). I can produce the crash by running in GitHub Actions jobs on GitHub-hosted runners. I cannot reproduce it on arm or in qemu emulating amd64. The crash is intermittent for me.

An example of a crashy (on github actions) Dockerfile is this:

ARG BASE=busybox:musl
FROM ${BASE}

# see: https://fastest.fish/test-files
COPY 1MiB.bin 1.544MiB.bin sums.txt /

RUN set -eux; \
  ulimit -c unlimited; \
  ! sha256sum *.bin
  # the goal is to get crash cores; if we didn't crash then we fail the build
RUN set -eux; \
  cat /proc/cpuinfo >/cpuinfo.txt; \
  cat /proc/meminfo >/meminfo.txt; \
  busybox | grep 'BusyBox v' >/busybox-version.txt; \
  cksum bin/busybox >/busybox-cksum.txt
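
Because the crash is intermittent, and `! sha256sum *.bin` treats any non-zero exit as a crash, it can help to retry and check specifically for SIGILL. The following is a minimal sketch, not part of the original report. The function name `run_until_sigill` is hypothetical, and it assumes a shell such as ash, dash, or bash that reports a signal death as exit status 128 + signal number (132 for SIGILL, signal 4 on Linux).

```shell
# run_until_sigill MAX CMD [ARGS...]
# Hypothetical helper: runs CMD up to MAX times and returns 0 as soon as
# one run is killed by SIGILL (exit status 132 = 128 + 4), 1 otherwise.
run_until_sigill() {
    max="$1"; shift
    i=0
    while [ "$i" -lt "$max" ]; do
        i=$((i + 1))
        status=0
        # Capture the exit status without tripping `set -e`.
        "$@" >/dev/null 2>&1 || status=$?
        if [ "$status" -eq 132 ]; then
            echo "SIGILL on attempt $i"
            return 0
        fi
    done
    echo "no SIGILL in $max attempts"
    return 1
}
```

For instance, `run_until_sigill 20 sha256sum 1MiB.bin 1.544MiB.bin` would retry up to 20 times; the count of 20 is arbitrary.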

strace looks like this:

2023-02-28T21:47:29.2574112Z + strace sha256sum 1.544MiB.bin 1MiB.bin
2023-02-28T21:47:29.2611500Z execve("/usr/bin/sha256sum", ["sha256sum", "1.544MiB.bin", "1MiB.bin"], 0x7ffc41e4baf0 /* 5 vars */) = 0
2023-02-28T21:47:29.2616858Z arch_prctl(ARCH_SET_FS, 0x51c258)       = 0
2023-02-28T21:47:29.2617173Z set_tid_address(0x51cbd8)               = 10
2023-02-28T21:47:29.2617428Z getuid()                                = 0
2023-02-28T21:47:29.2617682Z brk(NULL)                               = 0x1c4d000
2023-02-28T21:47:29.2623375Z brk(0x1c4f000)                          = 0x1c4f000
2023-02-28T21:47:29.2623919Z mmap(0x1c4d000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x1c4d000
2023-02-28T21:47:29.2625076Z mmap(NULL, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f2a9de51000
2023-02-28T21:47:29.2625396Z open("1.544MiB.bin", O_RDONLY|O_LARGEFILE) = 3
2023-02-28T21:47:29.2625758Z read(3, "\270\222\2W\262\203&^\1\304X\372\247H\6\261\212\220\303i0D\266tx\356\353\370\327\363\354q"..., 4096) = 4096
2023-02-28T21:47:29.2626698Z --- SIGILL {si_signo=SIGILL, si_code=ILL_ILLOPN, si_addr=0x401161} ---
2023-02-28T21:47:29.2632505Z +++ killed by SIGILL (core dumped) +++
2023-02-28T21:47:29.2632762Z Illegal instruction

gdb on the core dump starts with:

# gdb /bin/sha256sum core-sha256sum.10.1677605036
Reading symbols from /bin/sha256sum...
(No debugging symbols found in /bin/sha256sum)
[New LWP 10]
Core was generated by `sha256sum -w -s -c -'.
Program terminated with signal SIGILL, Illegal instruction.
#0  0x0000000000401161 in ?? ()

actualben commented 1 year ago

Helpful people on the busybox mailing list point out that @ncopa reported this months ago: http://lists.busybox.net/pipermail/busybox/2023-January/090113.html.

alice reports:

it's caused by having a cpu with AVX512 (the github runners do) but not sha_ni, and the code that checks it is broken and misdetects sha_ni support when avx512 exists. the github runners don't have sha_ni, so it breaks exactly there.

quite the rare combo :)
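
To check whether a given host has the CPU combination described above, one option is to inspect the `flags` line of `/proc/cpuinfo`. This is a rough sketch under assumptions: the helper names are hypothetical, `avx512f` is used as a stand-in for "has AVX512", and `sha_ni` is the flag name Linux uses for the SHA extensions. It only identifies the hardware combination and does not prove that the busybox bug will trigger.

```shell
# has_cpu_flag FLAG [CPUINFO_FILE]
# Hypothetical helper: succeeds if FLAG appears as a whole word in the
# first "flags" line of the cpuinfo file (default: /proc/cpuinfo).
has_cpu_flag() {
    line=$(grep -m1 '^flags' "${2:-/proc/cpuinfo}") || return 1
    # Drop everything up to the colon, then match on space-delimited words.
    case " ${line#*:} " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# has_avx512_without_sha_ni [CPUINFO_FILE]
# Succeeds on the combination from the mailing-list explanation:
# AVX512 present (approximated by avx512f) and sha_ni absent.
has_avx512_without_sha_ni() {
    has_cpu_flag avx512f "${1:-/proc/cpuinfo}" &&
        ! has_cpu_flag sha_ni "${1:-/proc/cpuinfo}"
}
```

Running `has_avx512_without_sha_ni && echo "matches the reported combination"` on a runner would indicate whether it matches.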

@ncopa said:

I did try to create a fix for it: https://github.com/ncopa/busybox/commit/e4ad5e7f2fed8e36d0779d918052169fe9a0bb95 But it didn't work. I was unable to create a proper core dump and sort of gave up. In Alpine we have simply disabled the HWACCEL as it is broken.

So I think we should also disable HWACCEL in the docker library busybox until a proper fix can be committed upstream.
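
As an illustration of what disabling HWACCEL could look like in a build script, here is a hedged sketch that rewrites entries in a busybox `.config`. It assumes the relevant options are named `CONFIG_SHA1_HWACCEL` and `CONFIG_SHA256_HWACCEL` (check the config of the version being built), and the helper name `unset_busybox_conf` is hypothetical. It is not the patch used by Alpine or by this repository.

```shell
# unset_busybox_conf NAME [CONFIG_FILE]
# Hypothetical helper: turns "CONFIG_NAME=..." into the kconfig
# "# CONFIG_NAME is not set" form in CONFIG_FILE (default: .config).
unset_busybox_conf() {
    name="$1"
    file="${2:-.config}"
    tmp="$file.tmp.$$"
    # Write to a temporary file and move it into place, avoiding
    # the non-portable `sed -i`.
    sed "s/^CONFIG_${name}=.*/# CONFIG_${name} is not set/" "$file" >"$tmp" \
        && mv "$tmp" "$file"
}
```

A build might call `unset_busybox_conf SHA1_HWACCEL` and `unset_busybox_conf SHA256_HWACCEL` after generating `.config`, then run `make oldconfig` to let kconfig reconcile the result.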

actualben commented 1 year ago

Alice's patch disabling HWACCEL in alpine's busybox is here: https://git.alpinelinux.org/aports/commit/main?id=ae2cfdf6f6da3dc46ee09d8ceafa26921f6e058e

tianon commented 1 year ago

See https://bugs.busybox.net/show_bug.cgi?id=15236 for where this was also reported in the upstream bugtracker.