libtcc1.a: abort, bad checksum

nickalcock commented 2 years ago

Seen with trunk in my first attempt to do a live-bootstrap with this package. 64-bit x86-64 box, building with bwrap via:

PATH=/usr/src/live-bootstrap/bwrap:$PATH ./rootfs.py --bwrap

(I have to point PATH through a directory that contains a non-setuid bwrap because the setuid one refuses to allow CAP_SETPCAP wrapping.)

Here's the end of the bootstrap process, including at least one thing that had a correctly-validated checksum:

 +> cp boot3-tcc /usr/bin 
 +> chmod 755 /usr/bin/boot3-tcc 
 +> cd ../mes-aa5f1533e1736a89e60d2c34c2a0ab3b01f8d037 
 +> boot3-tcc -c -D HAVE_CONFIG_H=1 -I include -I include/linux/x86 -o /usr/lib/mes/crt1.o lib/linux/x86-mes-gcc/crt1.c 
 +> boot3-tcc -c -D HAVE_CONFIG_H=1 -I include -I include/linux/x86 -o /usr/lib/mes/crtn.o lib/linux/x86-mes-gcc/crtn.c 
 +> boot3-tcc -c -D HAVE_CONFIG_H=1 -I include -I include/linux/x86 -o /usr/lib/mes/crti.o lib/linux/x86-mes-gcc/crti.c 
 +> boot3-tcc -c -D HAVE_CONFIG_H=1 -D HAVE_FLOAT=1 -D HAVE_LONG_LONG=1 -I include -I include/linux/x86 lib/libtcc1.c 
 +> boot3-tcc -c -D TCC_TARGET_I386=1 ../tcc-0.9.26-1136-g5bba73cc/lib/libtcc1.c 
 +> boot3-tcc -ar cr /usr/lib/mes/tcc/libtcc1.a libtcc1.o 
 +> boot3-tcc -c -D HAVE_CONFIG_H=1 -I include -I include/linux/x86 -o unified-libc.o unified-libc.c 
unified-libc.c:2000: warning: assignment makes integer from pointer without a cast
unified-libc.c:2000: warning: assignment makes pointer from integer without a cast
unified-libc.c:2284: warning: SYS_exit redefined
unified-libc.c:2502: warning: SYS_write redefined
unified-libc.c:5877: warning: assignment from incompatible pointer type
unified-libc.c:6767: warning: assignment from incompatible pointer type
 +> boot3-tcc -ar cr /usr/lib/mes/libc.a unified-libc.o 
 +> cd ../tcc-0.9.26-1136-g5bba73cc 
 +> boot3-tcc -version 
tcc version 0.9.26 (i386 Linux)
 +> boot3-tcc -g -v -static -o boot4-tcc -D BOOTSTRAP=1 -D HAVE_BITFIELD=1 -D HAVE_FLOAT=1 -D HAVE_LONG_LONG=1 -D HAVE_SETJMP=1 -I . -I /usr/include -D TCC_TARGET_I386=1 -D CONFIG_TCCDIR="/usr/lib/mes/tcc" -D CONFIG_TCC_CRTPREFIX="/usr/lib/mes" -D CONFIG_TCC_ELFINTERP="/mes/loader" -D CONFIG_TCC_LIBPATHS="/usr/lib/mes:/usr/lib/mes/tcc" -D CONFIG_TCC_SYSINCLUDEPATHS="/usr/include" -D TCC_LIBGCC="/usr/lib/mes/libc.a" -D TCC_LIBTCC1="libtcc1.a" -D CONFIG_TCCBOOT=1 -D CONFIG_TCC_STATIC=1 -D CONFIG_USE_LIBGCC=1 -D TCC_MES_LIBC=1 -D TCC_VERSION="0.9.26" -D ONE_SOURCE=1 -L . tcc.c 
tcc version 0.9.26 (i386 Linux)
-> tcc.c
<- boot4-tcc
 +> cp boot4-tcc /usr/bin 
 +> chmod 755 /usr/bin/boot4-tcc 
 +> cd ../mes-aa5f1533e1736a89e60d2c34c2a0ab3b01f8d037 
 +> boot4-tcc -c -D HAVE_CONFIG_H=1 -I include -I include/linux/x86 -o /usr/lib/mes/crt1.o lib/linux/x86-mes-gcc/crt1.c 
 +> boot4-tcc -c -D HAVE_CONFIG_H=1 -I include -I include/linux/x86 -o /usr/lib/mes/crtn.o lib/linux/x86-mes-gcc/crtn.c 
 +> boot4-tcc -c -D HAVE_CONFIG_H=1 -I include -I include/linux/x86 -o /usr/lib/mes/crti.o lib/linux/x86-mes-gcc/crti.c 
 +> boot4-tcc -c -D HAVE_CONFIG_H=1 -D HAVE_FLOAT=1 -D HAVE_LONG_LONG=1 -I include -I include/linux/x86 lib/libtcc1.c 
 +> boot4-tcc -c -D TCC_TARGET_I386=1 ../tcc-0.9.26-1136-g5bba73cc/lib/libtcc1.c 
 +> boot4-tcc -ar cr /usr/lib/mes/tcc/libtcc1.a libtcc1.o 
 +> boot4-tcc -c -D HAVE_CONFIG_H=1 -I include -I include/linux/x86 -o unified-libc.o unified-libc.c 
unified-libc.c:2000: warning: assignment makes integer from pointer without a cast
unified-libc.c:2000: warning: assignment makes pointer from integer without a cast
unified-libc.c:2284: warning: SYS_exit redefined
unified-libc.c:2502: warning: SYS_write redefined
unified-libc.c:5877: warning: assignment from incompatible pointer type
unified-libc.c:6767: warning: assignment from incompatible pointer type
 +> boot4-tcc -ar cr /usr/lib/mes/libc.a unified-libc.o 
 +> cd ../tcc-0.9.26-1136-g5bba73cc 
 +> boot4-tcc -version 
tcc version 0.9.26 (i386 Linux)
 +> boot4-tcc -g -v -static -o boot5-tcc -D BOOTSTRAP=1 -D HAVE_BITFIELD=1 -D HAVE_FLOAT=1 -D HAVE_LONG_LONG=1 -D HAVE_SETJMP=1 -I . -I /usr/include -D TCC_TARGET_I386=1 -D CONFIG_TCCDIR="/usr/lib/mes/tcc" -D CONFIG_TCC_CRTPREFIX="/usr/lib/mes" -D CONFIG_TCC_ELFINTERP="/mes/loader" -D CONFIG_TCC_LIBPATHS="/usr/lib/mes:/usr/lib/mes/tcc" -D CONFIG_TCC_SYSINCLUDEPATHS="/usr/include" -D TCC_LIBGCC="/usr/lib/mes/libc.a" -D TCC_LIBTCC1="libtcc1.a" -D CONFIG_TCCBOOT=1 -D CONFIG_TCC_STATIC=1 -D CONFIG_USE_LIBGCC=1 -D TCC_MES_LIBC=1 -D TCC_VERSION="0.9.26" -D ONE_SOURCE=1 -L . tcc.c 
tcc version 0.9.26 (i386 Linux)
-> tcc.c
<- boot5-tcc
 +> cp boot5-tcc /usr/bin 
 +> chmod 755 /usr/bin/boot5-tcc 
 +> cd ../mes-aa5f1533e1736a89e60d2c34c2a0ab3b01f8d037 
 +> boot5-tcc -c -D HAVE_CONFIG_H=1 -I include -I include/linux/x86 -o /usr/lib/mes/crt1.o lib/linux/x86-mes-gcc/crt1.c 
 +> boot5-tcc -c -D HAVE_CONFIG_H=1 -I include -I include/linux/x86 -o /usr/lib/mes/crtn.o lib/linux/x86-mes-gcc/crtn.c 
 +> boot5-tcc -c -D HAVE_CONFIG_H=1 -I include -I include/linux/x86 -o /usr/lib/mes/crti.o lib/linux/x86-mes-gcc/crti.c 
 +> boot5-tcc -c -D HAVE_CONFIG_H=1 -D HAVE_FLOAT=1 -D HAVE_LONG_LONG=1 -I include -I include/linux/x86 lib/libtcc1.c 
 +> boot5-tcc -c -D TCC_TARGET_I386=1 ../tcc-0.9.26-1136-g5bba73cc/lib/libtcc1.c 
 +> boot5-tcc -ar cr /usr/lib/mes/tcc/libtcc1.a libtcc1.o 
 +> boot5-tcc -c -D HAVE_CONFIG_H=1 -I include -I include/linux/x86 -o unified-libc.o unified-libc.c 
unified-libc.c:2000: warning: assignment makes integer from pointer without a cast
unified-libc.c:2000: warning: assignment makes pointer from integer without a cast
unified-libc.c:2284: warning: SYS_exit redefined
unified-libc.c:2502: warning: SYS_write redefined
unified-libc.c:5877: warning: assignment from incompatible pointer type
unified-libc.c:6767: warning: assignment from incompatible pointer type
 +> boot5-tcc -ar cr /usr/lib/mes/libc.a unified-libc.o 
 +> boot5-tcc -version 
tcc version 0.9.26 (i386 Linux)
 +> cp /usr/bin/boot5-tcc /usr/bin/tcc 
 +> chmod 755 /usr/bin/tcc 
 +> cp /usr/bin/tcc /usr/bin/tcc-0.9.26 
 +> chmod 755 /usr/bin/tcc-0.9.26 
 +> tcc -c -D HAVE_CONFIG_H=1 -I include -I include/linux/x86 lib/posix/getopt.c 
 +> tcc -ar cr /usr/lib/mes/libgetopt.a getopt.o 
 +> cd ../.. 
 +> if match xFalse xTrue 
/usr/bin/mes-tcc: OK
/usr/bin/boot0-tcc: OK
/usr/bin/boot1-tcc: OK
/usr/bin/boot2-tcc: OK
/usr/bin/boot3-tcc: OK
/usr/bin/boot4-tcc: OK
/usr/bin/tcc: OK
/usr/lib/mes/libc.a: OK
/usr/lib/mes/libgetopt.a: OK
/usr/lib/mes/crt1.o: OK
/usr/lib/mes/crti.o: OK
/usr/lib/mes/crtn.o: OK
/usr/lib/mes/tcc/libtcc1.a: FAILED
Wanted:   ac11f09698f092ed76ae40ebcd56cf3f2b903ea1333ef7537a00673dd6f73da7
Received: 2adce9b440aefc6dd458b046582f3584de224dc2e0cf3cf1c0c17e98182beaa5
Subprocess error 1
ABORTING HARD
Subprocess error 1
ABORTING HARD
Subprocess error 1
ABORTING HARD
Subprocess error
ABORTING HARD
Bootstrapping failed

I don't know where to start debugging this because I don't have a working instance to compare against. Clearly codegen is broken, but where? (I can provide the probably-broken binaries to anyone who wants them.)

stikonas commented 2 years ago

Yes, it would be good to compare binaries with good ones. diffoscope can often show what's wrong.
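
If diffoscope is not at hand, a very rough first pass is to find the offset where the two files start to differ. The sketch below is only an illustration and not part of live-bootstrap; the helper name `first_difference` is hypothetical.

```python
def first_difference(good: bytes, bad: bytes):
    """Return the offset of the first differing byte, or None if identical.

    If one input is a strict prefix of the other, the offset returned is
    the length of the shorter input.
    """
    for offset, (a, b) in enumerate(zip(good, bad)):
        if a != b:
            return offset
    if len(good) != len(bad):
        return min(len(good), len(bad))
    return None
```

For two files on disk, something like `first_difference(open(good_path, "rb").read(), open(bad_path, "rb").read())` would do, where `good_path` and `bad_path` are whatever copies you have. A result equal to the size of the shorter file means the longer one only has extra bytes appended.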

Do you have root on some box? If so, you could also check whether chroot and qemu modes work.

P.S. Also feel free to visit #bootstrappable on libera.chat if you want more interactive help.

nickalcock commented 2 years ago

Exactly. I'll see if I can generate good binaries via qemu, though I'd be surprised if that worked. I do wonder if the problem is file sort order or something similar, i.e. down to the underlying filesystem (xfs versus whatever qemu uses). Hmm, that's easy to test, so I will. (I have root across this local network, so that should be good enough. It looks like qemu mode doesn't do anything dangerous or crazy. Well, nothing more dangerous and crazy than this project as a whole :) )

nickalcock commented 2 years ago

Confirmed that it only goes wrong under --bwrap. Still trying to figure out where --qemu mode writes to so I can diffoscope the artifacts: it's not writing to tmp or sysc/tmp or even sysc/tmp/disk.img even with the tmpfs mounting forcibly disabled.

stikonas commented 2 years ago

> Confirmed that it only goes wrong under --bwrap. Still trying to figure out where --qemu mode writes to so I can diffoscope the artifacts: it's not writing to tmp or sysc/tmp or even sysc/tmp/disk.img even with the tmpfs mounting forcibly disabled.

qemu mode runs in tmpfs during the sysa stage and later on the virtual disk in the sysc stage, so getting artifacts out is a bit tricky. You would have to transfer them to sysc first... Perhaps it would be easier if I published a good file somewhere.

What about chroot mode? Does that give you the correct checksum?

But it might indeed be related to the underlying filesystem...

stikonas commented 2 years ago

https://stikonas.eu/files/bootstrap/libtcc1.a

nickalcock commented 2 years ago

A bit more info: not creating a tmpfs in the bwrap stage makes the error go away! (to be replaced by another error, which isn't too surprising after I did that). So this must be a difference in the behaviour of tmpfs between the sysa qemu image (which for me is based on a 5.10.0 defconfig kernel) and the host kernel (5.16.19 tmpfs, 64-bit). I'll arrange to copy the file in question off the tmpfs before deleting it... let's see.

nickalcock commented 2 years ago

Failure confirmed intermittent, happening about 50% of the time. The difference is that my faulty copy of libtcc1.a has four more null bytes at the end. This seems to be pure padding: it's not represented in the size of the archive's lone element at all. This almost has to be something up with tcc 0.9.26's tcc_tool_ar, I'd think.
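
A minimal sketch of how that padding can be checked, assuming the common ar layout (an 8-byte `!<arch>` magic line, then 60-byte member headers with the decimal size at bytes 48-57 and each member padded to an even offset). The function name `ar_trailing_bytes` is hypothetical, and nothing here is taken from tcc's own code.

```python
AR_MAGIC = b"!<arch>\n"
HEADER_LEN = 60  # fixed size of each member header in the common ar format


def ar_trailing_bytes(data: bytes) -> bytes:
    """Return any bytes left over after the last member of an ar archive."""
    if not data.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    pos = len(AR_MAGIC)
    while pos + HEADER_LEN <= len(data):
        header = data[pos:pos + HEADER_LEN]
        if header[58:60] != b"`\n":
            break  # not a member header: treat the rest as trailing data
        size = int(header[48:58].decode("ascii").strip())
        pos += HEADER_LEN + size
        pos += pos % 2  # member data is padded to an even offset
    return data[min(pos, len(data)):]
```

Calling it on the contents of each copy of libtcc1.a should return an empty bytes object when the file ends exactly at the last member; a result of four null bytes would match the extra padding described above.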

stikonas commented 2 years ago

Hmm, that is strange. I was expecting something like an ordering issue, not padding.

Perhaps another useful data point would be to check whether that happens with all libtcc1.a stages. Unfortunately, we only checksum the last one, but the mescc->tcc-0.9.26 step actually involves 5 rebuilds.
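
One way to do that check, sketched under assumptions: each stage's libtcc1.a has been copied out under a distinct name (for example `boot0-libtcc1.a`, which is a made-up name) into one directory for the good build and one for the failing build. The helpers `sha256_of` and `mismatched_stages` are hypothetical and not part of live-bootstrap.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mismatched_stages(good_dir, bad_dir):
    """List file names from good_dir that are missing or differ in bad_dir."""
    mismatches = []
    for good in sorted(Path(good_dir).iterdir()):
        if not good.is_file():
            continue
        bad = Path(bad_dir) / good.name
        if not bad.is_file() or sha256_of(good) != sha256_of(bad):
            mismatches.append(good.name)
    return mismatches
```

An empty list would mean every saved stage matches; otherwise the first name in the list shows the earliest rebuild at which the two runs diverge, provided the names sort in stage order.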

nickalcock commented 2 years ago

Good idea: I can use the same "stuff a cp into sysc_image/tmp" kludge I used for this to smuggle all five out in both the failing build and the qemu build. I'll look at it once this stupid cold has gone away :(

nanonyme commented 11 months ago

@nickalcock does this still reproduce?

fosslinux / live-bootstrap, issue #205