arsv / perl-cross

configure and cross-compile perl

heisenbug build error in `Updating 'mktables.lst'` #113

Closed rofl0r closed 3 years ago

rofl0r commented 3 years ago

hi, while cross-compiling for i486-linux-musl on an x86_64-linux-musl host, 3 out of 5 builds with make -j16 on an AMD Ryzen fail with the following error:

Processing IndicSyllabicCategory.txt
Processing BidiBrackets.txt
Processing IndicPositionalCategory.txt
Processing VerticalOrientation.txt
Processing EquivalentUnifiedIdeograph.txt
Processing emoji/emoji.txt
Processing IdStatus.txt
Processing IdType.txt
Finishing processing Unicode properties
Compiling Perl properties
Creating Perl synonyms
Writing tables
Making pod file
Making test script
Updating 'mktables.lst'
make[1]: *** wait: No child process.  Stop.
make: *** [all] Error 2

1 of the 5 builds succeeded; the 5th one crashed during make install as follows:

Generating a Unix-style Makefile
Writing Makefile for threads::shared
./miniperl_top -f pod/buildtoc -q
./miniperl_top installman --destdir=/home/ubuntu/x-prefix/i486/opt/perl 
WARNING: You've never run 'make test'!!!  (Installing anyway.)
make[1]: *** [install.man] Segmentation fault (core dumped)
make[1]: Leaving directory `/home/ubuntu/x-prefix/i486/src/build/perl/perl-5.34.0'
make: *** [install] Error 2

strace shows the crash happening here:

26566 execve("./miniperl_top", ["./miniperl_top", "installman", "--destdir=/home/ubuntu/x-prefix/i486/opt/perl"], [/* 22 vars */] <unfinished ...>
...
26566 --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0xffffffff87fc75ef} ---
26566 +++ killed by SIGSEGV (core dumped) +++

i'm currently trying to find the full command line to run gdb on it (hopefully miniperl_top was built with -g ...)

EDIT: unfortunately, it doesn't crash under gdb :(

rofl0r commented 3 years ago

the last error could be sidestepped if there's an option to skip installing the manpages, which i delete later in my build script anyway.

rofl0r commented 3 years ago

after about 10 runs in gdb, i got a backtrace for the crash above:

#0  0x00007ffff7db9749 in memcmp () from /lib/ld-musl-x86_64.so.1
#1  0x00000000005f932b in Perl_rninstr ()
#2  0x000000000057289f in Perl_regexec_flags ()
#3  0x00000000004c5944 in Perl_pp_subst ()
#4  0x000000000060cfd5 in Perl_runops_standard ()
#5  0x00000000006d05a3 in S_run_body ()
#6  0x00000000006d019f in perl_run ()
#7  0x000000000040ad82 in main ()

(gdb) disas
Dump of assembler code for function memcmp:
   0x00007ffff7db9742 <+0>:     xor    %ecx,%ecx
   0x00007ffff7db9744 <+2>:     cmp    %rcx,%rdx
   0x00007ffff7db9747 <+5>:     je     0x7ffff7db975f <memcmp+29>
=> 0x00007ffff7db9749 <+7>:     movzbl (%rdi,%rcx,1),%eax
   0x00007ffff7db974d <+11>:    inc    %rcx

rsi            0x103fe20        17038880
rdi            0xfffffffff7ffe4c5       -134224699
(gdb) x/s 0x103fe20
0x103fe20:      "\\-"

the pointer in rdi (memcmp's first argument, listed second above) is clearly invalid

arsv commented 3 years ago

miniperl config does not depend on the target (i486), only on the build host. Try building natively with perl-cross, it should crash the same way. If that works, try configuring with perl's own Configure, and compare config.sh values between perl-cross and mainline perl. Building mainline perl natively might be a good idea as well, just to make sure the crashes are due to perl-cross and not something else.

I would be looking for a bad value in config.sh first, then for compiler flags (optimization etc) and toolchain related issues in general.

rofl0r commented 3 years ago

thanks, i will try that. do you also have some input on the first issue (mentioned in title, first pasted block)?

arsv commented 3 years ago

No idea. The process in question is most likely miniperl, so it might be one symptom of miniperl being badly broken.

I cannot readily think of any way for miniperl to cause that, but assuming the rest of the system works and make doesn't fail randomly building other things, I'd guess it's probably miniperl.

rofl0r commented 3 years ago

i figured out what happened: without manually adding -D_GNU_SOURCE to HOSTCFLAGS, some functions (e.g. memchr) were implicitly declared, which produced warnings about pointer-to-int truncation and subsequently a segfault that happened before the error mentioned in the title. with that flag added, everything seems to work well. thanks for your help!