he32 opened this issue 1 year ago
Thanks for the report. The first step would be to try version 9.0.4; lots of issues were fixed since. If that still crashes, try the debug build (CMAKE_BUILD_TYPE=Debug, see CMAKE.md).
Thanks for the hint. This uncovers a new (undocumented?) dependency:
/usr/pkgsrc/lang/swi-prolog-lite/work/swipl-9.0.4/packages/sweep/sweep.c:33:10: fatal error: emacs-module.h: No such file or directory
33 | #include <emacs-module.h>
| ^~~~~~~~~~~~~~~~
compilation terminated.
I'll rummage around and see where I can satisfy it.
It is part of GNU Emacs. If this file does not exist, however, the build should simply skip the sweep package. See packages/sweep/CMakeLists.txt.
Yes. Part of the problem is that I'm trying to fit this into NetBSD's pkgsrc: even though I have Emacs installed, and CMake may well detect it, the actual build happens in a restricted environment where the required packages have to be "properly declared". I'm still trying to sort out what the magic is. For now I've hacked the CMakeLists.txt file to get past this hurdle.
It looks like this bug is also present in 9.0.4, ref:
(gdb) run -f none --no-packs -x /usr/pkgsrc/lang/swi-prolog-lite/work/swipl-9.0.4/build/man/pldoc2tex -- --source=/usr/pkgsrc/lang/swi-prolog-lite/work/swipl-9.0.4/man --out=lib/clpfdlib.tex --lib=clpfd --module=clpfd --summaries lib/clpfdlib.md
Starting program: /usr/pkgsrc/lang/swi-prolog-lite/work/swipl-9.0.4/build/src/swipl -f none --no-packs -x /usr/pkgsrc/lang/swi-prolog-lite/work/swipl-9.0.4/build/man/pldoc2tex -- --source=/usr/pkgsrc/lang/swi-prolog-lite/work/swipl-9.0.4/man --out=lib/clpfdlib.tex --lib=clpfd --module=clpfd --summaries lib/clpfdlib.md
[New LWP 12763 of process 21297]
Thread 1 "" received signal SIGSEGV, Segmentation fault.
0xfdcdad44 in compileArgument___LD (
__PL_ld=__PL_ld@entry=0xfdea4f08 <PL_local_data>, arg=<optimized out>,
arg@entry=0xfc597f50, where=where@entry=1, ci=ci@entry=0xffffd044)
at /usr/pkgsrc/lang/swi-prolog-lite/work/swipl-9.0.4/src/pl-comp.c:2586
2586 Output_n(ci, c, p, n+1);
(gdb) where
#0 0xfdcdad44 in compileArgument___LD (
__PL_ld=__PL_ld@entry=0xfdea4f08 <PL_local_data>, arg=<optimized out>,
arg@entry=0xfc597f50, where=where@entry=1, ci=ci@entry=0xffffd044)
at /usr/pkgsrc/lang/swi-prolog-lite/work/swipl-9.0.4/src/pl-comp.c:2586
#1 0xfdce48e4 in compileClauseGuarded___LD (
__PL_ld=__PL_ld@entry=0xfdea4f08 <PL_local_data>, ci=ci@entry=0xffffd044,
cp=cp@entry=0xffffd1dc, head=head@entry=0xfc5f7b8c,
body=body@entry=0xfc5f7b90, proc=proc@entry=0xfd80d230,
module=module@entry=0xfc9bc410, warnings=warnings@entry=741,
flags=<optimized out>, flags@entry=2)
at /usr/pkgsrc/lang/swi-prolog-lite/work/swipl-9.0.4/src/pl-comp.c:1922
#2 0xfdce572c in compileClause___LD (__PL_ld=0xfdea4f08 <PL_local_data>,
cp=0xffffd1dc, head=0xfc5f7b8c, body=0xfc5f7b90, proc=0xfd80d230,
module=0xfc9bc410, warnings=741, flags=2)
at /usr/pkgsrc/lang/swi-prolog-lite/work/swipl-9.0.4/src/pl-comp.c:1827
#3 0x6973206c in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb) p ci
$1 = (compileInfo *) 0xffffd044
(gdb) p c
$2 = <optimized out>
(gdb) p p
$3 = (Word) 0xfc4f6464
(gdb) p n
$4 = 7196
(gdb) i reg
r0 0xfdcdacd0 4258114768
r1 0xffffcf10 4294954768
r2 0xfdee1008 4260237320
r3 0xffffd044 4294955076
r4 0xfdea4b00 4259990272
r5 0x1 1
r6 0x41d8 16856
r7 0x2c20616e 740319598
r8 0xfffff000 4294963200
r9 0xfffff004 4294963204
r10 0xfc4f9304 4233073412
r11 0xfdcaa100 4257915136
r12 0x48028448 1208124488
r13 0x1828030 25329712
r14 0x6808d 426125
r15 0x4 4
r16 0x4 4
r17 0xfdea4f08 4259991304
r18 0xfdc77edc 4257709788
r19 0xffffd1dc 4294955484
r20 0xfdea4f28 4259991336
r21 0x1 1
r22 0xffffd044 4294955076
r23 0xffffcf28 4294954792
r24 0x1 1
r25 0xffffcf24 4294954788
r26 0xffffd11c 4294955292
r27 0xfdc77edc 4257709788
r28 0x7074 28788
r29 0xfc4f6464 4233061476
r30 0xfdea5730 4259993392
r31 0x1c1d 7197
pc 0xfdcdad44 0xfdcdad44 <compileArgument___LD+404>
msr <unavailable>
cr 0x48028448 1208124488
lr 0xfdcdacd0 0xfdcdacd0 <compileArgument___LD+288>
ctr 0xa3b 2619
xer 0x20000000 536870912
fpscr 0xfff80000 -524288
vscr <unavailable>
vrsave <unavailable>
(gdb) x/i 0xfdcdad44
=> 0xfdcdad44 <compileArgument___LD+404>: stw r7,-4(r9)
(gdb) p/x 0xfffff004-4
$5 = 0xfffff000
(gdb) x/x 0xfffff000
0xfffff000: Cannot access memory at address 0xfffff000
(gdb) x/x 0xffffefff
0xffffefff: Cannot access memory at address 0xfffff000
(gdb) x/x 0xffffeff0
0xffffeff0: 0x20657870
(gdb) x/x 0xffffeffc
0xffffeffc: 0x0a757365
(gdb)
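Since 32-bit PowerPC is big-endian, the raw words in the session above can also be read as bytes. The sketch below decodes the bogus frame #3 address, the value in r7 that the faulting stw was storing, and the two words gdb read just below 0xfffff000; word_to_chars() and print_session_words() are hypothetical helpers written for this note. It is only a decoding aid: that these bytes fall in the printable ASCII range does not by itself establish what overwrote the stack.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Decode a 32-bit word as four big-endian bytes (the byte order of
   32-bit PowerPC). Non-printable bytes are shown as '.'; out must have
   room for 5 chars including the terminating NUL. */
static void word_to_chars(uint32_t w, char out[5])
{
  assert(out != NULL);
  for (int i = 0; i < 4; i++) {
    unsigned char b = (unsigned char)(w >> (24 - 8 * i));
    out[i] = (b >= 0x20 && b < 0x7f) ? (char)b : '.';
  }
  out[4] = '\0';
}

/* Print the words copied from the gdb session next to their decoding. */
static void print_session_words(void)
{
  static const struct { const char *what; uint32_t word; } samples[] = {
    { "frame #3 address",   0x6973206cu },
    { "r7 (value stored)",  0x2c20616eu },
    { "word at 0xffffeff0", 0x20657870u },
    { "word at 0xffffeffc", 0x0a757365u },
  };
  for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
    char text[5];
    word_to_chars(samples[i].word, text);
    printf("%-20s 0x%08x  \"%s\"\n", samples[i].what,
           (unsigned)samples[i].word, text);
  }
}
```

Calling print_session_words() prints the four words next to "is l", ", an", " exp" and ".use" (the last word begins with a newline byte, shown as '.').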
Hmm. That doesn't make much sense to me. This code has been in use for many years, built with many different compilers and run under valgrind, AddressSanitizer, etc. I'd first try building with the Debug build type (CMAKE_BUILD_TYPE=Debug). At least that should give more details (hoping the problem persists).
Did that, and, annoyingly, it doesn't reproduce the problem. I think this means it went from being a SWI-Prolog issue to being a gcc compiler issue on this platform (32-bit ppc), with gcc version 10.3.0 (nb1 20210411).
Not sure that makes it much easier for me, but it's probably good news at your end.
Or... maybe not. I built swipl with clang version 15.0.7 and got the same problem:
[ 81%] Generating lib/clpfdlib.tex
cd /usr/pkgsrc/lang/swi-prolog-lite/work/swipl-9.0.4/build/man && ../src/swipl -f none --no-packs -x /usr/pkgsrc/lang/swi-prolog-lite/work/swipl-9.0.4/build/man/pldoc2tex -- --source=/usr/pkgsrc/lang/swi-prolog-lite/work/swipl-9.0.4/man --out=lib/clpfdlib.tex --lib=clpfd --module=clpfd --summaries lib/clpfdlib.md
[1] Segmentation fault ../src/swipl -f none --no-packs -x /usr/pkgsrc...
*** [man/lib/clpfdlib.tex] Error code 139
I need to tweak the build to produce debug info by default and redo it, to verify that it bombs at the same place.
This is getting nasty. Yes, two compilers both getting it wrong sounds unlikely. Alignment issues come to mind. On the other hand, we also have ARM, which passes on both 32 and 64 bits (Linux). Note that Debian also passes on all platforms, including ppc AFAIK, although Debian testing is at gcc 12.
I guess some good old print statements may shed some light... Use Sdprintf(), which has the same signature as printf() (with some extensions), to print debug output.
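A minimal sketch of the print-statement approach, assuming you want each line tagged with its source location so it can be matched against gdb backtraces. debug_printf() is a hypothetical stand-in for Sdprintf() so the example compiles without the SWI-Prolog headers; TRACE and trace_output_n() are likewise made up for illustration. In the swipl sources you would call Sdprintf() directly.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-in for Sdprintf(): printf()-style formatting, written to stderr
   here. Returns the number of characters written, like vfprintf(). */
static int debug_printf(const char *fmt, ...)
{
  assert(fmt != NULL);
  va_list args;
  va_start(args, fmt);
  int n = vfprintf(stderr, fmt, args);
  va_end(args);
  return n;
}

/* Hypothetical macro: prefix each trace line with file:line. Needs at
   least one argument after the format (C99 variadic macro rule). */
#define TRACE(fmt, ...) \
  debug_printf("%s:%d: " fmt "\n", __FILE__, __LINE__, __VA_ARGS__)

/* Hypothetical example of what one might print just before the
   Output_n(ci, c, p, n+1) call in compileArgument() (pl-comp.c:2586). */
static int trace_output_n(const void *ci, const void *p, size_t n)
{
  return TRACE("Output_n: ci=%p p=%p n=%zu", (void *)ci, (void *)p, n);
}
```

Unlike the Debug build, which made the crash disappear, print statements like these can stay in the optimized build where the problem actually shows up.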
Please keep me posted.
The debug build with clang bombed slightly elsewhere:
[ 76%] Build home/library/clp/INDEX.pl
SWI-Prolog [thread 2 (gc) at Mon Jan 30 21:42:21 2023]: received fatal signal 11 (segv)
[1] Segmentation fault (core dumped) ./swipl -f none --no-packs -t halt --home=/usr...
--- home/library/clp/INDEX.pl ---
*** [home/library/clp/INDEX.pl] Error code 139
Will dig out details later.
Here's a bit more detail on this:
--- src/CMakeFiles/library_index_library_clp.dir/all ---
[ 76%] Build home/library/clp/INDEX.pl
cd /usr/pkgsrc/lang/swi-prolog-lite/work/swipl-9.0.4/build/src && ./swipl -f none --no-packs -t halt --home=/usr/pkgsrc/lang/swi-prolog-lite/work/swipl-9.0.4/build/home -q -g "make_library_index('/usr/pkgsrc/lang/swi-prolog-lite/work/swipl-9.0.4/build/home/library/clp')" --
SWI-Prolog [thread 2 (gc) at Tue Jan 31 18:37:21 2023]: received fatal signal 11 (segv)
--- src/CMakeFiles/library_index_library_clp_always.dir/all ---
--- home/library/clp/__INDEX.pl ---
--- src/CMakeFiles/library_index_library_clp.dir/all ---
[1] Segmentation fault (core dumped) ./swipl -f none --no-packs -t halt --home=/usr...
*** [home/library/clp/INDEX.pl] Error code 139
...
bramley: {309} gdb swipl swipl.core
GNU gdb (GDB) 11.0.50.20200914-git
...
Reading symbols from swipl...
[New process 5599]
[New process 28452]
Core was generated by `swipl'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 fetchop (PC=0xfd6c3440)
at /usr/pkgsrc/lang/swi-prolog-lite/work/swipl-9.0.4/src/pl-comp.h:93
93 if ( unlikely(op == D_BREAK) )
[Current thread is 1 (process 5599)]
(gdb) i thread
Id Target Id Frame
* 1 process 5599 fetchop (PC=0xfd6c3440)
at /usr/pkgsrc/lang/swi-prolog-lite/work/swipl-9.0.4/src/pl-comp.h:93
2 process 28452 0xfda10ea8 in kill () from /usr/lib/libc.so.12
(gdb) thread 2
[Switching to thread 2 (process 28452)]
#0 0xfda10ea8 in kill () from /usr/lib/libc.so.12
(gdb) where
#0 0xfda10ea8 in kill () from /usr/lib/libc.so.12
#1 0xfde0d538 in sigCrashHandler (sig=11)
at /usr/pkgsrc/lang/swi-prolog-lite/work/swipl-9.0.4/src/os/pl-cstack.c:1081
#2 0xfdd29b6c in alt_segv_handler (sig=11)
at /usr/pkgsrc/lang/swi-prolog-lite/work/swipl-9.0.4/src/pl-setup.c:778
#3 <signal handler called>
#4 lookupHTable___LD (__PL_ld=0xfcec8000, ht=0x0, name=0x19385)
at /usr/pkgsrc/lang/swi-prolog-lite/work/swipl-9.0.4/src/os/pl-table.c:517
#5 0xfdce01b8 in lookupModule___LD (__PL_ld=0xfcec8000, name=103301)
at /usr/pkgsrc/lang/swi-prolog-lite/work/swipl-9.0.4/src/pl-modul.c:139
#6 0xfddc023c in PL_predicate (name=<optimized out>, arity=<optimized out>,
module=0xfde2baf5 "system")
at /usr/pkgsrc/lang/swi-prolog-lite/work/swipl-9.0.4/src/pl-fli.c:4178
#7 0xfddc0164 in _PL_predicate (name=<optimized out>, arity=<optimized out>,
module=<optimized out>, bin=0xfdea4480 <PL_global_data+3192>)
at /usr/pkgsrc/lang/swi-prolog-lite/work/swipl-9.0.4/src/pl-fli.h:299
#8 0xfdd57fb4 in GCmain (closure=<optimized out>)
at /usr/pkgsrc/lang/swi-prolog-lite/work/swipl-9.0.4/src/pl-thread.c:6527
#9 0xfd9ae204 in ?? () from /usr/lib/libpthread.so.1
#10 0xfda7e2b0 in __mknod50 () from /usr/lib/libc.so.12
Backtrace stopped: frame did not save the PC
(gdb) up 4
#4 lookupHTable___LD (__PL_ld=0xfcec8000, ht=0x0, name=0x19385)
at /usr/pkgsrc/lang/swi-prolog-lite/work/swipl-9.0.4/src/os/pl-table.c:517
517 acquire_kvs(ht, kvs);
(gdb) p/x ht
$1 = 0x0
(gdb) up
#5 0xfdce01b8 in lookupModule___LD (__PL_ld=0xfcec8000, name=103301)
at /usr/pkgsrc/lang/swi-prolog-lite/work/swipl-9.0.4/src/pl-modul.c:139
139 if ( (m = lookupHTable(GD->tables.modules, (void*)name)) )
(gdb) p PL_global_data.tables
$2 = {modules = 0x0}
(gdb)
It looks like a "simple" null pointer dereference, but... why would "modules" not be filled in with actual data?
Rather odd. Thread 2 is the global object (atom, clause) garbage collector thread. This looks like the startup of that thread, trying to find the predicate to call. The module table is shared and must already have been created long before by the main thread. So this makes no sense unless something writes to the wrong address. The way forward is probably to run under gdb, set a breakpoint at the place where the table is created (pl-modul.c:292 in the dev source) and then set a hardware watchpoint on PL_global_data.tables.modules.
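The argument that the table must already exist rests on a POSIX guarantee: pthread_create() and pthread_join() are among the functions that synchronize memory, so whatever the creating thread wrote before pthread_create() is visible to the new thread. The sketch below models that ordering with a hypothetical global table pointer (standing in for GD->tables.modules) and a worker thread standing in for the GC thread. In a correct program the worker cannot observe NULL, which is why a NULL seen in GCmain suggests something else clobbered the pointer.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the module table and GD->tables.modules. */
struct table { int entries; };
static struct table *modules_table = NULL;

/* Plays the role of the GC thread looking up its predicate at startup. */
static void *worker(void *arg)
{
  int *saw_table = arg;
  assert(saw_table != NULL);
  /* No lock needed: the table was set up before pthread_create(). */
  *saw_table = (modules_table != NULL);
  return NULL;
}

/* Returns 1 if the worker saw the table, 0 if it saw NULL, -1 on error. */
static int start_worker_after_init(void)
{
  static struct table t = { 0 };
  int saw_table = 0;
  pthread_t tid;

  modules_table = &t;                 /* "main thread" creates the table */
  if (pthread_create(&tid, NULL, worker, &saw_table) != 0)
    return -1;
  if (pthread_join(tid, NULL) != 0)   /* makes saw_table visible here */
    return -1;
  return saw_table;
}
```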
It looks like there is something fundamentally wrong with this platform. I vaguely recall issues in NetBSD's POSIX thread implementation that caused problems for SWI-Prolog, but that was rather long ago.
What is the issue in thread 1? Is PC invalid? Or is it decode(), which does a lookup in a global array? The == test can't crash :smile: That, too, is pretty weird for running a simple Prolog program. I don't think I've ever seen fetchop() crash.
I believe I have news on this. Since I do this build on a dual-CPU macppc, I also ran the build with "make -j 2". However, it now looks like SWI-Prolog's build setup does not make all inter-dependencies between the different build steps explicit enough for that to be safe.
I have now done a build with gcc 12, which initially crashed; setting pkgsrc's MAKE_JOBS=1 in /etc/mk.conf made the build succeed. I've also reverted to the in-tree gcc 10, and reducing the build job parallelism to 1 makes that build succeed as well.
Thanks for all the work. Dependency problems in concurrent builds are not that likely to cause this: it doesn't match the crashes, and concurrent builds are used on almost all platforms these days. More likely, we are dealing with internal threading issues that become apparent because you run two make jobs, which typically means more than two threads running at once. Then again, this has also been tested in many extreme scenarios. I'm not claiming there is no bug related to thread synchronization, but this is rather extreme. The lost module table in particular is really weird.
What happens with ctest, in particular depending on the number of jobs? That should work pretty much OK, except for a socket allocation issue that occasionally makes the http:proxy tests fail: they can run concurrently with another test that uses sockets, and then the tests sometimes use each other's sockets :cry:
Trying to build swi-prolog on NetBSD/macppc 10.0_BETA fails, with the following process getting a SEGV:
It wasn't particularly easy to find out how to configure a debug build, but I eventually managed to hack it into producing a debug version. The GDB session suggests that it is trying to store into unmapped memory(?)
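The "store into unmapped memory" reading can be checked by hand from the register dump. A PowerPC D-form store such as stw rS,d(rA) writes to rA plus the sign-extended 16-bit displacement d (with a base of 0 when rA is r0), so stw r7,-4(r9) with r9 = 0xfffff004 targets 0xfffff000, the address gdb could not read. Assuming 4 KiB pages (an assumption; the page size is not shown in the session), that is the first byte of a page, while 0xffffeffc, which gdb could still read, lies in the page before it. Separately, r31 = 0x1c1d is 7197, i.e. n + 1 for the n = 7196 gdb printed, which is consistent with (but does not prove) r31 holding the n+1 argument of Output_n(). The helper names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page size for this sketch (4 KiB). */
#define PAGE_SIZE_ASSUMED 0x1000u

/* Effective address of PowerPC "stw rS,d(rA)" for rA != r0: rA plus the
   sign-extended 16-bit displacement, wrapping modulo 2^32. */
static uint32_t stw_effective_address(uint32_t ra, int16_t disp)
{
  return (uint32_t)(ra + (uint32_t)(int32_t)disp);
}

/* Page number and offset within the page, under the assumed page size. */
static uint32_t page_of(uint32_t addr)        { return addr / PAGE_SIZE_ASSUMED; }
static uint32_t offset_in_page(uint32_t addr) { return addr % PAGE_SIZE_ASSUMED; }
```

For the faulting instruction, stw_effective_address(0xfffff004, -4) yields 0xfffff000, offset_in_page() of that address is 0, and page_of(0xffffeffc) is exactly one page below it.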