Closed ncaq closed 1 year ago
Thanks for the detailed report. I can't reproduce this with my setup though, so we'll need to investigate some more to understand what's going on. So a couple of questions:
Do I understand correctly that you're running Emacs and SWI-Prolog in WSL?
It looks like the crash occurs somewhere in the C-function pceResolveImplementation while Sweep is calling xref_source/2 to analyze the buffer. Do you see a similar crash if you invoke xref_source('.../clp/bounds.pl', [comments(store)]) in the regular swipl top level?
Do you see a similar crash with other Prolog source files? Or is it only clp/bounds.pl? A quick look doesn't reveal anything out of the ordinary with this file, so I wonder why you see this crash with this file specifically.
Does this also happen with a clean Emacs config? Namely, do you see this crash when you do emacs -Q -l .../sweeprolog.el .../clp/bounds.pl -f sweeprolog-mode?
Thanks for your reply. If it cannot be reproduced in your environment, the problem is probably specific to my environment. I'll share the details of my setup to help.
Do I understand correctly that you're running Emacs and SWI-Prolog in WSL?
yes.
It looks like the crash occurs somewhere in the C-function pceResolveImplementation while Sweep is calling xref_source/2 to analyze the buffer. Do you see a similar crash if you invoke xref_source('.../clp/bounds.pl', [comments(store)]) in the regular swipl top level?
yes.
Do you see a similar crash with other Prolog source files? Or is it only clp/bounds.pl? A quick look doesn't reveal anything out of the ordinary with this file, so I wonder why you see this crash with this file specifically.
Yes. I reported this file simply because it was the first one that crashed. Most Prolog source files do not crash, but a few do.
Does this also happen with a clean Emacs config? Namely, do you see this crash when you do emacs -Q -l .../sweeprolog.el .../clp/bounds.pl -f sweeprolog-mode?
Yes. Here is the log.
~/Desktop/swipl-devel on master [!?] via △ v3.22.1
2023-08-08T11:21:29 ⬢ [Systemd] ❯ emacs -Q -l ~/.emacs.d/elpa/sweeprolog-0.22.0/sweeprolog.el library/clp/bounds.pl -f sweeprolog-mode
Fatal error 11: Segmentation fault
Backtrace:
emacs(emacs_backtrace+0x55)[0x558f48de6ca5]
emacs(terminate_due_to_signal+0x89)[0x558f48c87687]
emacs(+0x94bd2)[0x558f48c87bd2]
emacs(+0x1f1cad)[0x558f48de4cad]
emacs(+0x1f1d2d)[0x558f48de4d2d]
/lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f0834139520]
/home/ncaq/.local/lib/swipl/lib/x86_64-linux/pl2xpce.so(+0x10f3bc)[0x7f080f2803bc]
/home/ncaq/.local/lib/swipl/lib/x86_64-linux/pl2xpce.so(pceResolveImplementation+0x29)[0x7f080f280819]
/home/ncaq/.local/lib/swipl/lib/x86_64-linux/pl2xpce.so(+0x110dd7)[0x7f080f281dd7]
/home/ncaq/.local/lib/swipl/lib/x86_64-linux/pl2xpce.so(pl_pce_init+0xa8a)[0x7f080f339c6a]
/home/ncaq/.local/lib/swipl/lib/x86_64-linux/libswipl.so.9(+0x3d1b2)[0x7f080f81e1b2]
/home/ncaq/.local/lib/swipl/lib/x86_64-linux/libswipl.so.9(+0x8bc81)[0x7f080f86cc81]
/home/ncaq/.local/lib/swipl/lib/x86_64-linux/libswipl.so.9(+0xb7375)[0x7f080f898375]
/home/ncaq/.local/lib/swipl/lib/x86_64-linux/libswipl.so.9(+0xb761d)[0x7f080f89861d]
/home/ncaq/.local/lib/swipl/lib/x86_64-linux/libswipl.so.9(+0xb720e)[0x7f080f89820e]
/home/ncaq/.local/lib/swipl/lib/x86_64-linux/libswipl.so.9(+0xb761d)[0x7f080f89861d]
/home/ncaq/.local/lib/swipl/lib/x86_64-linux/libswipl.so.9(+0xb7d1d)[0x7f080f898d1d]
/home/ncaq/.local/lib/swipl/lib/x86_64-linux/libswipl.so.9(+0xb7e7e)[0x7f080f898e7e]
/home/ncaq/.local/lib/swipl/lib/x86_64-linux/libswipl.so.9(+0x3a1ba)[0x7f080f81b1ba]
/home/ncaq/.local/lib/swipl/lib/x86_64-linux/libswipl.so.9(+0x8bc81)[0x7f080f86cc81]
/home/ncaq/.local/lib/swipl/lib/x86_64-linux/libswipl.so.9(+0x8be78)[0x7f080f86ce78]
/home/ncaq/.local/lib/swipl/lib/x86_64-linux/libswipl.so.9(+0x3a1ba)[0x7f080f81b1ba]
/home/ncaq/.local/lib/swipl/lib/x86_64-linux/sweep-module.so(sweep_next_solution+0x4f)[0x7f080f7db63f]
emacs(funcall_module+0x12a)[0x558f48e9550a]
emacs(+0x269ede)[0x558f48e5cede]
emacs(eval_sub+0x27b)[0x558f48e5af8b]
emacs(Flet+0x141)[0x558f48e5d2f1]
emacs(eval_sub+0x987)[0x558f48e5b697]
emacs(+0x2698fd)[0x558f48e5c8fd]
emacs(+0x269ede)[0x558f48e5cede]
emacs(eval_sub+0x27b)[0x558f48e5af8b]
emacs(eval_sub+0x987)[0x558f48e5b697]
emacs(FletX+0x29d)[0x558f48e5d95d]
emacs(eval_sub+0x987)[0x558f48e5b697]
emacs(+0x2698fd)[0x558f48e5c8fd]
emacs(+0x269ede)[0x558f48e5cede]
emacs(eval_sub+0x27b)[0x558f48e5af8b]
emacs(Fprogn+0x2d)[0x558f48e5b9fd]
emacs(eval_sub+0x987)[0x558f48e5b697]
emacs(eval_sub+0x987)[0x558f48e5b697]
emacs(+0x2698fd)[0x558f48e5c8fd]
...
zsh: segmentation fault emacs -Q -l ~/.emacs.d/elpa/sweeprolog-0.22.0/sweeprolog.el -f
Do you see a similar crash if you invoke xref_source('.../clp/bounds.pl', [comments(store)]) in the regular swipl top level?
yes.
Hmm so just to be clear, you do get a segfault when swipl is running standalone, outside of Emacs?
The stack trace looked familiar so I dug a bit and found a very similar crash that @JanWielemaker reported to me privately. We never really got to the bottom of it because the issue went away once Jan installed another version of Emacs. Can you please share how you've installed Emacs and SWI-Prolog?
The Emacs that comes with Ubuntu's apt is an older version, so I built a stable version myself.
The build branch is emacs-29 and the commit hash is ef8838c3a5f041769f72758b831eb3fa7a130fb9.
Here are the commands I used to build:
./configure --prefix=/usr/local/stow/emacs-29 --with-toolkit-scroll-bars --with-gif --with-jpeg --with-png --with-rsvg --with-tiff --with-xpm --with-imagemagick --with-xft --with-cairo --with-harfbuzz --with-xwidgets --without-compress-install --without-pop --with-modules --with-libgmp --with-native-compilation=aot --with-json --with-xml2 --with-mailutils --with-gnutls --with-threads --with-zlib CFLAGS="-pipe -O2 -fomit-frame-pointer -march=native"
make -j $(nproc)
sudo make install
For SWI-Prolog, since sweep-module.so is not included in the package from Ubuntu's apt, I built swipl-devel myself as well.
The build branch is master and the commit hash is 7e6f7de02ccb3ba0b45af7ca4900ee8f9d2d81ce.
Here are the commands I used for the build:
cd swipl-devel
mkdir build
cd build
cmake -DCMAKE_INSTALL_PREFIX=$HOME/.local/ -G Ninja ..
ninja
ctest -j 4
ninja install
These steps follow https://www.swi-prolog.org/build/unix.html, except that the install prefix passed to cmake is changed so that the swipl command is installed into ~/.local/bin/.
Hmm so just to be clear, you do get a segfault when swipl is running standalone, outside of Emacs?
oops.
It looks like the crash occurs somewhere in the C-function pceResolveImplementation while Sweep is calling xref_source/2 to analyze the buffer. Do you see a similar crash if you invoke xref_source('.../clp/bounds.pl', [comments(store)]) in the regular swipl top level?
yes.
Regarding this reply: since you wrote "top level", I assumed you meant (sweeprolog-top-level), so I ran it inside Emacs.
When I ran it from the zsh command line with the swipl command, it did not crash, as shown below.
?- xref_source('library/clp/bounds.pl', [comments(store)]).
true.
Great, I was able to reproduce this crash after building Emacs with the configuration options you provided. I'll investigate further and let you know how it goes.
Alright I'm getting somewhere:
It seems that the crash is caused by some bad interaction between XPCE and Emacs's optional Xwidgets feature.
If you build Emacs without Xwidgets (remove --with-xwidgets from your ./configure arguments) everything should work well.
Could you please try that and see if it solves this issue for you?
I'll try to get to the bottom of this collision between Xwidgets and XPCE. If I can't figure it out in the next few days, I'll include a workaround in the next version of Sweep to disable XPCE when Xwidgets are enabled.
I see. That might get tricky. I think we'll have two subsystems trying to connect to X11, where xpce does so from a background thread. That is likely to fail :cry: You can disable xpce from starting a thread using
set_prolog_flag(xpce_threaded, false).
XPCE can also be initialized to use an existing X11 event loop. I'd have to dig a little in my mind to figure that out. If I recall correctly you can both tell it to reuse an existing connection and ask it which connection it uses such that that can be reused by other components.
Thanks Jan, that makes sense.
The workaround I had in mind is to initialize Prolog with --pce=false if we detect that Emacs is built with Xwidgets. That seems to work fine at the cost of not being able to start XPCE from within the embedded SWI-Prolog. That's not perfect but surely beats getting a segfault.
What do you think?
Of course, if we can make XPCE play together with Xwidgets, that'd be even better :)
I'm actually not so sure it makes sense. The emacs I have is an X11 application making an x11 window and xpce runs just fine in that. I wonder how this is possible. It is emacs 28.2, configured using
./config.status --config
'--with-native-compilation' 'PKG_CONFIG_PATH=/home/janw/lib/pkgconfig'
I was talking about the X11 connection. Also crashing while resolving a method seems to indicate another issue than some X11 interaction. Are there any earlier warnings from Prolog/xpce?
Hmm right, here too I'm running Emacs as an X11 application. It's just Xwidgets that seem to cause problems...
Are there any earlier warnings from Prolog/xpce?
No, none :(
Compile in debug mode and generate a proper stack trace using gdb? It's a long shot, but possibly we are dealing with a symbol-name conflict and xpce picks up some symbol from Xwidgets rather than using its own? You could use LD_PRELOAD to inject pl2xpce.so first into the process? If it is a conflict, that probably also causes problems, but most likely elsewhere.
Compile in debug mode and generate a proper stack trace using gdb? It's a long shot, but possibly we are dealing with a symbol-name conflict and xpce picks up some symbol from Xwidgets rather than using its own?
Yeah, gdb gives some more details: the crash happens at line 348 in swipl-devel/packages/xpce/src/ker/passing.c.
Namely, this onFlag(obj, F_ACTIVE|F_ATTRIBUTE|F_SENDMETHOD|F_GETMETHOD) happens to be a NULL dereference because obj is NULL at this point. That means that the previous answerObject(ClassNumber, obj, EAV) call, a couple of lines before, returned NULL.
I'm still not totally sure what to make of it... trying to inject pl2xpce.so with LD_PRELOAD didn't cause a crash right away, neither could I get it to crash by playing around with Xwidgets.
Actually, it's not that answerObject returns NULL, I wasn't reading the code correctly, sorry about that.
What happens is that resolveImplementationGoal is called with an argument g that has a NULL g->receiver, and obj is set to that.
Thread 1 "emacs" hit Breakpoint 1, resolveImplementationGoal (g=0x7ffffffeff90)
at /home/eshel/checkouts/swipl-devel/packages/xpce/src/ker/passing.c:341
341 Any obj = g->receiver;
(gdb) p g->receiver
$12 = (Any) 0x0
(gdb) bt
#0 resolveImplementationGoal (g=0x7ffffffeff90)
at /home/eshel/checkouts/swipl-devel/packages/xpce/src/ker/passing.c:341
#1 0x00007fffd5324205 in pceResolveImplementation (g=0x7ffffffeff90)
at /home/eshel/checkouts/swipl-devel/packages/xpce/src/ker/passing.c:450
#2 0x00007fffd532661c in vm_send
(receiver=0x0, selector=0x7fffd54b71c0 <builtin_names+57760>, class=0x0, argc=1, argv=0x7fffffff00c8)
at /home/eshel/checkouts/swipl-devel/packages/xpce/src/ker/passing.c:1052
#3 0x00007fffd530f2ed in pceSend
(receiver=0x0, classname=0x0, selector=0x7fffd54b71c0 <builtin_names+57760>, argc=1, argv=0x7fffffff00c8)
at /home/eshel/checkouts/swipl-devel/packages/xpce/src/itf/interface.c:588
#4 0x00007fffd54113df in pl_pce_init (Home=859, AppDir=860)
at /home/eshel/checkouts/swipl-devel/packages/xpce/swipl/interface.c:3361
#5 0x00007fffd5a4fd1e in PL_next_solution___LD (__PL_ld=0x7fffd5ca90e0 <PL_local_data>, qid=0x555556c9f910)
at /home/eshel/checkouts/swipl-devel/src/pl-vmi.c:4670
#6 0x00007fffd5ae3dcd in callProlog (module=0x55555790c4e0, goal=716, flags=4, ex=0x0)
at /home/eshel/checkouts/swipl-devel/src/pl-pro.c:462
#7 0x00007fffd5b2c16b in loadStatement___LD
(__PL_ld=0x7fffd5ca90e0 <PL_local_data>, state=0x7fffffff4af0, c=68, skip=0)
at /home/eshel/checkouts/swipl-devel/src/pl-wic.c:1112
#8 0x00007fffd5b31235 in loadPart___LD
(__PL_ld=0x7fffd5ca90e0 <PL_local_data>, state=0x7fffffff4af0, module=0x0, skip=0)
at /home/eshel/checkouts/swipl-devel/src/pl-wic.c:1954
#9 0x00007fffd5b2c22d in loadStatement___LD
(__PL_ld=0x7fffd5ca90e0 <PL_local_data>, state=0x7fffffff4af0, c=81, skip=0)
at /home/eshel/checkouts/swipl-devel/src/pl-wic.c:1135
#10 0x00007fffd5b31235 in loadPart___LD
(__PL_ld=0x7fffd5ca90e0 <PL_local_data>, state=0x7fffffff4af0, module=0x7fffffff4b78, skip=0)
at /home/eshel/checkouts/swipl-devel/src/pl-wic.c:1954
#11 0x00007fffd5b37e34 in qlfLoad___LD
(__PL_ld=0x7fffd5ca90e0 <PL_local_data>, state=0x7fffffff4af0, module=0x7fffffff4b78)
at /home/eshel/checkouts/swipl-devel/src/pl-wic.c:3580
#12 0x00007fffd5b38cd3 in pl_qlf_load2_va (PL__t0=703, PL__ac=2, PL__ctx=0x7fffffff4bf0)
at /home/eshel/checkouts/swipl-devel/src/pl-wic.c:3874
#13 0x00007fffd5a4fbe5 in PL_next_solution___LD (__PL_ld=0x7fffd5ca90e0 <PL_local_data>, qid=0x555557d21870)
at /home/eshel/checkouts/swipl-devel/src/pl-vmi.c:4641
#14 0x00007fffd5ae3dcd in callProlog (module=0x555556f2dd00, goal=487, flags=16, ex=0x0)
at /home/eshel/checkouts/swipl-devel/src/pl-pro.c:462
#15 0x00007fffd5ae3a54 in pl_sig_atomic1_va (PL__t0=487, PL__ac=1, PL__ctx=0x7fffffff81f0)
at /home/eshel/checkouts/swipl-devel/src/pl-pro.c:360
#16 0x00007fffd5a4fbe5 in PL_next_solution___LD (__PL_ld=0x7fffd5ca90e0 <PL_local_data>, qid=0x555557d84020)
at /home/eshel/checkouts/swipl-devel/src/pl-vmi.c:4641
#17 0x00007fffd5a3997d in PL_next_solution (qid=0x555557d84020)
at /home/eshel/checkouts/swipl-devel/src/pl-wam.c:3174
#18 0x00007fffd5d4e4d4 in sweep_next_solution (env=0x7fffffffb790, nargs=0, args=0x0, data=0x0)
at /home/eshel/checkouts/swipl-devel/packages/sweep/sweep.c:417
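To make the failure mode above concrete, here is a minimal sketch of a goal whose receiver is NULL reaching a flag test. This is not the XPCE source: the Object and Goal structs, the flag values, and resolve_goal_checked are hypothetical stand-ins, and only the flag names and the onFlag(obj, ...) pattern come from the report. The unguarded form would dereference obj directly, which is the segfault seen in the trace; the sketch adds an explicit check instead.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the XPCE types; the real definitions differ. */
typedef struct { unsigned long flags; } Object;
typedef struct { Object *receiver; } Goal;

/* Illustrative flag values only; the real values live in the XPCE headers. */
#define F_ACTIVE     0x1UL
#define F_ATTRIBUTE  0x2UL
#define F_SENDMETHOD 0x4UL
#define F_GETMETHOD  0x8UL

/* Same shape as the macro in the report: it reads through obj unconditionally. */
#define onFlag(obj, mask) ((obj)->flags & (mask))

/* Returns -1 when the goal has no receiver (the case that crashed),
 * 1 when any of the four flags is set, and 0 otherwise. */
int resolve_goal_checked(const Goal *g)
{
    Object *obj = g->receiver;

    if (obj == NULL)          /* without this check, onFlag() dereferences NULL */
        return -1;

    return onFlag(obj, F_ACTIVE | F_ATTRIBUTE | F_SENDMETHOD | F_GETMETHOD) ? 1 : 0;
}
```

A guard like this would only turn the crash into an error return; it would not explain why the receiver is NULL in the first place, which is the open question in this thread.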
This surely indicates something that is pretty much unrelated to X11 itself is bothering the system.
I'm still not totally sure what to make of it... trying to inject pl2xpce.so with LD_PRELOAD didn't cause a crash right away, neither could I get it to crash by playing around with Xwidgets.
Are you saying all works fine if you inject this file at the start, i.e., Xwidgets and sweep? If that is the case, it is likely that xpce uses a symbol from Xwidgets that causes it to misbehave, and that what you tried with Xwidgets isn't fatally affected by now (probably) using this symbol from xpce. Could be that your test didn't use that symbol or somehow it doesn't matter too much.
If that is the case, the next step is to figure out what symbol may be causing this. I think that is not so hard. Use ldd on emacs to figure out what the extra .so file(s) due to including Xwidgets are. Run nm -D on them to get their dynamic symbol table. Do the same for pl2xpce.so and see whether there are overlapping symbols. This gets the exports for me:
nm -D packages/xpce/pl2xpce.so | grep -vw U | awk '{print $3}'
That said, it also seems xpce doesn't get the linking flags set properly. It should, like e.g. libswipl.so, only have the public API functions in the dynamic symbol table. Similar problems seem to be the case for other plugins that consist of multiple C modules.
We could also add RTLD_DEEPBIND to the dlopen() call. See man dlopen.
I see there are some more new options to dlopen() that might be worth supporting :smile: One is RTLD_NOLOAD | RTLD_GLOBAL, which would allow Prolog to make its own symbols global, so it can load plugins even if it was originally loaded with RTLD_LOCAL. That could also fix our issues with older Emacs versions :smile:
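For readers unfamiliar with RTLD_DEEPBIND, the following is a minimal sketch of what passing it to dlopen() looks like. The helper name open_plugin_deepbind is hypothetical and this is not how SWI-Prolog loads plugins; it only illustrates the flag. RTLD_DEEPBIND is a glibc extension, so the sketch falls back to a plain dlopen() where the flag is not defined.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE           /* RTLD_DEEPBIND is a glibc extension */
#endif
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: open a shared object so that its own definitions are
 * preferred over same-named symbols already loaded in the process.
 * Returns the dlopen() handle, or NULL on failure (see dlerror()). */
void *open_plugin_deepbind(const char *path)
{
    int flags = RTLD_NOW | RTLD_LOCAL;

#ifdef RTLD_DEEPBIND
    flags |= RTLD_DEEPBIND;   /* look up symbols in the library itself first */
#endif

    return dlopen(path, flags);
}
```

On older glibc versions the program may need to be linked with -ldl. A caller would check the result for NULL and print dlerror() before using dlsym() on the handle.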
Are you saying all works fine if you inject this file at the start, i.e., Xwidgets and sweep?
No, after injecting pl2xpce.so Xwidgets works fine, but if I also try to load and use Sweep, then Emacs crashes just as it did before.
If that is the case, the next step is to figure out what symbol may be causing this. I think that is not so hard.
I'll give it a try :)
No, after injecting pl2xpce.so Xwidgets works fine, but if I also try to load and use Sweep, then Emacs crashes just as it did before.
That suggests I'm barking up the wrong tree. Still, I see few other options for the interaction. You could also preload the Xwidgets .so into Prolog without sweep and see whether that causes xpce to misbehave?
Pushed 423863948d3ff622b5079bf68bd35c41ded24601 which allows for
:- use_foreign_library(..., [deepbind(true)]).
That allows for some more experiments. Might not be something to use often, but it can resolve special cases :smile:
When Emacs is built with --with-xwidgets, it links to libwebkit2gtk-4.1.so. All of the symbols that this library exports are prefixed with webkit_, so there is no overlap with symbols from pl2xpce.so. Also, injecting that .so into swipl with LD_PRELOAD doesn't cause any problems for starting XPCE.
But --with-xwidgets has further effects when building Emacs: there's an extra object file xwidget.o that Emacs links (statically) and there are a bunch of #ifdef HAVE_XWIDGETS directives throughout the source code, so the interaction might be more complicated.
Also, adding the new deepbind(true) option to the use_foreign_library call in init_pce does avoid this crash!
Also, adding the new deepbind(true) option to the use_foreign_library call in init_pce does avoid this crash!
Great. So, it is a symbol issue, but apparently not with libwebkit2gtk-4.1.so. I'll need to think a bit about this. I think the best solution is to fix the link options of the packages. I don't know what the implications of using the deepbind option are.
Thanks for your time. You may get another testing request, but probably not before Friday.
Anyway, I'm happy to report that removing --with-xwidgets from the compile options stopped the crash in our environment as well.
Thanks for investigating and providing the workaround.
Pushed some patches that hide the symbols of all plugins except for the module install and uninstall hooks. This should avoid almost any name conflict. The only shared symbols are now the public Prolog API (PL and S) and the plugin install_ and uninstall_ hooks.
And, for @eshelyaron, the sweep symbols to claim GPL compliance and the emacs module init :smile:
Please check and close when done.
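As an illustration of the idea of exporting only the install and uninstall hooks, here is a minimal sketch using GCC/Clang visibility attributes. The patches themselves are not shown in this thread, so the mechanism they use is an assumption here, and the plugin name demo, the macros, and the helper are all hypothetical.

```c
/* Hypothetical export macros; compilers without the attribute get no-ops. */
#if defined(__GNUC__) || defined(__clang__)
#define PLUGIN_EXPORT __attribute__((visibility("default")))
#define PLUGIN_LOCAL  __attribute__((visibility("hidden")))
#else
#define PLUGIN_EXPORT
#define PLUGIN_LOCAL
#endif

/* File-local state: static symbols never reach the dynamic symbol table. */
static int installed = 0;

/* Shared between the plugin's C files, but hidden from other libraries,
 * so it cannot collide with a same-named symbol elsewhere in the process. */
PLUGIN_LOCAL int demo_is_installed(void)
{
    return installed;
}

/* The only symbols this sketch leaves visible to the host process. */
PLUGIN_EXPORT void install_demo(void)
{
    installed = 1;
}

PLUGIN_EXPORT void uninstall_demo(void)
{
    installed = 0;
}
```

When such a file is built into a shared object, running nm -D on the result should list install_demo and uninstall_demo but not demo_is_installed.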
Pushed some patches that hide the symbols of all plugins except for the module install and uninstall hooks. This should avoid almost any name conflict
That seems to solve this XPCE/Xwidgets collision, thank you Jan!
Great. Thanks for testing.
I also checked the operation. No more crashes. Thanks for the fix!
Overview
Emacs crashes with a segmentation fault when opening library/clp/bounds.pl in Emacs with sweep enabled.
Reproduction procedure and error
sweep config
All my init.el are available on GitHub.
https://github.com/ncaq/.emacs.d/blob/master/init.el
error message
Environment
OS and SWI-Prolog
Emacs (made with report-emacs-bug)