Open green-nsk opened 5 months ago
While I haven't looked too deep into this, I imagine it has something to do with RTLD_DEEPBIND, which we use in many places. @staticfloat might have a better idea.
@staticfloat or anyone else, is there more insight into what's happening?
At the very least, I'd like to understand whether this is a bug/regression or by design.
Bisected to 82c89c680e8326da92412a082073e5e4044fd14f from https://github.com/JuliaLang/julia/pull/50162; cc @topolarity.
So the reason this happens is that we look up the symbols in the libraries first, before we look them up in the main executable. That behaves somewhat like RTLD_DEEPBIND, which has the side effect of making symbol interposition like this stop working. The reason for the change is to allow multiple julias to be loaded in the same process. I do wonder if we could try to restore the interposer behaviour a bit by looking up non-julia symbols in the executable and julia symbols in the library.
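To make the lookup-order point concrete, here is a minimal C sketch of the two orders. It is not Julia's actual implementation: `lookup_symbol` and its parameters are hypothetical names, and a handle from `dlopen(NULL, ...)` stands in for the global scope (the executable, any LD_PRELOAD libraries, and the startup dependencies).

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper, for illustration only.
 * lib:           handle of one specific library (may be NULL).
 * library_first: non-zero searches lib before the global scope,
 *                zero searches the global scope first. */
void *lookup_symbol(void *lib, const char *name, int library_first)
{
    /* dlopen(NULL, ...) returns a handle for the main program; lookups
     * through it also search libraries loaded at startup, which includes
     * anything injected with LD_PRELOAD. */
    void *global = dlopen(NULL, RTLD_LAZY);
    void *first  = library_first ? lib : global;
    void *second = library_first ? global : lib;
    void *addr   = NULL;

    if (first != NULL)
        addr = dlsym(first, name);
    if (addr == NULL && second != NULL)
        addr = dlsym(second, name);   /* fall back to the other scope */
    return addr;
}
```

With `library_first` set, a symbol that the library itself (or one of its dependencies) provides wins over an LD_PRELOAD definition of the same name, which is the effect described above. With it cleared, the preloaded definition is found first.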
It's worth mentioning that (as @gbaraldi points out), the new behavior is similar to the (pre-existing) behavior from the C side which is due to RTLD_DEEPBIND:
$ cat mylib.c
int socket(int domain, int type, int protocol);
__attribute__((visibility("default")))
int socket2(int domain, int type, int protocol) {
    return socket(domain, type, protocol);
}
$ gcc -shared -o socket.so -fPIC socket.c
$ gcc -shared -o mylib.so -fPIC mylib.c
$ LD_PRELOAD=./socket.so julia-1.10 -e 'println(ccall((:socket2, "./mylib.so"), Cint, (Cint, Cint, Cint), 0, 0, 0))'
-1
I didn't think about LD_PRELOAD specifically when we made this change, but I think the need for an explicit opt-in to the interposition via ccall((:symbol, ""), ...) is probably a good thing despite the unintended breakage.
This might break system profilers that use LD_PRELOAD?
As an example, in MPI.jl we use the ccall(:symbol) pattern so that MPI profilers can hook these symbols explicitly:
https://github.com/JuliaParallel/MPI.jl/pull/451 / https://github.com/JuliaParallel/MPI.jl/pull/450
We can of course change MPI to use the (:call, "") syntax...
We already have a workaround for our case, but I can't see why inconsistent behaviour is acceptable. Also, I am worried there may be other unintended inconsistencies:
- julia internal symbol resolution is inconsistent with ccall() resolution. In my particular case, LD_PRELOAD=socket.so affects calls to socket() from inside Sockets.jl/libuv, but not from ccall(:socket). This is probably the worst one.
- dlsym(dlopen(""), :foo) also picks up a different function from ccall(:foo).
If all of that is not a concern, at the very least there should be a mention of those different behaviours in the ccall documentation.
julia internal symbol resolution is inconsistent with ccall() resolution. In my particular case, LD_PRELOAD=socket.so affects calls to socket() from inside Sockets.jl/libuv, but not from ccall(:socket). This is probably the worst one.
I agree that this is important.
The problem is that it's already inconsistent in 1.9:
$ cat socket.c
#include <stdio.h>
__attribute__((visibility("default")))
int socket(int domain, int type, int protocol) {
    fprintf(stderr, "Called LD_PRELOAD socket() hook\n");
    return 42;
}
$ LD_PRELOAD=./socket.so julia-1.9 -e 'using Sockets; TCPSocket(; delay = false)'
Called LD_PRELOAD socket() hook
$ LD_PRELOAD=./socket.so julia-1.9 -e 'using MySQL; DBInterface.connect(MySQL.Connection, "localhost", "user", "passwd")'
<no hook message>
Both of these use socket() internally, but only one picks up the LD_PRELOAD overload.
What's the deal? The difference here is that MySQL.jl is using libmariadbclient, which is loaded via JLL/ccall with RTLD_DEEPBIND, meaning that its symbol resolution is not affected by LD_PRELOAD.
In contrast, any of the "built-in" libraries (defined via DEP_LIBS here) are loaded without RTLD_DEEPBIND, which is why libuv does pick up the LD_PRELOAD hook.
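The difference between the two loading modes comes down to one dlopen() flag. The following is a minimal sketch, assuming glibc (RTLD_DEEPBIND is a glibc extension); `load_flags` and `load_library` are hypothetical helpers for illustration, not the functions Julia uses.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* RTLD_DEEPBIND is a glibc extension */
#endif
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: build the dlopen() flags for a library.
 * With deepbind set, the library resolves its undefined symbols against
 * itself and its own dependencies before the global scope, so an
 * LD_PRELOAD definition of socket() is not picked up by that library. */
int load_flags(int deepbind)
{
    int flags = RTLD_LAZY | RTLD_LOCAL;
#ifdef RTLD_DEEPBIND
    if (deepbind)
        flags |= RTLD_DEEPBIND;
#else
    (void)deepbind;          /* flag not available on this libc */
#endif
    return flags;
}

/* Hypothetical helper: load a library with or without deep binding.
 * Returns NULL on failure; dlerror() describes the reason. */
void *load_library(const char *path, int deepbind)
{
    return dlopen(path, load_flags(deepbind));
}
```

In these terms, the libmariadbclient case corresponds to `load_library(path, 1)` and the built-in libuv case to `load_library(path, 0)`, where calls to socket() made inside the library still go through the global scope and therefore through the preloaded hook.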
I do wonder if we could try to restore the interposer behaviour a bit by looking up non-julia symbols in the executable and julia symbols in the library.
We could check to see which library the symbol actually resolved to, and if it's not one of the libjulia-* we could repeat the look-up with the old behavior?
We could check to see which library the symbol actually resolved to, and if it's not one of the libjulia-* we could repeat the look-up with the old behavior?
Wouldn't that mean that different Julia processes, potentially loading different versions of JLLs, will step on each other's toes?
We're using a custom network stack library that loads via the LD_PRELOAD mechanism and overrides certain function calls (socket(), recv(), setsockopt() and others). For some reason, starting with julia-1.10, ccall(:socket) doesn't pick up the LD_PRELOAD version of the call anymore.
We found a workaround: calling ccall((:socket, "")) works with our LD_PRELOAD network stack. We'd like to understand what's changed, how the two ccall() versions differ, and how we can make sure we don't hit this in future versions.
Repro:
Julia downloaded from https://julialang-s3.julialang.org/bin/linux/x64/1.10/julia-1.10.2-linux-x86_64.tar.gz