JuliaPy / PythonCall.jl

Python and Julia in harmony.
https://juliapy.github.io/PythonCall.jl/stable/
MIT License

Fixing incorrect "Unable to load"/"GLIBCXX not found" issue, once and for all (hopefully) #437

Open MilesCranmer opened 10 months ago

MilesCranmer commented 10 months ago

Affects: Both

Describe the bug: This is a bug that has plagued PyCall/PyJulia for a while, and the same issue seems to occur with PythonCall/juliacall. I've been discussing potential solutions with @mkitti and am curious to hear what others think, in particular @cjdoris.

Basically, depending on your particular environment, you might see the following:

In [1]: import sklearn  # Triggers load of incompatible libstdc++

In [2]: from juliacall import Main as jl
ERROR: Unable to load dependent library /home/mc2473/.julia/juliaup/julia-1.10.0+0.x64.linux.gnu/lib/julia/libjulia-codegen.so.1.10
Message:/lib64/libstdc++.so.6: version `GLIBCXX_3.4.26' not found (required by /home/mc2473/.julia/juliaup/julia-1.10.0+0.x64.linux.gnu/lib/julia/libjulia-codegen.so.1.10)

which crashes Python without providing any useful debug information. This issue is related to https://github.com/JuliaPy/PythonCall.jl/issues/255 which has been addressed in the documentation (as well as the PyJulia docs here).

In my opinion this is a really sharp corner of Python<->Julia interfaces, making them significantly less practical for end-users. I would really like to find a way to automatically solve this.

The simplest way to fix it is to preload the correct libstdc++ when starting Python, for example:

LD_PRELOAD=$HOME/.julia/juliaup/julia-1.10.0+0.x64.linux.gnu/lib/julia/libstdc++.so.6 ipython

That is enough to solve the issue entirely. However, it is not a general solution, because the variable has to be set before Python even starts.
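One way to picture the "before Python even starts" constraint is a launcher that re-executes the interpreter with LD_PRELOAD already set. The following is only a minimal sketch under assumptions: the helper names `with_preload` and `reexec_with_preload` are hypothetical, the library path must be supplied by the caller, and relaunching with `sys.argv` only works for plain script invocations (not `python -c`, `python -m`, or an interactive session). Changing LD_PRELOAD may also conflict with other tooling that sets it.

```python
import os
import sys


def with_preload(environ, lib_path):
    """Return a copy of environ with lib_path placed first in LD_PRELOAD."""
    env = dict(environ)
    # LD_PRELOAD entries may be separated by colons or spaces.
    existing = env.get("LD_PRELOAD", "").replace(":", " ").split()
    entries = [p for p in existing if p != lib_path]
    env["LD_PRELOAD"] = ":".join([lib_path] + entries)
    return env


def reexec_with_preload(lib_path):
    """Restart the current script under LD_PRELOAD unless it is already set."""
    current = os.environ.get("LD_PRELOAD", "").replace(":", " ").split()
    if lib_path in current:
        return  # already preloaded; returning here avoids an exec loop
    # Replaces the current process; nothing after this call runs on success.
    os.execve(
        sys.executable,
        [sys.executable] + sys.argv,
        with_preload(os.environ, lib_path),
    )
```

A script would call `reexec_with_preload(...)` as its very first statement, before any import that could pull in the system libstdc++.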

The other solution is to import julia as early as possible. However, again, this is not a general solution as the julia import might occur deep in some dependency that an inexperienced end-user is simply not aware of.

So I'm wondering what options we have to actually fix this, once and for all (hopefully), so that an end-user won't have to run into this ever again. Maybe:

  1. Can we replace the loaded libstdc++ at runtime to the "correct" one as installed by julia?
  2. Can we load two versions of libstdc++ into Python at once? (Is that possible?)
  3. Can we install a static version of the relevant Julia libraries?
  4. Can we check, in advance, whether there will be a GLIBCXX issue, and prevent the Python hard crash – maybe using the opportunity to directly provide debugging information to the user?
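As a rough illustration of option 4, the sketch below scans the raw bytes of two shared libraries for `GLIBCXX_x.y.z` version tags and reports which tags one library references that the other does not contain. This is a heuristic byte scan (similar in spirit to running `strings` and grepping for GLIBCXX), not a proper ELF version-section parser, and the function names `glibcxx_versions` and `missing_tags` are hypothetical.

```python
import re

# Matches version tags such as GLIBCXX_3.4 or GLIBCXX_3.4.26 in raw bytes.
_TAG = re.compile(rb"GLIBCXX_(\d+(?:\.\d+)+)")


def glibcxx_versions(blob):
    """Return the set of GLIBCXX versions found in blob, as integer tuples."""
    return {
        tuple(int(part) for part in match.group(1).decode("ascii").split("."))
        for match in _TAG.finditer(blob)
    }


def missing_tags(provider_blob, consumer_blob):
    """List tags the consumer mentions that the provider does not contain."""
    missing = glibcxx_versions(consumer_blob) - glibcxx_versions(provider_blob)
    return ["GLIBCXX_" + ".".join(map(str, v)) for v in sorted(missing)]
```

In the situation from the error message above, the provider bytes would come from reading `/lib64/libstdc++.so.6` and the consumer bytes from `libjulia-codegen.so.1.10` (for example with `pathlib.Path(...).read_bytes()`). A non-empty result could be turned into a readable Python exception before the Julia runtime is loaded.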

What do you think?

mkitti commented 10 months ago

Isn't the solution to use conda-forge here to install scikit-learn and Python? This will then use the libstdc++.so in the $CONDA_PREFIX which should hopefully be more recent? Trying to use the system python is not going to work here.

MilesCranmer commented 10 months ago

Yes using conda-forge will get around this issue.

However, sometimes it isn’t an option; for example, some cluster sysadmins strongly discourage conda because it can bloat the home file system with 100,000+ files for every user (slowing down backups), whereas a system Python with virtualenv is able to reuse common libraries for different users across the cluster.

This doesn’t personally affect me, but I know some PySR end users who need to use a system-wide Python and libraries. I think those are also the users who are most likely to get hit by the libstdc++ issue.

But maybe you think there’s no other way around this, and just giving better debugging info is the only option?

mkitti commented 10 months ago

The system admin needs to either 1) make a newer libstdc++ available and rebuild Python / sklearn against it, or 2) figure out how to build Julia with the older libstdc++.

The only other thing I can think of other than conda is

julia> using Pkg

julia> pkg"add Python_jll"

julia> using Python_jll

julia> run(Python_jll.python())

MilesCranmer commented 10 months ago

Just to emphasize: these aren't issues I deal with myself; it's just a barrier to entry for end-users who occasionally have trouble with them. There are always multiple client-side ways around these problems (including asking a sysadmin for help), but I think to make PythonCall/PyCall as widely useful in the Python ecosystem as, say, numpy is (which I think, given time and the right development, Julia as a backend totally could be!), we need to push a bit further on our side to automatically patch these sharp edges (somehow...).

mkitti commented 10 months ago

Automatically manipulating LD_PRELOAD and LD_LIBRARY_PATH for the user for a 3rd party binary is likely to cause more issues than we are solving.

https://www.hpc.dtu.dk/?page_id=1180
http://xahlee.info/UnixResource_dir/_/ldpath.html#google_vignette

HPC systems typically have some sort of mechanism to address these issues. For example, the modules system will usually handle this comprehensively. In fact, if we change the above environment variables, we will likely interfere with the modules system as it also uses these variables to configure software:

https://hpc-wiki.info/hpc/Modules

Specifically, the symbol GLIBCXX_3.4.26 is associated with GCC 9.1.0

https://gcc.gnu.org/onlinedocs/libstdc++/manual/abi.html

GCC 9.1.0 was released in May 2019: https://gcc.gnu.org/releases.html

For example, on a cluster such as CSD3, I would investigate the output of the following.

module avail 2>&1 | grep -i python

I based this on https://docs.hpc.cam.ac.uk/hpc/user-guide/modules.html

Almost every single HPC system I have seen uses this exact mechanism to load newer libraries.

mkitti commented 10 months ago

This documentation could also be useful. You could technically modify the executable's RUNPATH. That's a little better than hacking LD_LIBRARY_PATH.

https://amir.rachum.com/shared-libraries/#runtime-search-path

cjdoris commented 9 months ago

This has largely been fixed in PythonCall (i.e. calling Python from Julia) by having CondaPkg install a version of libstdc++ compatible with whatever Julia is using.

It would be nice to do a similar thing the other way around, but for that we'd need an interface to know which version of libstdc++ the Python environment will use. Does such an interface exist?
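As far as this thread establishes, no dedicated interface for this has been identified. On Linux, one partial workaround is to inspect `/proc/self/maps` to see which libstdc++ the running Python process has already mapped. The sketch below assumes Linux, uses hypothetical function names, and only reports libraries that are already loaded (for example after `import sklearn`); it cannot predict what a later import will pull in.

```python
import os


def loaded_libstdcxx(maps_text):
    """Return unique paths of mapped libstdc++ objects, in first-seen order."""
    paths = []
    for line in maps_text.splitlines():
        # Format: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping without a pathname
        path = fields[5].strip()
        if "libstdc++.so" in os.path.basename(path) and path not in paths:
            paths.append(path)
    return paths


def current_process_libstdcxx():
    """Read this process's memory map (Linux only) and list libstdc++ paths."""
    with open("/proc/self/maps") as handle:
        return loaded_libstdcxx(handle.read())
```

An empty list from `current_process_libstdcxx()` would suggest that no libstdc++ has been loaded yet, in which case importing juliacall first should let Julia's bundled copy win.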