bazelbuild / rules_python

Bazel Python Rules
https://rules-python.readthedocs.io
Apache License 2.0

Downloaded python toolchain has shared library dependencies #1211

Open malt3 opened 1 year ago

malt3 commented 1 year ago

🐞 bug report

Affected Rule

n/a

Is this a regression?

no

Description

The python toolchains downloaded by this rule depend on an (outdated) version of glibc. This does not work in a number of environments. Maybe a statically linked python binary or different versions of the toolchains could be provided instead. I'd be happy to help work on a more hermetic (as in: not relying on system-wide libraries) python toolchain.

🔬 Minimal Reproduction

Register any of the built-in python toolchains for download, find the python3 binary in the cache and run:

ldd /home/malte/.cache/bazel/_bazel_malte/e58bd5cf140e0accfbab91a2f501a0a3/external/python3_10_x86_64-unknown-linux-gnu/bin/python3
        linux-vdso.so.1 (0x00007ffd062db000)
        /home/malte/.cache/bazel/_bazel_malte/e58bd5cf140e0accfbab91a2f501a0a3/external/python3_10_x86_64-unknown-linux-gnu/bin/../lib/libpython3.10.so.1.0 (0x00007fcef8400000)
        libcrypt.so.1 => /lib/x86_64-linux-gnu/libcrypt.so.1 (0x00007fcef996a000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fcef9965000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fcef9960000)
        libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007fcef995b000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fcef9872000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fcef986d000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fcef8000000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fcef99c5000)

As you can see, the library has versioned glibc dependencies. This is problematic in environments with newer glibcs (arch linux, fedora) or environments that use different libcs (alpine) or don't always provide a "default" glibc for tools (nix / nixos).
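To make the host dependencies explicit, the `ldd` output can be parsed to list every library that resolves outside the toolchain directory. The following is only a minimal sketch: the function name `host_resolved_libs` and the shortened sample input are illustrative assumptions, not part of rules_python.

```python
def host_resolved_libs(ldd_output: str, toolchain_root: str) -> list:
    """Return sonames from `ldd` output that resolve outside toolchain_root.

    Libraries that ldd reports as "not found" are included as well, since
    they would also have to be provided by the host.
    """
    host_libs = []
    for line in ldd_output.splitlines():
        line = line.strip()
        if "=>" not in line:
            # Skips the vDSO, the dynamic loader and libraries that are
            # referenced by an absolute path.
            continue
        soname, _, target = line.partition("=>")
        soname = soname.strip()
        target = target.strip()
        if target.startswith("not found"):
            host_libs.append(soname)
            continue
        # Drop the trailing load address, e.g. " (0x00007fcef996a000)".
        path = target.split(" (")[0].strip()
        if not path.startswith(toolchain_root):
            host_libs.append(soname)
    return host_libs


# Shortened sample in the same shape as the ldd output above.
SAMPLE = """\
        linux-vdso.so.1 (0x00007ffd062db000)
        libcrypt.so.1 => /lib/x86_64-linux-gnu/libcrypt.so.1 (0x00007fcef996a000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fcef8000000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fcef99c5000)
"""

print(host_resolved_libs(SAMPLE, "/home/malte/.cache/bazel"))
```

Every soname this prints is one that the host system has to supply for the downloaded interpreter to start.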

🔥 Exception or Error

When using Fedora as a host system, the build fails like this:

❯ bazel run @python3_10//:python3
INFO: Invocation ID: 8c2857a4-fd52-4757-9e09-9a8fa88e2440
INFO: Build option --//bazel/settings:tpm_simulator has changed, discarding analysis cache.
INFO: Analyzed target @python3_10//:python3 (1 packages loaded, 17 targets configured).
INFO: Found 1 target...
Target @python3_10_x86_64-unknown-linux-gnu//:bin/python3 up-to-date (nothing to build)
INFO: Elapsed time: 1.148s, Critical Path: 0.12s
INFO: 1 process: 1 internal.
INFO: Build completed successfully, 1 total action
INFO: Running command line: /home/builder/.cache/bazel/_bazel_builder/eab0d61a99b6696edb3d2aff87b585e8/external/python3_10_x86_64-unknown-linux-gnu/bin/python3
/home/builder/.cache/bazel/_bazel_builder/eab0d61a99b6696edb3d2aff87b585e8/external/python3_10_x86_64-unknown-linux-gnu/bin/python3: error while loading shared libraries: libcrypt.so.1: cannot open shared object file: No such file or directory

The OS provides /usr/lib64/libcrypt.so.2, but not libcrypt.so.1.

🌍 Your Environment

Operating System(s):

Output of bazel version:

Build label: 6.1.2
Build target: bazel-out/k8-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Tue Apr 18 15:29:54 2023 (1681831794)
Build timestamp: 1681831794
Build timestamp as int: 1681831794

Rules_python version:

0.21.0

Anything else relevant?

Issue #716 already exists for this, but the bug was not actually fixed there. Forcing downstream users to install old glibc versions system wide seems like it will become harder over time.

chrislovecnm commented 1 year ago

Forcing downstream users to install old glibc versions system-wide seems like it will become harder over time.

What version of pre-packaged Python do you recommend that runs on multiple platforms? We are using https://github.com/indygreg/python-build-standalone/, and they also would need to address this issue.

You can also define base_url in the python_register_toolchains to a different downloadable Python bundle URL. The package's naming needs to map to the naming standard used in https://github.com/bazelbuild/rules_python/blob/main/python/versions.bzl.

The other option is to use a system interpreter instead of a hermetic interpreter.

malt3 commented 1 year ago

Looking at python-build-standalone, I would like to select x86_64-unknown-linux-musl instead of x86_64-unknown-linux-gnu (see also), since it ships everything statically linked. In this case, the base URL would stay the same. Is this currently possible or would I need to tweak the bazel ruleset?
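As a rough illustration of why the base URL could stay the same, here is a sketch of how a download URL might be assembled from a base URL, a release tag and a platform triple. The file-name template is an assumption modelled on python-build-standalone release assets, `toolchain_url` is a hypothetical helper, and the release tag and patch version are deliberate placeholders; which build flavours actually exist for each platform has to be checked against the real releases.

```python
# Base URL of the python-build-standalone releases mentioned in this thread.
DEFAULT_BASE_URL = "https://github.com/indygreg/python-build-standalone/releases/download"


def toolchain_url(base_url, release, python_version, platform, build="install_only"):
    """Assemble a download URL from its parts (assumed naming template)."""
    filename = f"cpython-{python_version}+{release}-{platform}-{build}.tar.gz"
    return f"{base_url}/{release}/{filename}"


# Placeholder release tag and version, not real values.
RELEASE = "YYYYMMDD"
VERSION = "3.10.N"

gnu_url = toolchain_url(DEFAULT_BASE_URL, RELEASE, VERSION, "x86_64-unknown-linux-gnu")
musl_url = toolchain_url(DEFAULT_BASE_URL, RELEASE, VERSION, "x86_64-unknown-linux-musl")

print(gnu_url)
print(musl_url)
```

Under this assumed template only the platform triple in the file name differs between the two URLs.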

chrislovecnm commented 1 year ago

So first off, I agree with you that we should change the rules to use a statically linked python. Secondly, you can probably use https://blog.aspect.dev/configuring-bazels-downloader to rewrite the download URL of the package.

malt3 commented 1 year ago

Since the static version is unable to load .so libraries at runtime, it may still be necessary to let users choose. That being said, I have no idea if that's a widely used feature or more of a niche thing.

chrislovecnm commented 1 year ago

At this point the rules decide for the user which binary to use, and we also test against those binaries. If you feel that users should have the choice, then at this point users can decide by configuring the downloader. I think you can override the name of the binary and even download it from GitHub.

Why would a user want to use a binary that is linked against the older libraries? We have a Python special interest group, and if you would like to join us, this is a great topic for us to discuss.

malt3 commented 1 year ago

I'm not super deep in the inner workings of python libraries, but if I understand it correctly, some python libraries ship precompiled shared libraries that the python interpreter will load dynamically. The musl version of the python toolchain cannot do that, so users might run into unexpected problems if this is the default. For me personally, that is not an issue. Happy to explain in a SIG meeting but also not the best person to give a detailed rundown of this problem.
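One way to gauge how much a project depends on this is to scan its installed packages for precompiled shared objects. The following is a minimal sketch under that assumption; `find_native_extensions` is a hypothetical helper and the package created in the demo is made up.

```python
import tempfile
from pathlib import Path


def find_native_extensions(root):
    """Return sorted relative paths of shared-object files below root."""
    root = Path(root)
    hits = []
    for path in root.rglob("*"):
        # Matches both "foo.so" and versioned names such as "libfoo.so.1".
        if path.is_file() and (path.suffix == ".so" or ".so." in path.name):
            hits.append(path.relative_to(root).as_posix())
    return sorted(hits)


# Demo on a throwaway directory that mimics a package with one extension module.
with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "somepkg"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "_speedups.cpython-310-x86_64-linux-gnu.so").write_bytes(b"")
    print(find_native_extensions(tmp))
```

Pointing the function at a real site-packages directory would list the files that an interpreter without dynamic loading could not import.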

rickeylev commented 1 year ago

This level of C and platform stuff is a bit outside my expertise, so I asked a couple coworkers who know more.

re: using musl: I'm told musl is a different implementation of libc, so there are potential ABI issues with something on PyPI (which, iiuc, assumes glibc as part of a wheel's "manylinux" definition). This means we can't use musl as the default.

That said, I think it should be easier to use a musl-based interpreter than patching and overriding URLs. Is this something we can auto detect and use? The info I found says that certain platforms use musl. Can (and should) the toolchain logic detect if you're running e.g. fedora or alpine, and use musl if so?
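For what auto-detection could look like, here is a minimal sketch that classifies the host libc from the text printed by `ldd --version`. The function name `classify_libc` and the sample strings are illustrative assumptions; real output varies between distributions, so this is not a robust detector.

```python
def classify_libc(ldd_version_text):
    """Guess the libc family from the text printed by `ldd --version`."""
    text = ldd_version_text.lower()
    if "musl" in text:
        return "musl"
    if "glibc" in text or "gnu libc" in text or "gnu c library" in text:
        return "glibc"
    return "unknown"


# Illustrative inputs in the style of typical `ldd --version` banners.
print(classify_libc("musl libc (x86_64)\nVersion 1.2.4"))
print(classify_libc("ldd (GNU libc) 2.37"))
print(classify_libc("something else entirely"))
```

In practice the text would come from running `ldd --version` through `subprocess.run` and combining stdout and stderr, since the banner is not guaranteed to be written to stdout.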

re: using static linking: Maybe? I think this depends on what is being statically linked into the interpreter and what prebuilt things on PyPI expect. I'm also told that you generally don't want to statically link certain libraries (e.g. libc) because those are how you interact with the host system APIs. Again, this is a bit outside my expertise. If standalone python doesn't provide a glibc-based statically linked interpreter build, then this is a bit moot.

aignas commented 1 year ago

FYI, some PyPI packages have musl wheels being distributed next to the manylinux wheels which are glibc. Example: https://pypi.org/project/coverage/#files
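The distinction is visible in the wheel file name: the last dash-separated field is the platform tag, which starts with `manylinux` for glibc wheels and `musllinux` for musl wheels. A minimal sketch of that classification follows; `wheel_libc` is a hypothetical helper and the file names are made-up examples.

```python
def wheel_libc(filename):
    """Classify a wheel file name as 'glibc', 'musl' or 'other' by platform tag."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    # Wheel names end in ...-{python tag}-{abi tag}-{platform tag}.whl
    platform_field = filename[: -len(".whl")].split("-")[-1]
    # One wheel may carry several dot-separated platform tags.
    tags = platform_field.split(".")
    if any(tag.startswith("musllinux") for tag in tags):
        return "musl"
    if any(tag.startswith("manylinux") for tag in tags):
        return "glibc"
    return "other"


print(wheel_libc("example_pkg-1.0-cp310-cp310-musllinux_1_1_x86_64.whl"))
print(wheel_libc("example_pkg-1.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"))
print(wheel_libc("example_pkg-1.0-py3-none-any.whl"))
```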

We should probably add a libc constraint somewhere to make sure that we can register both musl and glibc python interpreters and then let the toolchain resolution do what needs to be done, but it seems that the discussion for having this in @platforms has not been resolved yet: https://github.com/bazelbuild/platforms/issues/38.

malt3 commented 1 year ago

Yeah the musl platform and in theory also auto detection would work in cases where you have a dynamically linked musl libc and are on alpine or similar. My use case is that I want to be independent of what's provided/preinstalled on the host (hence the statically linked interpreter). I would like to have some kind of selector for the kind of toolchain when registering it (choice between dynamically linked with glibc, statically linked with musl, maybe more in the future).

I'm also told that you generally don't want to statically link certain libraries (e.g. libc) because those are how you interact with the host system APIs

This is definitely true for glibc. With musl this is not a problem if the kernel is recent enough.

chrislovecnm commented 11 months ago

Where is this issue at?

malt3 commented 11 months ago

Nobody has attempted to implement this yet. I don't have the capacity to work on it at the moment. If anybody wants to try, I'd be super happy to see this moving forward. Maybe I can work on this some time in the future.

github-actions[bot] commented 5 months ago

This issue has been automatically marked as stale because it has not had any activity for 180 days. It will be closed if no further activity occurs in 30 days. Collaborators can add an assignee to keep this open indefinitely. Thanks for your contributions to rules_python!

rickeylev commented 5 months ago

From some discussions on slack and other issues, I think there are two ways forward on this:

Long term, make these alternative runtimes part of what rules_python inherently understands. This just requires (a) adding all the URLs and (b) having those runtimes registered as toolchains with the appropriate constraints. Then one can do e.g. bazel build --config=musl ... and then the musl-based python toolchain will be selected automatically and Just Work.

A shorter term alternative is to do (a) above, and then add a python.root_overrides() function to allow the root module to override some of the default behaviors (there's 2 other use cases that could use such a function).

hzeller commented 4 months ago

Is it clearly documented somewhere how to escape this and tell rules_python to just use the Python installed on the system if need be?

aignas commented 4 weeks ago

With #1837 I have added the py_linux_libc flag, which I needed to handle musllinux wheels in select statements, and this led me to add flag_values to our toolchain platform definitions. With that we should be ready to add musllinux toolchains from indygreg and expect everything to work.

As for the local system interpreter, it will be used by the autodetecting toolchain if no hermetic toolchain is registered. Maybe this could be made easier if we had a string_flag or a bool_flag to switch the hermetic rules_python toolchain on and off. Since we already have the flag_values as part of our PLATFORMS definitions in python/versions.bzl, that should be a trivial change.
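To illustrate the idea, here is a simplified model of how flag_values could steer toolchain selection, with the autodetecting toolchain as the fallback. This is a sketch in plain Python, not Bazel's actual resolution algorithm: the toolchain names, the `select_toolchain` helper and the flag values "glibc" and "musl" are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Toolchain:
    name: str
    constraints: frozenset  # platform constraints, e.g. {"linux", "x86_64"}
    flag_values: dict = field(default_factory=dict)


def select_toolchain(toolchains, platform, flags, fallback="autodetecting"):
    """Return the first toolchain whose constraints and flag_values all match."""
    for tc in toolchains:
        if not tc.constraints <= platform:
            continue  # a required platform constraint is missing
        if all(flags.get(key) == value for key, value in tc.flag_values.items()):
            return tc.name
    # No hermetic toolchain matched, so fall back to the system interpreter.
    return fallback


# Hypothetical registrations, listed in registration order.
HERMETIC = [
    Toolchain("hermetic_glibc", frozenset({"linux", "x86_64"}), {"py_linux_libc": "glibc"}),
    Toolchain("hermetic_musl", frozenset({"linux", "x86_64"}), {"py_linux_libc": "musl"}),
]
HOST = frozenset({"linux", "x86_64"})

print(select_toolchain(HERMETIC, HOST, {"py_linux_libc": "musl"}))
print(select_toolchain([], HOST, {"py_linux_libc": "musl"}))
```

In this model, a flag that switches the hermetic toolchain off would simply be one more entry in `flag_values` that no hermetic toolchain matches, so selection falls through to the fallback.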

aignas commented 3 weeks ago

OK, since #1837 has landed and the bugs with toolchain selection got squashed, adding the extra musl changes should now indeed be very easy.

I personally don't have time to do this right now, so if anyone comes along and does it, add me as a reviewer. :)