Open malt3 opened 1 year ago
Forcing downstream users to install old glibc versions system-wide seems like it will become harder over time.
What version of pre-packaged Python do you recommend that runs on multiple platforms? We are using https://github.com/indygreg/python-build-standalone/, and they also would need to address this issue.
You can also set base_url in python_register_toolchains to a different downloadable Python bundle URL. The package naming needs to follow the naming standard used in https://github.com/bazelbuild/rules_python/blob/main/python/versions.bzl.
The other option is to use a system interpreter instead of a hermetic interpreter.
Looking at python-build-standalone, I would like to select x86_64-unknown-linux-musl instead of x86_64-unknown-linux-gnu, since it ships everything statically linked.
In this case, the base url would stay the same.
Is this currently possible, or would I need to tweak the Bazel ruleset?
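To illustrate why the base URL can stay the same, here is a minimal sketch of how a bundle URL might be composed from a base URL, a release tag, and a platform triple. The helper name bundle_url, the release tag, the Python version, and the file-name pattern are assumptions modeled on python-build-standalone release assets, not the actual logic in python/versions.bzl. Whether a given release really publishes the musl asset has to be checked separately.

```python
# Assumed base URL and naming pattern; verify against python/versions.bzl
# and the python-build-standalone release page before relying on them.
BASE_URL = "https://github.com/indygreg/python-build-standalone/releases/download"


def bundle_url(release: str, python_version: str, triple: str,
               flavor: str = "install_only", base_url: str = BASE_URL) -> str:
    """Compose a download URL for one interpreter bundle (hypothetical helper)."""
    filename = f"cpython-{python_version}+{release}-{triple}-{flavor}.tar.gz"
    return f"{base_url}/{release}/{filename}"


# Illustrative release tag and version; only the platform triple differs.
gnu = bundle_url("20230116", "3.10.9", "x86_64-unknown-linux-gnu")
musl = bundle_url("20230116", "3.10.9", "x86_64-unknown-linux-musl")
print(gnu)
print(musl)
```

Under these assumptions the two URLs differ only in the triple, so switching libc flavors is a matter of selecting a different file name, not a different host.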
So first off, I agree with you that we should change the rules to use a statically linked Python. Secondly, you can probably use https://blog.aspect.dev/configuring-bazels-downloader to rewrite the downloaded package.
However, since the static version is unable to load .so libraries at runtime, it may still be necessary to let users choose. That being said, I have no idea if that's a widely used feature or more of a niche thing.
At this point the rules decide for the user which binary to use, and we also test against those binaries. If you feel that users should have the choice, they can currently make it by configuring the downloader. I think you can override the name of the binary and even download it from GitHub.
Why would a user want to use a binary that is linked against the older libraries? We have a Python special interest group, and if you would like to join us, it is a great topic for us to discuss.
I'm not super deep in the inner workings of python libraries, but if I understand it correctly, some python libraries ship precompiled shared libraries that the python interpreter will load dynamically. The musl version of the python toolchain cannot do that, so users might run into unexpected problems if this is the default. For me personally, that is not an issue. Happy to explain in a SIG meeting but also not the best person to give a detailed rundown of this problem.
This level of C and platform stuff is a bit outside my expertise, so I asked a couple coworkers who know more.
re: using musl: I'm told musl is a different implementation of libc, so there are potential ABI issues with something on PyPI (which, iiuc, assumes glibc as part of a wheel's "manylinux" definition). This means we can't use musl as the default.
That said, I think it should be easier to use a musl-based interpreter than patching and overriding URLs. Is this something we can auto-detect and use? The info I found says that certain platforms use musl. Can (or should) the toolchain logic detect if you're running e.g. fedora or alpine, and use musl if so?
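As a rough sketch of what auto-detection could look like, the libc family can often be guessed from the name of the dynamic loader that a binary requests: musl loaders are named ld-musl-<arch>.so.1, while glibc loaders are typically named ld-linux-*.so.* (for example, Alpine uses musl, whereas Fedora ships glibc). The function names below are hypothetical and this is not how rules_python detects anything today. host_libc relies on Python's platform.libc_ver(), which only recognizes glibc reliably, so it reports "unknown" otherwise.

```python
import platform


def libc_from_loader(loader_path: str) -> str:
    """Guess the libc family from an ELF interpreter (dynamic loader) path."""
    name = loader_path.rsplit("/", 1)[-1]
    if name.startswith("ld-musl-"):
        return "musl"
    # glibc loaders: ld-linux-x86-64.so.2, ld-linux-aarch64.so.1, ld64.so.2, ...
    if name.startswith("ld-linux") or name.startswith("ld64.so"):
        return "glibc"
    return "unknown"


def host_libc() -> str:
    """Best-effort guess for the running interpreter; may return 'unknown'."""
    lib, _version = platform.libc_ver()
    return "glibc" if lib == "glibc" else "unknown"


print(libc_from_loader("/lib/ld-musl-x86_64.so.1"))
print(libc_from_loader("/lib64/ld-linux-x86-64.so.2"))
print(host_libc())
```

A real implementation would need to read the loader path from the ELF PT_INTERP header of a known binary, and a statically linked interpreter has no loader at all, so detection alone does not cover the static use case discussed in this thread.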
re: using static linking: Maybe? I think this depends on what is being statically linked into the interpreter and what prebuilt things on PyPI expect. I'm also told that you generally don't want to statically link certain libraries (e.g. libc) because those are how you interact with the host system APIs. Again, this is a bit outside my expertise. If standalone python doesn't provide a glibc-based statically linked interpreter build, then this is a bit moot.
FYI, some PyPI packages distribute musl wheels next to the manylinux wheels, which are glibc-based. Example: https://pypi.org/project/coverage/#files
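The distinction is visible in the wheel file name: the platform tag is the last dash-separated field, and it starts with musllinux for musl builds and manylinux for glibc builds. A minimal sketch of classifying wheels this way follows. The function name wheel_libc and the example file names are made up for illustration.

```python
def wheel_libc(filename: str) -> str:
    """Classify a wheel by the libc family implied by its platform tag."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    # Wheel names end in ...-{python tag}-{abi tag}-{platform tag}.whl
    platform_tag = filename[: -len(".whl")].rsplit("-", 1)[-1]
    # A compressed tag set joins several platform tags with dots.
    tags = platform_tag.split(".")
    if any(tag.startswith("musllinux") for tag in tags):
        return "musl"
    if any(tag.startswith("manylinux") for tag in tags):
        return "glibc"
    return "other"


# Hypothetical file names, shaped like typical PyPI wheels.
print(wheel_libc("example_pkg-1.0-cp311-cp311-musllinux_1_1_x86_64.whl"))
print(wheel_libc("example_pkg-1.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"))
print(wheel_libc("example_pkg-1.0-py3-none-any.whl"))
```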
We should probably add a libc constraint somewhere to make sure that we can register both musl and glibc Python interpreters and then let toolchain resolution do what needs to be done, but it seems that the discussion about having this in @platforms has not been resolved yet: https://github.com/bazelbuild/platforms/issues/38.
Yeah the musl platform and in theory also auto detection would work in cases where you have a dynamically linked musl libc and are on alpine or similar. My use case is that I want to be independent of what's provided/preinstalled on the host (hence the statically linked interpreter). I would like to have some kind of selector for the kind of toolchain when registering it (choice between dynamically linked with glibc, statically linked with musl, maybe more in the future).
"I'm also told that you generally don't want to statically link certain libraries (e.g. libc) because those are how you interact with the host system APIs"
This is definitely true for glibc. With musl this is not a problem if the kernel is recent enough.
Where is this issue at?
Nobody has attempted to implement this yet. I don't have the capacity to work on it at the moment. If anybody wants to try, I'd be super happy to see this moving forward. Maybe I can work on this some time in the future.
This issue has been automatically marked as stale because it has not had any activity for 180 days. It will be closed if no further activity occurs in 30 days. Collaborators can add an assignee to keep this open indefinitely. Thanks for your contributions to rules_python!
From some discussions on Slack and other issues, I think there are two ways forward on this:
Long term, make these alternative runtimes part of what rules_python inherently understands. This just requires (a) adding all the URLs and (b) having those runtimes registered as toolchains with the appropriate constraints. Then one can do e.g. bazel build --config=musl ... and the musl-based Python toolchain will be selected automatically and Just Work.
A shorter-term alternative is to do (a) above, and then add a python.root_overrides() function to allow the root module to override some of the default behaviors (there are 2 other use cases that could use such a function).
Is it clearly documented somewhere how to opt out of this and tell rules_python to just use the Python installed on the system if need be?
With #1837 I have added the py_linux_libc flag, which I needed to handle musllinux wheels in select statements, and this led me to add flag_values to our toolchain platform definitions. With that we should be ready to add musllinux toolchains from indygreg and expect everything to work.
As for the local system interpreter, it will be used by the autodetecting toolchain if no hermetic toolchain is registered. Maybe this could be made easier if we had a string_flag or a bool_flag to switch the hermetic rules_python toolchain on and off. Since we already have flag_values as part of our PLATFORMS definitions in python/versions.bzl, that should be a trivial change.
OK, since #1837 has landed and the bugs with toolchain selection got squashed, adding extra musl changes should now indeed be very easy.
I personally don't have time to do this right now, so if anyone comes along and does it, add me as a reviewer. :)
🐞 bug report
Affected Rule
n/a
Is this a regression?
no
Description
The python toolchains downloaded by this rule depend on an (outdated) version of the glibc. This does not work for a number of environments. Maybe a statically linked python binary or different versions of the toolchains could be provided instead. I'd be happy to help work on a more hermetic (as in: not relying on system-wide libraries) python toolchain.
🔬 Minimal Reproduction
Register any of the builtin Python toolchains for download, find the python3 binary in the cache and run:
As you can see, the library has versioned glibc dependencies. This is problematic in environments with newer glibcs (Arch Linux, Fedora), environments that use different libcs (Alpine), or environments that don't always provide a "default" glibc for tools (Nix / NixOS).
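The command output from the original report is not reproduced here. As a rough stand-in, the following sketch estimates the newest glibc symbol version a binary asks for by scanning its bytes for GLIBC_x.y version strings. The function name required_glibc is hypothetical, and scanning raw bytes is a heuristic, not a proper ELF parse.

```python
import re

# Matches versioned symbol names such as GLIBC_2.17 or GLIBC_2.2.5.
_GLIBC_RE = re.compile(rb"GLIBC_(\d+)\.(\d+)(?:\.(\d+))?")


def required_glibc(data: bytes):
    """Return the highest GLIBC_x.y[.z] version mentioned in data, or None."""
    versions = {
        tuple(int(part) for part in match.groups() if part is not None)
        for match in _GLIBC_RE.finditer(data)
    }
    return max(versions) if versions else None


# Synthetic bytes standing in for the contents of a binary.
sample = b"\x00GLIBC_2.2.5\x00GLIBC_2.17\x00GLIBC_PRIVATE\x00"
print(required_glibc(sample))
```

To check a real file, read it in binary mode and pass the contents to required_glibc. Note that this reports only glibc symbol versions. A missing shared library such as libcrypt.so.1 is a separate kind of failure that this scan does not detect.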
🔥 Exception or Error
When using Fedora as a host system, the build fails like this:
The OS provides /usr/lib64/libcrypt.so.2, but not libcrypt.so.1.
🌍 Your Environment
Operating System(s):
Output of bazel version:
Rules_python version:
Anything else relevant?
This issue exists: #716, but the bug was not actually fixed there. Forcing downstream users to install old glibc versions system-wide seems like it will become harder over time.