Open arlyon opened 1 year ago
Here's what Pants is trying to do https://github.com/pantsbuild/pants/issues/7369
Ah, I see it is mentioned in the pants issue
Hi all, we are releasing a hermetic toolchain based on indygreg/python-build-standalone this week. (also hi @LegNeato, I submitted a few patches for juniper a couple of years back :) )
I'm currently looking for a way to bundle a static python build with a rust program. The closest I've found is https://github.com/python-cmake-buildsystem/python-cmake-buildsystem (although that only goes up to 3.6), which replaces the configure logic of cpython with cmake scripts. The same would presumably be needed for buck2 as well, so that one could have a rust/cpp program built by buck2 and then statically linked against cpython. I've found that this conceptually easy task is really not that well supported anywhere yet (see https://pyo3.rs/v0.14.0/building_and_distribution#statically-embedding-the-python-interpreter as well).
This is helpful if one needs or wants python but does not rely on much of its ecosystem or packages, and just wants a scripting language that many developers know for some parts of a system.
(python-build-standalone seems to be a wrapper around the configure scripts of the core cpython distribution and relies on docker, which is a step I'd like to avoid)
This is a little bit hacky, but I was asked to share how I fixed this problem:
(please ignore the fact this is python3.8 😓 )
BUCK
http_archive(
    name = "python-standalone-archive",
    # TODO self host this
    urls = ["https://github.com/indygreg/python-build-standalone/releases/download/20231002/cpython-3.8.18+20231002-x86_64-unknown-linux-gnu-pgo-full.tar.zst"],
    sha256 = "3209542fbcaf7c3ef5658b344ea357c4aabf5fe7cbf1b5dea4a0b78b64835fc0",
    visibility = ["PUBLIC"],
)

standalone_python(
    name = "python-standalone",
    archive = ":python-standalone-archive",
    visibility = ["PUBLIC"],
)

prebuilt_cxx_library(
    name = "python-headers",
    header_dirs = ["@toolchains//python:python-standalone[includes]"],
    visibility = ["PUBLIC"],
)
defs.bzl
def _standalone_python_impl(ctx: AnalysisContext) -> list[Provider]:
    # generate a runnable python3 binary
    python = ctx.actions.declare_output("__python", dir = True)
    ctx.actions.copy_dir(python, ctx.attrs.archive)
    interpreter = cmd_args(python, format = "{}/python/install/bin/python3").hidden(python)

    # provide relevant headers for pybind
    includes = ctx.actions.declare_output("include", dir = True)
    ctx.actions.copy_file(includes, python.project("python/install/include/python3.8"))

    return [
        DefaultInfo(sub_targets = {
            "interpreter": [RunInfo(interpreter)],
            "includes": [DefaultInfo(includes)],
        })
    ]

standalone_python = rule(
    impl = _standalone_python_impl,
    attrs = {
        "archive": attrs.source(),
    }
)
toolchain//BUCK
system_python_toolchain(
    name = "python",
    interpreter = "toolchains//python:python-standalone[interpreter]",
    visibility = ["PUBLIC"],
)
Hey @benbrittain, quick question about that snippet. As far as I can tell the interpreter attribute on system_python_toolchain expects a string which represents the name of the python binary, e.g. python or python3. How did you get this to work providing a RunInfo reference to the interpreter attribute? Or is this more like pseudocode?
As it stands if I run this code I get output like this:
$ buckle build //:thing-that-uses-python
Local command returned non-zero exit code <no exit code>
Reproduce locally: `env -- 'BUCK_SCRATCH_PATH=buck-out/v2/tmp/root/213ed1b7ab869379/__backend-npm-install__/npm' '//tool ...<omitted>... buck-out/v2/gen/root/213ed1b7ab869379/__backend-npm-install__/node_modules --environment development (run `buck2 log what-failed` to get the full command)`
stdout:
stderr:
Spawning executable `//toolchains/python:python-standalone[interpreter]` failed: Failed to spawn a process
$ buckle log what-failed
Showing commands from: /Users/dan/Library/Caches/buckle/a1226c67e221a84c1562008739dcb322710e86e2/buck2 build //:thing-that-uses-python
build root//:thing-that-uses-python (prelude//platforms:default#213ed1b7ab869379) (npm) local env -- 'TMPDIR=/Users/dan/devel/backend2/buck-out/v2/tmp/root/213ed1b7ab869379/__backend-npm-install__/npm' 'BUCK_SCRATCH_PATH=buck-out/v2/tmp/root/213ed1b7ab869379/__backend-npm-install__/npm' 'BUCK2_DAEMON_UUID=89552eba-0149-445a-b0ee-f7d6ee0a0df3' 'BUCK_BUILD_ID=ad2238c7-8161-47d5-b3fb-598a98a18e23' '//toolchains/python:python-standalone[interpreter]' buck-out/v2/gen/prelude-replay/213ed1b7ab869379/npm/__npm_install.py__/npm_install.py --npm buck-out/v2/gen/toolchains/213ed1b7ab869379/__node-18.16.1__/node-18.16.1/bin/npm --package_json ./package.json --package_lock ./package-lock.json --output buck-out/v2/gen/root/213ed1b7ab869379/__backend-npm-install__/node_modules --environment development
Note how it's just sticking the literal string //toolchains/python:python-standalone[interpreter] in the command.
I believe that you'll want to use something like "$(exe //toolchains/python:python-standalone[interpreter])" instead, which gets replaced by the location of the artifact (the interpreter in that case). The exe macro also ensures that the target has a RunInfo, and the location macro is available when that's not required.
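For illustration, here is a minimal sketch of an attribute where these macros do get expanded: the cmd of a genrule. The target name python-version and the output file version.txt are hypothetical, and this assumes the [interpreter] sub-target from the snippet above exposes a working RunInfo.

```starlark
# Hypothetical target: the name and output file are illustrative only.
genrule(
    name = "python-version",
    out = "version.txt",
    # $(exe ...) expands to the command for the sub-target's RunInfo;
    # $OUT is the output path that genrule provides to the command.
    cmd = "$(exe toolchains//python:python-standalone[interpreter]) --version > $OUT",
)
```

Whether a given attribute accepts these macros depends on how the rule declares it, so a plain string attribute will pass the text through unexpanded.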
@cbarrete yeah I tried that too but it doesn't look like that attribute supports the macros
Spawning executable `$(exe @prelude-replay//python:python-standalone[interpreter])` failed: Failed to spawn a process
Showing commands from: /Users/dan/Library/Caches/buckle/a1226c67e221a84c1562008739dcb322710e86e2/buck2 build //:backend-npm-install
build root//:backend-npm-install (prelude//platforms:default#213ed1b7ab869379) (npm) local env -- 'TMPDIR=/Users/dan/devel/backend2/buck-out/v2/tmp/root/213ed1b7ab869379/__backend-npm-install__/npm' 'BUCK_SCRATCH_PATH=buck-out/v2/tmp/root/213ed1b7ab869379/__backend-npm-install__/npm' 'BUCK2_DAEMON_UUID=89552eba-0149-445a-b0ee-f7d6ee0a0df3' 'BUCK_BUILD_ID=345f1930-17d7-4b88-b3da-be18602adcf4' '$(exe @prelude-replay//python:python-standalone[interpreter])' buck-out/v2/gen/prelude-replay/213ed1b7ab869379/npm/__npm_install.py__/npm_install.py --npm buck-out/v2/gen/toolchains/213ed1b7ab869379/__node-18.16.1__/node-18.16.1/bin/npm --package_json ./package.json --package_lock ./package-lock.json --output buck-out/v2/gen/root/213ed1b7ab869379/__backend-npm-install__/node_modules --environment development
I ironed out some issues and got a working hermetic Python toolchain (at least on mac x86, mac arm64 and linux x86; Windows is untested). I posted it here: https://github.com/jazzdan/buck2-python-toolchain-problem
For serious use cases we would benefit from a system-independent python toolchain that can source an interpreter. At a basic level this involves downloading a copy of CPython for the current platform that can run code. One issue is that a basic CPython depends on dynamic libraries such as libssl and libsqlite, so we need to either provide a mechanism for building those consistently (with a C/C++ compiler) or use an interpreter with static linking, such as https://python-build-standalone.readthedocs.io
Potential learnings from bazel