facebook / buck2

Build system, successor to Buck
Apache License 2.0

Providing a hermetic python toolchain #19

Open arlyon opened 1 year ago

arlyon commented 1 year ago

For serious use cases we would benefit from a system-independent Python toolchain that can source its own interpreter. At the basic level this involves downloading a copy of CPython for the current platform that can run code. One issue is that stock CPython depends on dynamic libraries such as libssl and libsqlite, so we need to either provide a mechanism for building those consistently (a C/C++ compiler) or use a statically linked interpreter such as https://python-build-standalone.readthedocs.io
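As a rough illustration of the "download a copy of CPython for the current platform" step, here is a minimal Python sketch that maps a host to a python-build-standalone target triple and builds an archive file name. The mapping is not exhaustive, the function names are made up for this example, and the `flavor` suffix varies between platforms and releases, so treat all of it as an assumption to check against the project's release page.

```python
import platform

# Target triples as used in python-build-standalone release assets.
# Illustrative only: this is an assumption-laden subset, not a full list.
_TRIPLES = {
    ("Linux", "x86_64"): "x86_64-unknown-linux-gnu",
    ("Linux", "aarch64"): "aarch64-unknown-linux-gnu",
    ("Darwin", "x86_64"): "x86_64-apple-darwin",
    ("Darwin", "arm64"): "aarch64-apple-darwin",
    ("Windows", "AMD64"): "x86_64-pc-windows-msvc",
}


def standalone_triple(system=None, machine=None):
    """Return the target triple for the given host, or for the current one."""
    key = (system or platform.system(), machine or platform.machine())
    try:
        return _TRIPLES[key]
    except KeyError:
        raise ValueError(f"no known standalone build for {key}") from None


def asset_name(version, release, triple, flavor):
    """Build an archive file name in the style of the release assets.

    `flavor` (for example "pgo-full") is passed in rather than guessed,
    because it differs between platforms and releases.
    """
    return f"cpython-{version}+{release}-{triple}-{flavor}.tar.zst"
```

Calling `standalone_triple()` with no arguments uses `platform.system()` and `platform.machine()`, so an unsupported host fails loudly instead of silently picking the wrong archive.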

Potential learnings from bazel

danmx commented 1 year ago

Here's what Pants is trying to do https://github.com/pantsbuild/pants/issues/7369

LegNeato commented 1 year ago

You might want to look into https://github.com/indygreg/PyOxidizer and https://gregoryszorc.com/blog/2022/05/10/announcing-the-pyoxy-python-runner/ and https://github.com/indygreg/python-build-standalone/

LegNeato commented 1 year ago

Ah, I see it is mentioned in the pants issue

arlyon commented 1 year ago

Hi all, we are releasing a hermetic toolchain based on indygreg/python-build-standalone this week. (also hi @LegNeato, I submitted a few patches for juniper a couple of years back :) )

benmkw commented 1 year ago

I'm currently looking for a way to bundle a static Python build with a Rust program. The closest I've found is https://github.com/python-cmake-buildsystem/python-cmake-buildsystem (although that only goes up to 3.6), which replaces the configure logic of CPython with CMake scripts. The same would presumably be needed for buck2, so that one could have a Rust/C++ program built by buck2 and then statically linked against CPython. I've found that this conceptually easy task is not well supported anywhere yet (see https://pyo3.rs/v0.14.0/building_and_distribution#statically-embedding-the-python-interpreter as well).

This is helpful if one needs or wants Python but does not rely on much of its ecosystem or packages, and just wants a scripting language that many developers know for some parts of a system.

(python-build-standalone seems to be a wrapper around the configure scripts of the core CPython distribution and relies on Docker, which is a step I'd like to avoid.)

benbrittain commented 7 months ago

This is a little bit hacky, but I was asked to share how I fixed this problem:

(please ignore the fact this is python3.8 😓 )


# NOTE: the rule names on the first and third target below were lost from the
# post; http_archive and prebuilt_cxx_library are inferred from the attributes.
http_archive(
    name = "python-standalone-archive",
    # TODO self host this
    urls = ["https://github.com/indygreg/python-build-standalone/releases/download/20231002/cpython-3.8.18+20231002-x86_64-unknown-linux-gnu-pgo-full.tar.zst"],
    sha256 = "3209542fbcaf7c3ef5658b344ea357c4aabf5fe7cbf1b5dea4a0b78b64835fc0",
    visibility = ["PUBLIC"],
)

standalone_python(
    name = "python-standalone",
    archive = ":python-standalone-archive",
    visibility = ["PUBLIC"],
)

prebuilt_cxx_library(
    name = "python-headers",
    header_dirs = ["@toolchains//python:python-standalone[includes]"],
    visibility = ["PUBLIC"],
)

def _standalone_python_impl(ctx: AnalysisContext) -> list[Provider]:
    # generate a runnable python3 binary
    python = ctx.actions.declare_output("__python", dir = True)
    ctx.actions.copy_dir(python, ctx.attrs.archive)
    interpreter = cmd_args(python, format = "{}/python/install/bin/python3").hidden(python)

    # provide relevant headers for pybind
    includes = ctx.actions.declare_output("include", dir = True)
    ctx.actions.copy_file(includes, python.project("python/install/include/python3.8"))

    return [
        DefaultInfo(sub_targets = {
            "interpreter": [RunInfo(interpreter)],
            "includes": [DefaultInfo(includes)],
        }),
    ]

standalone_python = rule(
    impl = _standalone_python_impl,
    attrs = {
        "archive": attrs.source(),
    },
)

system_python_toolchain(
    name = "python",
    interpreter = "toolchains//python:python-standalone[interpreter]",
    visibility = ["PUBLIC"],
)
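
For anyone who wants to sanity-check the archive outside of Buck2 first, here is a minimal Python sketch of the two things the snippet relies on: the pinned sha256 and the `python/install/bin/python3` layout inside the extracted archive. The helper names are hypothetical, and the layout is taken from the snippet above rather than from any documented guarantee.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so a large archive is not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected_sha256):
    """Raise ValueError if the file does not match the pinned digest."""
    actual = sha256_of(path)
    if actual != expected_sha256.lower():
        raise ValueError(f"sha256 mismatch for {path}: got {actual}")


def interpreter_path(extracted_root):
    """Locate python3 inside an extracted archive.

    Assumes the python/install/bin layout used in the snippet above.
    """
    return Path(extracted_root) / "python" / "install" / "bin" / "python3"
```

Running `verify_archive` on the downloaded `.tar.zst` with the digest from the `sha256` attribute should pass silently; a mismatch usually means a truncated download or a different release.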

jazzdan commented 3 months ago

Hey @benbrittain, quick question about that snippet. As far as I can tell the interpreter attribute on system_python_toolchain expects a string which represents the name of the python binary, e.g. python or python3. How did you get this to work providing a RunInfo reference to the interpreter attribute? Or is this more like pseudocode?

As it stands if I run this code I get output like this:

$ buckle build //:thing-that-uses-python
Local command returned non-zero exit code <no exit code>
Reproduce locally: `env -- 'BUCK_SCRATCH_PATH=buck-out/v2/tmp/root/213ed1b7ab869379/__backend-npm-install__/npm' '//tool ...<omitted>... buck-out/v2/gen/root/213ed1b7ab869379/__backend-npm-install__/node_modules --environment development (run `buck2 log what-failed` to get the full command)`
Spawning executable `//toolchains/python:python-standalone[interpreter]` failed: Failed to spawn a process
$ buckle log what-failed
Showing commands from: /Users/dan/Library/Caches/buckle/a1226c67e221a84c1562008739dcb322710e86e2/buck2 build //:thing-that-uses-python
build   root//:thing-that-uses-python (prelude//platforms:default#213ed1b7ab869379) (npm)   local   env -- 'TMPDIR=/Users/dan/devel/backend2/buck-out/v2/tmp/root/213ed1b7ab869379/__backend-npm-install__/npm' 'BUCK_SCRATCH_PATH=buck-out/v2/tmp/root/213ed1b7ab869379/__backend-npm-install__/npm' 'BUCK2_DAEMON_UUID=89552eba-0149-445a-b0ee-f7d6ee0a0df3' 'BUCK_BUILD_ID=ad2238c7-8161-47d5-b3fb-598a98a18e23' '//toolchains/python:python-standalone[interpreter]' buck-out/v2/gen/prelude-replay/213ed1b7ab869379/npm/__npm_install.py__/npm_install.py --npm buck-out/v2/gen/toolchains/213ed1b7ab869379/__node-18.16.1__/node-18.16.1/bin/npm --package_json ./package.json --package_lock ./package-lock.json --output buck-out/v2/gen/root/213ed1b7ab869379/__backend-npm-install__/node_modules --environment development

Note how it's just sticking the literal string //toolchains/python:python-standalone[interpreter] in the command.

cbarrete commented 3 months ago

I believe that you'll want to use something like "$(exe //toolchains/python:python-standalone[interpreter])" instead, which gets replaced by the location of the artifact (the interpreter in that case). exe also ensures that it has a RunInfo, and location is also available when that's not required.
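
To make the difference concrete, here is a toy Python model (not Buck2's implementation) of what macro expansion does compared with an attribute that treats its value as a plain string. The `resolved` mapping and the output path are hypothetical stand-ins for whatever the build system records for a target.

```python
import re

# Matches "$(exe <target>)" or "$(location <target>)".
_MACRO = re.compile(r"\$\((exe|location)\s+([^)]+)\)")


def expand_macros(arg, resolved):
    """Replace each macro with the path recorded for its target.

    `resolved` is a hypothetical mapping from target label to output path.
    """
    def substitute(match):
        target = match.group(2).strip()
        if target not in resolved:
            raise KeyError(f"unresolved target: {target}")
        return resolved[target]

    return _MACRO.sub(substitute, arg)


def plain_string_attr(arg):
    """A plain string attribute performs no expansion: the text is used as-is."""
    return arg
```

With an attribute that expands macros, the command receives a real path; with a plain string attribute, the macro text itself ends up as the executable name and the spawn fails.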

jazzdan commented 3 months ago

@cbarrete yeah I tried that too but it doesn't look like that attribute supports the macros

Spawning executable `$(exe @prelude-replay//python:python-standalone[interpreter])` failed: Failed to spawn a process
Showing commands from: /Users/dan/Library/Caches/buckle/a1226c67e221a84c1562008739dcb322710e86e2/buck2 build //:backend-npm-install
build   root//:backend-npm-install (prelude//platforms:default#213ed1b7ab869379) (npm)  local   env -- 'TMPDIR=/Users/dan/devel/backend2/buck-out/v2/tmp/root/213ed1b7ab869379/__backend-npm-install__/npm' 'BUCK_SCRATCH_PATH=buck-out/v2/tmp/root/213ed1b7ab869379/__backend-npm-install__/npm' 'BUCK2_DAEMON_UUID=89552eba-0149-445a-b0ee-f7d6ee0a0df3' 'BUCK_BUILD_ID=345f1930-17d7-4b88-b3da-be18602adcf4' '$(exe @prelude-replay//python:python-standalone[interpreter])' buck-out/v2/gen/prelude-replay/213ed1b7ab869379/npm/__npm_install.py__/npm_install.py --npm buck-out/v2/gen/toolchains/213ed1b7ab869379/__node-18.16.1__/node-18.16.1/bin/npm --package_json ./package.json --package_lock ./package-lock.json --output buck-out/v2/gen/root/213ed1b7ab869379/__backend-npm-install__/node_modules --environment development

jazzdan commented 3 months ago

I ironed out some issues and got a working hermetic Python toolchain (at least on macOS x86, macOS arm64 and Linux x86; Windows is untested). I posted it here: https://github.com/jazzdan/buck2-python-toolchain-problem