bazelbuild / rules_python

Bazel Python Rules
https://rules-python.readthedocs.io
Apache License 2.0
527 stars 539 forks source link

Specify different interpreter for py_binary rules #62

Closed mouadino closed 3 years ago

mouadino commented 6 years ago

Hey all,

Problem Statement

It's quite confusing how a user can specify that a given rule need to use Python3 or that rule need Python2, where we have multiple attributes documented like default_python_version, srcs_version and undocumented like :py_interpreter (which can't be used but out of curiosity what is the prefix : doing), however I am still confused regarding what does what, in the end, what I think we should have is a way to tell py_binary to execute using the interpreter X, I know about py_runtime which is supposed to be the future but there are multiple py_binary rules out there with unspecify python version which lead to issues of python3 and 2 compatibilities, beside there is an assumption with the introduced with --python_top argument that all code need one interepreter instead of having local to each target.

Workaround

Currently we use the following workaround, maybe useful for others that suffer same issue:

The Decorator Pattern

  1. Create a python file with the following format, let's name it force_python3.py:
import sys
import os

PYTHON3_PATH = os.environ['PYTHON3_PATH']

if __name__ == '__main__':
    argv = sys.argv
    path, _ = os.path.split(argv[0])
    argv[0] = os.path.join(path, '<real main file>')
    argv = [PYTHON3_PATH] + argv

    os.execv(argv[0], argv)

(This assume that we expose a PYTHON3_PATH environment variable with --action_env=PYTHON3_PATH=<path>).

  1. Write py_binary rule by pointing main attribute to the file above:
py_binary(
    name = "my_target",
    srcs = ["main.py", "force_python3.py"],
    main = "force_python3.py"
)

FYI we have a workaround of our own to specify which python to use for pip requirements.

Goals

What I would like to see are the followings:

MarkusTeufelberger commented 6 years ago

When building default to python2

I strongly disagree. Python 2 should not even be supported imho, but it definitely should NOT be made default less than 2 years before upstream's extended support ends. Unfortunately a lot of Bazel's internal stuff is still thinking that Python 2 is system default, this would be a good time to update it instead of keeping it on life support even longer. End the suffering.

Build on previous point, if py_runtime works in same say way go rules do, which use the new toolchains idea it will be awesome.

I agree, especially keeping in mind that there are several Python interpreters out there (cPython just being one of them - there's also pypy, jython, cython...) and that it would be nice to be able to easily choose from a variety of them for releases, tests and benchmarks.

mouadino commented 6 years ago

Thanks for your feedback, just to clarify when I said:

When building default to python2

I meant as a workaround (unless you don't agree with the workaround if so I would love to hear better alternatives), b/c as you said it's the current state of some Bazel extensions, I would love if we don't have to support Python2 anymore and that what we did in my current company, but since we adopted Bazel we are back to supporting it, and defaulting to Python2 in build seems to do it, for now at least, things will be much worse if there are extensions that require Python3 and other Python2, especially since setting up Python version is a global thing, so in this move toward Python3 please make sure to both are supported until transitions end.

IMHO the most important thing that I am trying to emphasize is to not make python version a global thing, which the mistake that --python_top did, Python is not like Golang in that regard so IMHO we need a way to specify Python version at target/code level, thoughts?

MarkusTeufelberger commented 6 years ago

I think one of the questions is if a py_binary should be a self contained executable (= even if you don't have Python installed, it should run) or just contain Python code (= using one of the installed Python interpreters).

I guess most people would expect the latter and at that point I would rather see PEP394 (https://www.python.org/dev/peps/pep-0394/) being used instead of some environment variable decorator magic. A #!/usr/bin/env python3 line should be enough.

There is the problem that Bazel itself also depends on Python being present on the host system for some of its scripts/rules and it mostly expects Python2 there (see https://github.com/bazelbuild/bazel/issues/1580). I don't know if that's going to be phased out, now that the skylark language runtime has been rewritten in go and if they might be convinced to rewrite the remaining few helpers in go. Alternatively Bazel would need some way to have a vendored Python binary that it uses to run its internal rule scripts instead of relying on Python being present on its host machine.

The Python rules here seem to suffer a bit from this assumption that Python (unlike e.g. the go binary) anyways has to be present on the build system since it has to run Bazel and Python has to be installed for that. Making sure that the system Python is never used when building Python rules with Bazel is an important step in my opinion and to ensure this, it might also be beneficial to make sure that Bazel itself doesn't actually depend on any Python being installed on the system.

mouadino commented 6 years ago

I guess most people would expect the latter and at that point I would rather see PEP394 (https://www.python.org/dev/peps/pep-0394/) being used instead of some environment variable decorator magic. A #!/usr/bin/env python3 line should be enough.

I agree about the self-contained Python IMHO self-contained Python package is nice to have but not necessary, however about using #!/usr/bin/env python3 shebang, overall I think it's the right approach but just to play the devil advocate, the few issues that I can see are:

From the good side, I can see it fixing issues when containerizing python code b/c currently if you use host machine python path in py_runtime and you package the code in a container, then you should have the same python path inside your container b/c python_stub_template.txt template will end up pointing to py_runtime's interpreter_path variable.

Alternatively Bazel would need some way to have a vendored Python binary that it uses to run its internal rule scripts instead of relying on Python being present on its host machine.

Isn't this already feasible with py_runtime already? at least if the python scripts aren't called from WORKSPACE, as in theory, you can do:

new_http_archive(
    name = "python_version",
    urls = ["https://www.python.org/ftp/python/3.6.3/Python-3.6.3.tgz"],
    strip_prefix = "Python-3.6.3",
    build_file_content = """
genrule(
    name = "build_python",
    srcs = glob(["**"]),
    outs = ["python"],
    cmd = "./external/python_version/configure && make && cp python $@",
    visibility = ["//visibility:public"],
)""",
)

The Python rules here seem to suffer a bit from this assumption that Python (unlike e.g. the go binary) anyways has to be present on the build system since it has to run Bazel and Python has to be installed for that.

I can see that you're looking at this from the lenses of Bazel core developer, from a user perspective I can tell you that Python suffers the assumption that by just pointing code to an interpreter and that's it, sadly it's not that easy, besides it seems that Google use Python a bit different, hence the awkward support of Python in Bazel that probably was inherited from Blaze, in theory you should:

  1. Choose the right version (not an issue in Golang, at least not yet :))
  2. make sure that all dependencies are there and the chosen interpreter can find them 2.1. this part can be annoying if you try to do things differently for import my other two issues https://github.com/bazelbuild/rules_python/issues/created_by/mouadino 2.2. or if you depend on shared libraries and such
  3. Last packaging, which has been a standing issue in Python community with no official answer but many solution pex, par, https://github.com/takluyver/flit, ...
MarkusTeufelberger commented 6 years ago

I am not a Bazel core developer or even employed by Google, I'm a normal user...

mouadino commented 6 years ago

@MarkusTeufelberger Ahh sorry for the confusion, but then just to reiterate, our experience has shown that the workaround above works for us, most of our issues concerning python version were due to extensions and not Bazel internal, but it's possible that our use cases are different than yours, that's why we see different issues.

duggelz commented 6 years ago

Yes, it's good to remember that Python, and software developing in general, happens in a wide variety of environments, and so people's requirements can legitimately be very different. One person's "must have feature" is another's "we would never use this", and that's fine.

Here's a discussion group that may be of interest: https://groups.google.com/forum/#!forum/bazel-sig-python

Here's a document proposing a py_toolchain target for Bazel. I also talk about the evolution of Python usage inside Google, to give some background about why Bazel is the way it is. And I talk about some ways that Bazel can evolve to serve a wider community. Let me know what you think: https://docs.google.com/document/u/1/d/1dQjbbLEJqxUIJWmH5sIZAv-_emnKksDI-VCG1v86dWA/edit#

franzigeiger commented 5 years ago

Is there already a solution to this feature request? I can't see how all these issues relate to the question or if they avoid the initially mentioned workaround. If it is fixed by now, can you please provide a small example code.

sha1n commented 5 years ago

I managed to make that work properly for me by specifying python_version="PY3" on py_binary and py_test targets + adding the following to my .bazelrc:

build --incompatible_use_python_toolchains - this tells Bazel to use python toolchains, which should auto-detect your platform python 2 and 3 runtimes and use the appropriate one based what you specify in your target. You should be able to take than further, and specify your runtimes explicitly if you use custom paths or want to download the interpreter from a remote repository (see: https://github.com/bazelbuild/bazel/blob/master/tools/python/toolchain.bzl).

In addition, I added build --noincompatible_py3_is_default in order to avoid python 3 from being selected when no python_version is explicitly specified.

I use Bazel 0.26.0 (I think python toolchains have been introduced in 0.25-ish) and the following rules_python version:

git_repository(
    name = "io_bazel_rules_python",
    commit = "fdbb17a4118a1728d19e638a5291b4c4266ea5b8",
    remote = "https://github.com/bazelbuild/rules_python.git",
)
thundergolfer commented 3 years ago

Checking the "Goals" section of the original issue description:

Add an attribute to py_binary and py_test rules to specify which python interpreter to use ....

python_version=PY3

py_runtime should be specified in WORKSPACE ...

The Python toolchain is not available at WORKSPACE evaluation time, but with the pip_install repo rule you can specify a label in the python_interpreter_target attribute that is the same label as what's passed to py_runtime's interpreter (Label: optional) attribute. Eg.

pip_install(
    name = "foo_pip",
    python_interpreter_target = "@python3//:bin/python3",
    requirements = "//tools/build/dependencies/python:requirements.txt",
)

Build on previous point, if py_runtime works in same say way go rules do, which use the new toolchains idea it will be awesome.

Bazel Toolchains work with the Python rules now.

shipping the interpreter with it will be very convenient...

With --build_python_zip and an in-build interpreter target you can ship the interpreter within an executable .zip.


Given the above, I think this issue can be closed, but please let me know if there's something still missing and we can create a new more specific issue.

dgrunwald commented 1 year ago

Unfortunately it's not possible to use python_interpreter_target with py_binary:

no such attribute 'python_interpreter_target' in 'py_binary' rule

Toolchains seem overly complicated for our use-case (we have multiple py_binary, where some py_binaries need a different python interpreter than other py_binaries). In fact I've spent 2 days on this now and still can't figure out at all how to make this work.

andponlin-canva commented 3 months ago

I would also be interested in this feature being available.