bazelbuild / rules_python

Bazel Python Rules
https://rules-python.readthedocs.io
Apache License 2.0
534 stars 541 forks source link

Add an option to opt-out of PYTHONSAFEPATH #2060

Open allsey87 opened 4 months ago

allsey87 commented 4 months ago

🚀 feature request

Relevant Rules

py_binary

Description

py_binary sets PYTHONSAFEPATH=1 which while being a sensible default does cause problems. Most notably, in rules_foreign_cc where py_binary is used to wrap up the Meson build system (written in Python). While this is not an issue for the build system itself, it is an issue when Meson runs compilers or build scripts that are written in Python. The issue here is that Meson is starting processes which inherit Meson's environment variables including PYTHONSAFEPATH. This often breaks scripts which rely on importing modules from the directory in which they are contained.

Describe the solution you'd like

I would like an option on py_binary that allows me to opt out of PYTHONSAFEPATH.

py_binary(
   name = "mypybinary",
   safe_path = False, # defaults to True
)

Describe alternatives you've considered

py_binary(
   name = "mypybinary",
   env = {
      "PYTHONSAFEPATH": ""
   }
)
rickeylev commented 4 months ago

setting PYTHONSAFEPATH in env has no effect

Yeah, we should respect an outer env setting at least.

Using env to set this does raise a bit of a design question. Normally that attribute is only respected when bazel run or bazel test is used with the target. That behavior has always struck me as a bit surprising. But, that behavior is also what most rules do, I think? So it's not surprising insofar as its the existing convention.

Anyways, at the least, we can update the logic so it doesn't always force the value.

allsey87 commented 4 months ago

setting PYTHONSAFEPATH in env has no effect

Yeah, we should respect an outer env setting at least.

I would lean towards a dedicated attribute which also clearly documents that this environment variable is set by default, but allowing it to be unset via the env attribute would also be sufficient.

Using env to set this does raise a bit of a design question. Normally that attribute is only respected when bazel run or bazel test is used with the target.

Do you mean env taking effect during bazel build on a py_binary?

allsey87 commented 4 months ago

Any thoughts @aignas? If we can agree on disabling via env or a dedicated attribute, I can put a PR together.

sbc100 commented 4 months ago

How about using -P flag when starting the python interpreter instead of PYTHONSAFEPATH? That way only the py_binary itself would be effected and not all of its transitive sub-processes?

allsey87 commented 4 months ago

How about using -P flag when starting the python interpreter instead of PYTHONSAFEPATH? That way only the py_binary itself would be effected and not all of its transitive sub-processes?

That sounds like quite a good solution. I am just wondering how this would interact with things like the multiprocessing library which starts up other interpreters. I guess arguments such as -P would be forwarded to those interpreters?

rickeylev commented 4 months ago

Do you mean env taking effect during bazel build on a py_binary?

Yes. So given py_binary(env={"FOO": "value"}), the generated executable (bash script, in this case) ultimately calls e.g. env FOO=value python3 main.py

use dedicated attribute

The issue with dedicated attributes for these sort of settings is the API surface is very large because it grows to encompass all the env vars and interpreter flags, of which there are many: https://docs.python.org/3/using/cmdline.html

I am +1 on a general interpreter_args attribute, though. IIRC, not all the interpreter args have environment variables and vice-versa.

use -P to set this because it won't propagate to subprocesses

This does sound appealing. Actually, that sounds necessary if one py_binary has another py_binary as a data dependency and they have different settings.

It'll be a bit annoying to code since the shell code will need to reconcile the outer environment, the env attribute values, and the interpreter_args values, but quite doable.

interaction with multiprocessing

Multiprocessing will propagate the interpreter args (sys.flags) either directly when spawn is used, or implicitly when fork is used. For environment variables, it'll inherit them.

So, insofar as multiprocessing is concerned, these settings will propagate regardless.

rickeylev commented 3 months ago

Reopening because:

The PR fixes it for bootstrap=script, but that isn't the default yet. While it'll become the default in a couple releases, it won't be used for Windows. Also, I suspect that zip invocations going through foo.zip/__main__.py will still be forcing safe path.

@allsey87 if you want to send a PR to update https://github.com/bazelbuild/rules_python/blob/main/python/private/python_bootstrap_template.txt, SGTM. All it probably requires is modifying L500 to conditionally set PYTHONSAFEPATH based on the existing environment. You can probably copy/paste the //tests/base_rules:inherit_pythonsafepath_env_test that PR 2076 added and just set bootstrap_impl="system_python" and it should Just Work to verify behavior.

I probably won't have time to fix this myself before next release, but can probably fit in reviewing a PR.

allsey87 commented 3 months ago

I'm travelling at the moment so the earliest I can get to this will be on the 29th

allsey87 commented 3 months ago

@rickeylev looking into this now but I have stumbled upon what might be another bug. It seems the environment variables set via the env attribute of py_binary are not visible from python_bootstrap_template.txt?

diff --git a/python/private/python_bootstrap_template.txt b/python/private/python_bootstrap_template.txt
index 0f9c90b..083b880 100644
--- a/python/private/python_bootstrap_template.txt
+++ b/python/private/python_bootstrap_template.txt
@@ -497,7 +497,8 @@ def Main():

   # Don't prepend a potentially unsafe path to sys.path
   # See: https://docs.python.org/3.11/using/cmdline.html#envvar-PYTHONSAFEPATH
-  new_env['PYTHONSAFEPATH'] = '1'
+  if os.environ.get('PYTHONSAFEPATH') != '':
+    new_env['PYTHONSAFEPATH'] = '1'

   main_filename = os.path.join(module_space, main_rel_path)
   main_filename = GetWindowsPathWithUNCPrefix(main_filename)

os.environ does contain environment variables, but setting env like this does not seem to affect those variables:

""" Rule for building meson from source. """

load("@rules_python//python:defs.bzl", "py_binary")

def meson_tool(name, main, data, requirements = [], **kwargs):
    py_binary(
        name = name,
        srcs = [main],
        data = data,
        deps = requirements,
        python_version = "PY3",
        main = main,
        env = {
            "PYTHONSAFEPATH": ""
        },
        **kwargs
    )

Is this the expected behaviour?

rickeylev commented 3 months ago

env only affects the environment when bazel run or bazel test is used; I'm guessing that's what you're seeing.

This is why I pointed to the //tests/base_rules:inherit_pythonsafepath_env_test target -- it runs the py_binary manually with the envvar set to the different values to affect behavior.

allsey87 commented 3 months ago

env only affects the environment when bazel run or bazel test is used; I'm guessing that's what you're seeing.

This is why I pointed to the //tests/base_rules:inherit_pythonsafepath_env_test target -- it runs the py_binary manually with the envvar set to the different values to affect behavior.

This all sounds a bit hacky to me. Considering that env variables are only supposed to apply when bazel run and bazel test are used, it seems to me that it would be better to just add a parameter called safe_path which defaults to True and applies to build, run, and test.

The issue with dedicated attributes for these sort of settings is the API surface is very large because it grows to encompass all the env vars and interpreter flags, of which there are many: https://docs.python.org/3/using/cmdline.html

I see where you are coming from, but would say that PYTHONSAFEPATH is an exception here since it is not the default behaviour of the Python interpreter and it is being explicitly set by rules_python without a way to opt out of it.

In my opinion, either adding the attribute safe_path or just using -P instead as @sbc100 pointed out are the best the solutions to this problem. Using env to "undo" this just feels counterintuitive to me. This being said, I am fine with proceeding in whichever direction you think is best.

rickeylev commented 3 months ago

Considering that env variables are only supposed to apply when bazel run and bazel test are used

Are they really, though? It's "merely" a convention today, and I couldn't find anything authoritative saying they must be ignored outside of run/test. Over the years, I've seen people (myself included) confused and annoyed that the env and args attributes aren't respected when bazel-bin/foo is used. Hm, I wonder if it simply derives from e.g. C-style executables: such programs are native executables and they typically don't have a clean way to "inject" startup code before the user code (main) runs.

In any case, we're skirting this design question for now by just making the startup respect the existing environment variable, if present. It will already respect other Python environment variables (e.g. PYTHONHASHSEED, PYTHONIOENCODING, etc), so it makes it a bit more consistent in that regard. (Whether they're set via the py_test.env attribute, ctx.actions.run(env=...), or manually in the command line invocation).

I'm pretty strongly -1 on a dedicated attribute. There's simply too many other PYTHON* environment variables and Python command line options for a tenable attribute-based API, and the set of them changes outside our control. When I look at the environment variables and command line flags, I can imagine a few more that would make sense to enable by default (the site package ones, hash randomization (when exec mode), maybe dont-write-byte-code, ...). I also don't like having a third way to control these settings. It's already confusing that there's an environment variable and interpreter arg that control it.

Use -P to control it

I'd be fine with -P being used instead of using the environment variable. I'd approve a PR changing things to pass -P instead. However, it still leaves the problem of how to opt out.

allsey87 commented 3 months ago

Use -P to control it

I'd be fine with -P being used instead of using the environment variable. I'd approve a PR changing things to pass -P instead. However, it still leaves the problem of how to opt out.

Well, I think this is actually an XY problem. What I actually need is to just stop PYTHONSAFEPATH propagating to child processes, which -P would achieve. I don't actually see the need for opting out of safe path for a given py_binary. Sorry if I didn't communicate this well.