allsey87 opened 4 months ago
setting PYTHONSAFEPATH in env has no effect
Yeah, we should respect an outer env setting at least.
Using env to set this does raise a bit of a design question. Normally that attribute is only respected when bazel run or bazel test is used with the target. That behavior has always struck me as a bit surprising. But, that behavior is also what most rules do, I think? So it's not surprising insofar as it's the existing convention.
Anyways, at the least, we can update the logic so it doesn't always force the value.
setting PYTHONSAFEPATH in env has no effect
Yeah, we should respect an outer env setting at least.
I would lean towards a dedicated attribute which also clearly documents that this environment variable is set by default, but allowing it to be unset via the env attribute would also be sufficient.
Using env to set this does raise a bit of a design question. Normally that attribute is only respected when bazel run or bazel test is used with the target.
Do you mean env taking effect during bazel build on a py_binary?
Any thoughts @aignas? If we can agree on disabling via env or a dedicated attribute, I can put a PR together.
How about using the -P flag when starting the Python interpreter instead of PYTHONSAFEPATH? That way only the py_binary itself would be affected and not all of its transitive sub-processes?
How about using the -P flag when starting the Python interpreter instead of PYTHONSAFEPATH? That way only the py_binary itself would be affected and not all of its transitive sub-processes?
That sounds like quite a good solution. I am just wondering how this would interact with things like the multiprocessing library, which starts up other interpreters. I guess arguments such as -P would be forwarded to those interpreters?
Do you mean env taking effect during bazel build on a py_binary?
Yes. So given py_binary(env={"FOO": "value"}), the generated executable (a bash script, in this case) ultimately calls e.g. env FOO=value python3 main.py
use dedicated attribute
The issue with dedicated attributes for these sort of settings is the API surface is very large because it grows to encompass all the env vars and interpreter flags, of which there are many: https://docs.python.org/3/using/cmdline.html
I am +1 on a general interpreter_args attribute, though. IIRC, not all the interpreter args have environment variables and vice versa.
use -P to set this because it won't propagate to subprocesses
This does sound appealing. Actually, that sounds necessary if one py_binary has another py_binary as a data dependency and they have different settings.
It'll be a bit annoying to code since the shell code will need to reconcile the outer environment, the env attribute values, and the interpreter_args values, but it's quite doable.
interaction with multiprocessing
Multiprocessing will propagate the interpreter args (sys.flags) either directly when spawn is used, or implicitly when fork is used. For environment variables, it'll inherit them.
So, insofar as multiprocessing is concerned, these settings will propagate regardless.
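The environment-variable half of that is plain process inheritance; a minimal illustration (nothing rules_python-specific here):

```python
import os
import subprocess
import sys

# Child processes inherit the parent's environment by default; this is how a
# PYTHONSAFEPATH set for one binary leaks into every tool it launches.
env = dict(os.environ, PYTHONSAFEPATH="1")
child = "import os; print(os.environ.get('PYTHONSAFEPATH', '<unset>'))"
inherited = subprocess.run([sys.executable, "-c", child], env=env,
                           capture_output=True, text=True).stdout.strip()
print(inherited)  # -> 1
```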
Reopening because:
The PR fixes it for bootstrap=script, but that isn't the default yet. While it'll become the default in a couple of releases, it won't be used for Windows. Also, I suspect that zip invocations going through foo.zip/__main__.py will still be forcing safe path.
@allsey87 if you want to send a PR to update https://github.com/bazelbuild/rules_python/blob/main/python/private/python_bootstrap_template.txt, SGTM. All it probably requires is modifying L500 to conditionally set PYTHONSAFEPATH based on the existing environment. You can probably copy/paste the //tests/base_rules:inherit_pythonsafepath_env_test test that PR 2076 added, just set bootstrap_impl="system_python", and it should Just Work to verify the behavior.
I probably won't have time to fix this myself before next release, but can probably fit in reviewing a PR.
I'm travelling at the moment so the earliest I can get to this will be on the 29th
@rickeylev looking into this now, but I have stumbled upon what might be another bug. It seems the environment variables set via the env attribute of py_binary are not visible from python_bootstrap_template.txt?
```diff
diff --git a/python/private/python_bootstrap_template.txt b/python/private/python_bootstrap_template.txt
index 0f9c90b..083b880 100644
--- a/python/private/python_bootstrap_template.txt
+++ b/python/private/python_bootstrap_template.txt
@@ -497,7 +497,8 @@ def Main():
   # Don't prepend a potentially unsafe path to sys.path
   # See: https://docs.python.org/3.11/using/cmdline.html#envvar-PYTHONSAFEPATH
-  new_env['PYTHONSAFEPATH'] = '1'
+  if os.environ.get('PYTHONSAFEPATH') != '':
+    new_env['PYTHONSAFEPATH'] = '1'
   main_filename = os.path.join(module_space, main_rel_path)
   main_filename = GetWindowsPathWithUNCPrefix(main_filename)
```
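For reference, a standalone sketch of the inherit-if-set logic the diff is reaching for (a hypothetical helper, not the actual template code): only force the default when the variable is absent from the outer environment, so an explicit PYTHONSAFEPATH="" opt-out survives.

```python
def resolve_safe_path(environ):
    """Return the PYTHONSAFEPATH value a bootstrap should export.

    If the caller already set the variable -- even to "" -- respect it;
    otherwise fall back to the safe default of "1".
    """
    if "PYTHONSAFEPATH" in environ:
        return environ["PYTHONSAFEPATH"]
    return "1"

# Unset in the outer environment: the safe default wins.
print(resolve_safe_path({}))                      # -> 1
# Explicitly emptied by the caller: the opt-out is preserved.
print(resolve_safe_path({"PYTHONSAFEPATH": ""}))  # -> (empty string)
```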
os.environ does contain environment variables, but setting env like this does not seem to affect those variables:
""" Rule for building meson from source. """
load("@rules_python//python:defs.bzl", "py_binary")
def meson_tool(name, main, data, requirements = [], **kwargs):
py_binary(
name = name,
srcs = [main],
data = data,
deps = requirements,
python_version = "PY3",
main = main,
env = {
"PYTHONSAFEPATH": ""
},
**kwargs
)
Is this the expected behaviour?
env only affects the environment when bazel run or bazel test is used; I'm guessing that's what you're seeing.
This is why I pointed to the //tests/base_rules:inherit_pythonsafepath_env_test target -- it runs the py_binary manually with the envvar set to different values to affect behavior.
env only affects the environment when bazel run or bazel test is used; I'm guessing that's what you're seeing. This is why I pointed to the //tests/base_rules:inherit_pythonsafepath_env_test target -- it runs the py_binary manually with the envvar set to different values to affect behavior.
This all sounds a bit hacky to me. Considering that env variables are only supposed to apply when bazel run and bazel test are used, it seems to me that it would be better to just add a parameter called safe_path which defaults to True and applies to build, run, and test.
The issue with dedicated attributes for these sort of settings is the API surface is very large because it grows to encompass all the env vars and interpreter flags, of which there are many: https://docs.python.org/3/using/cmdline.html
I see where you are coming from, but I would say that PYTHONSAFEPATH is an exception here, since it is not the default behaviour of the Python interpreter and it is being explicitly set by rules_python without a way to opt out of it.
In my opinion, either adding the attribute safe_path or just using -P instead, as @sbc100 pointed out, are the best solutions to this problem. Using env to "undo" this just feels counterintuitive to me. That being said, I am fine with proceeding in whichever direction you think is best.
Considering that env variables are only supposed to apply when bazel run and bazel test are used
Are they really, though? It's "merely" a convention today, and I couldn't find anything authoritative saying they must be ignored outside of run/test. Over the years, I've seen people (myself included) confused and annoyed that the env and args attributes aren't respected when bazel-bin/foo is used. Hm, I wonder if it simply derives from e.g. C-style executables: such programs are native executables and they typically don't have a clean way to "inject" startup code before the user code (main) runs.
In any case, we're skirting this design question for now by just making the startup respect the existing environment variable, if present. It will already respect other Python environment variables (e.g. PYTHONHASHSEED, PYTHONIOENCODING, etc.), so it makes it a bit more consistent in that regard. (Whether they're set via the py_test.env attribute, ctx.actions.run(env=...), or manually in the command line invocation.)
I'm pretty strongly -1 on a dedicated attribute. There are simply too many other PYTHON* environment variables and Python command line options for a tenable attribute-based API, and the set of them changes outside our control. When I look at the environment variables and command line flags, I can imagine a few more that would make sense to enable by default (the site-packages ones, hash randomization (when in exec mode), maybe dont-write-byte-code, ...). I also don't like having a third way to control these settings. It's already confusing that there's an environment variable and an interpreter arg that control it.
Use -P to control it
I'd be fine with -P being used instead of the environment variable. I'd approve a PR changing things to pass -P instead. However, it still leaves the problem of how to opt out.
Use -P to control it
I'd be fine with -P being used instead of the environment variable. I'd approve a PR changing things to pass -P instead. However, it still leaves the problem of how to opt out.
Well, I think this is actually an XY problem. What I actually need is to just stop PYTHONSAFEPATH propagating to child processes, which -P would achieve. I don't actually see the need for opting out of safe path for a given py_binary. Sorry if I didn't communicate this well.
🚀 feature request

Relevant Rules
py_binary

Description
py_binary sets PYTHONSAFEPATH=1, which, while a sensible default, does cause problems. Most notably in rules_foreign_cc, where py_binary is used to wrap up the Meson build system (written in Python). While this is not an issue for the build system itself, it is an issue when Meson runs compilers or build scripts that are written in Python. The issue here is that Meson starts processes which inherit Meson's environment variables, including PYTHONSAFEPATH. This often breaks scripts which rely on importing modules from the directory in which they are contained.

Describe the solution you'd like
I would like an option on py_binary that allows me to opt out of PYTHONSAFEPATH.

Describe alternatives you've considered
--action_env=PYTHONSAFEPATH=: no effect
env argument on py_binary as follows: no effect