Open rickeylev opened 8 months ago
cc: @aiuto
FWIW, in ChromeOS we worked around the issue of repo mapping by creating a sitecustomize.py that reads the repo_mapping file to determine what sys.modules
should contain.
Mind you, ChromeOS has a few extra hacks in there to make existing non-bazel imports work, but you can see the general idea.
@rickeylev pointed out that the toolchain itself is relevant here as well, so here's the link to our toolchain. We wrap the interpreter in a script that ensures that the sitecustomize.py
is in your PYTHONPATH
Just hit this as well, I noticed the PR that supposedly fixed this is approved but stalled. Any updates?
Is there a broader bug tracking the issue of Python import paths in bzlmod? rules_python
is just a particular special case. Any Bazel project workspace(name = "foo")
whose users imported Python libraries from it as import foo.some.module
will see these users broken when switching to bzlmod.
As pointed out in https://github.com/bazelbuild/bazel/issues/2636#issuecomment-1718863405, this also seems to make it infeasible to ever flip the --experimental_python_import_all_repositories
flag. Is the plan to finally give up on that otherwise-desirable migration (in which case we should update and close that issue), or do we hope to come up with a solution?
@Wyverald @meteorcloudy
My personal opinion is to abandon this usage since this is Bazel specific usage. I believe for most python projects, they have a top level directory matching its pip package name. But @rickeylev should make the final decision ;)
The comment from https://github.com/bazelbuild/rules_python/issues/1679#issuecomment-2243450876 is related, but also adjacent to the original issue. The original issue relates to the "runfiles" library specifically. This is a very narrow and special library that is jointly included as part of rules_python, but also published independently as a package on PyPI and that's when we discovered this particular issue that the Python "runfiles" module that comes with rules_python will no longer work with bzlmod under the old import paths.
The original issue does relate to general user space usage of all Python modules with Bazel. Those indeed should not be using workspace name as import prefixes. That's never been a documented or recommended approach.
Those indeed should not be using workspace name as import prefixes. That's never been a documented or recommended approach.
IIRC (I last looked at this around a year ago), there wasn't any documented or recommended approach, and people used the workspace name because if I import foo.bar.baz
python looks for the path foo/bar/baz
, and the bazel python launcher added the external
directory containing the repository foo
, which meant that it just worked.
To the best of my knowledge, there's no import path that will work other than that one, unless you want to screw around with sitecustomize.py
like I did (but I also never dug too deep into this, so I may just not be aware of the right way to do it).
It's probably a documentation issue, that wouldn't surprise me. Our examples are not great, but they do demonstrate the intended usage (and always have, while they've existed):
In that example, libs.my_lib
is defined in //libs/my_lib
though. What we're asking is how to import a file in @foo//libs/my_lib
.
Before repo mapping, I believe that import foo.libs.my_lib
would work, but now that the directory is named _main~~foo~foo
or something along those lines, it won't work.
Got it. If you can please raise a different issue for that?
This specific issue is about the runfiles
library that previously was encouraged to be used from inside rules_python. Your issue is about general inter-repository code sharing under bzlmod.
I'll go ahead and do that, but FWIW, I think that workspace name as prefixes is the correct way to go, and that if we fix that, we get this for free.
I'll go ahead and do that, but FWIW, I think that workspace name as prefixes is the correct way to go, and that if we fix that, we get this for free.
In some ways yes, but in others ways no. The runfiles
library is a special case and also has easy alternatives right now. It already has a solution for bzlmod in so far as people can depend on the third-party dependency through the third-party language dependency management rules.
So either through a dependency specifier in that resolves to a locked version in PyPI e.g. runfiles ~=0.30
Or as a direct dependency url specifier such e.g. runfiles @ https://files.pythonhosted.org/packages/de/a4/7d6b131e92aa1e589e97d82b54fa87c1c9162c650c8128faa3800d96b8e2/bazel_runfiles-0.34.0-py3-none-any.whl#sha256=7df2573bfda9d95f4c352907333f016d0bc5e0a3c67101a933eacc4fb001786e
The reason for this issue being created, was because it can be seen as a backwards incompatible change with the integrated runfiles
library that used to come with the core language rules of rules_python and was documented usage.
There's never been documented usage for how to use or resolve python source dependencies across bazel respositories to my knowledge.
Yeah, using the Bazel repo name as the Python top-level import name was always a bad idea. Even prior to bzlmod, because repos names were decided by the end user, you never had a strong, reliable, top-level Python name for a library made available through that mechanism. The best analogy I can draw is with Java: import rules_java.com.google.bazel.blabla;
is obviously non-sensical -- any Java person would immediately scoff at having to prefix their Java imports with rules_java
(or whatever the Bazel repo name happened to be).
Unfortunately, using the repo name turned into some sort of mild convention. And bzlmod threw a big wrench into it. Doing the Right Thing (rehoming code to another directoy and using the imports
attribute) is a bit of a lift for projects relying on that convention.
A lighter weight, more interim/transnational, solution would be nicer. I'm not sure exactly what that solution would look like, though. It's complicated by the fact that the runfiles root and each repo's root is added to sys.path -- both undesirable behaviors. I think the lighter solution looks something like, if you wanted to make import rules_python
Just Work again, then it'd be something like:
$repo/rules_python/__init__.py
(where $repo/
is the repo root, e.g. the root of the github source tree). This makes import rules_python
work under bzlmod, but only because the repo root is still added to sys.path by default.$repo/rules_python/__init__.py
do __path__.append('..')
. This makes import rules_python.XXX
search the parent directory of import rules_python
, and thus find $repo/runfiles
as rules_python.runfiles
.@rules_python//runfiles:runfiles
do imports = ['..']
. This makes deps=["@rules_python//runfiles:runfiles"]
ensure the $repo
is added to sys.path, and thus $repo/rules_python
is importable as import rules_python
. Even if after repo roots are no longer added to sys.path by default.I think that would mostly work? The issue with playing games like this with sys.path and a package's sub-module search path (__path__
) (or the import system in general) is it can break, or otherwise act weird, in edge cases.
Also, I'm pretty sure the above can also be done on the user side. e.g., if you're @foo
, you can add a directory named rules_python
, futz with __path__
as above, and cause it to import stuff from @rules_python
. If multiple repos are doing that, things might get weird.
Yeah, using the Bazel repo name as the Python top-level import name was always a bad idea. Even prior to bzlmod, because repos names were decided by the end user, you never had a strong, reliable, top-level Python name for a library made available through that mechanism. The best analogy I can draw is with Java:
import rules_java.com.google.bazel.blabla;
is obviously non-sensical -- any Java person would immediately scoff at having to prefix their Java imports withrules_java
(or whatever the Bazel repo name happened to be).
I'm not sure I buy that it was a bad idea. Namespacing is what prevents conflicts. If I depend on the py_library
targets @foo//lib
and @bar//lib
, then if we didn't include the repo name we'd be using import lib
for both, which is both a conflict, but also makes it almost impossible to work out which file you're attempting to import.
I, for one, would much prefer writing import rules_python.runfiles
over import runfiles
, as it adds clarity about where I'm getting this "runfiles" from, in addition to preventing conflicts.
FWIW, in ChromeOS, our main repo was called cros
, and every time you wanted to import something in the main repo we'd write import cros.foo.bar
to reference @@//foo/bar
. It worked very well, and there was never any confusion about where the imports came from. It also meant that if we ever exported our module, it would just work and others would just be able to import our libraries, since cros
would always be repo-mapped to our repo from itself.
I feel like part of the problem here is that it feels like rules_python should be exposing runfiles as rules_python.runfiles
--regardless of the underlying technical issues at hand. Admittedly maybe that's just due to my naive belief that overly generic names are problematic.
Generally, it's probably wisest for projects to create well-formed python packages, which unfortunately means storing the package in a subdirectory that provides a project/package namespace. In Pigweed, we almost never relied on the import paths provided by rules_python. Even when we did, the only exceptions were for some tests. I reluctantly feel like that's the most robust way to structure a project. Our Python libraries were well-formed python packages before we started embracing Bazel, so this change didn't impact us much (besides the confusion of rules_python.runfiles
moving).
I would like to echo, though, that this is a pretty jarring behavioral change that seems to warrant some sort of migration path. It's surprising to me that this issue wasn't immediately closed with "this is intentional behavior, we flipped the default value of X flag with plans to remove the option entirely in version YY"
š bug report
This is the rules_python issue for https://github.com/bazelbuild/bazel/issues/18128
Affected Rule
Import the runfiles library via the
@rules_python//python/runfiles
depending asimport rules_python.python.runfiles
Is this a regression?
Effectively, yes. The established name for the libraries rules_python provides are under the "rules_python" namespace.
Description
In bzlmod,
import rules_python.python.runfiles
doesn't work. It works for workspace builds.This is because, in bzlmod, the directory names of repos in runfiles changes from "foo" to "foo~version" (where version is the module version plus some other parts). Such a name isn't importable because it can't be known in advance, and isn't a valid python package a name.
š¬ Minimal Reproduction
@rules_python//python/runfiles
import rules_python
An import error will occur.
Analysis
Fixing this is sort of straightforward -- we need to get a directory onto sys.path that has a rules_python sub-directory.
Some complicating factors are:
import python.runfiles
to continue working. This is the name we told people to use in the meantime, so we'll need to stick with that.import src.bla.bla
themselves. This came up via slack, where I've seen people recommend using a repo-relative import name likeimport src.bla
.import runfiles
, so we have to be not interfere with that$repoRoot/python/runfiles/runfiles.py
file existing. So I'm hesitant to move that file. I don't know what or why that would be the case though, so perhaps I'm being