bazelbuild / rules_python

Bazel Python Rules
https://rules-python.readthedocs.io
Apache License 2.0
521 stars 537 forks source link

import rules_python doesn't work under bzlmod #1679

Open rickeylev opened 8 months ago

rickeylev commented 8 months ago

šŸž bug report

This is the rules_python issue for https://github.com/bazelbuild/bazel/issues/18128

Affected Rule

Import the runfiles library via the @rules_python//python/runfiles depending as import rules_python.python.runfiles

Is this a regression?

Effectively, yes. The established name for the libraries rules_python provides are under the "rules_python" namespace.

Description

In bzlmod, import rules_python.python.runfiles doesn't work. It works for workspace builds.

This is because, in bzlmod, the directory names of repos in runfiles changes from "foo" to "foo~version" (where version is the module version plus some other parts). Such a name isn't importable because it can't be known in advance, and isn't a valid python package a name.

šŸ”¬ Minimal Reproduction

An import error will occur.

Analysis

Fixing this is sort of straightforward -- we need to get a directory onto sys.path that has a rules_python sub-directory.

Some complicating factors are:

  1. We need import python.runfiles to continue working. This is the name we told people to use in the meantime, so we'll need to stick with that.
  2. We want to avoid double imports, e.g. the same file being compiled twice; this is what happens if you do "import rules_python.python.runfiles.runfiles" and "import python.runfiles.runfiles" in a workspace build today (Python keys things by the module name, not the underlying file name)
  3. Repo roots are still on sys.path and explicit init generation is still enabled by default. This means, wherever we create a sub directory, its top-most directory is going to end up on sys.path. This means a standard src-based layout might be prone to breaking things, as dependents might be relying on import src.bla.bla themselves. This came up via slack, where I've seen people recommend using a repo-relative import name like import src.bla.
  4. Pip-based dependencies use import runfiles, so we have to be not interfere with that
  5. The runfiles code uses its own path to help location the runfiles root, so make sure to update that; this is an internal detail; just noting it for myself
  6. Personally, I worry that something is relying on the $repoRoot/python/runfiles/runfiles.py file existing. So I'm hesitant to move that file. I don't know what or why that would be the case though, so perhaps I'm being
aiuto commented 8 months ago

cc: @aiuto

matts1 commented 8 months ago

FWIW, in ChromeOS we worked around the issue of repo mapping by creating a sitecustomize.py that reads the repo_mapping file to determine what sys.modules should contain.

Mind you, ChromeOS has a few extra hacks in there to make existing non-bazel imports work, but you can see the general idea.

matts1 commented 8 months ago

@rickeylev pointed out that the toolchain itself is relevant here as well, so here's the link to our toolchain. We wrap the interpreter in a script that ensures that the sitecustomize.py is in your PYTHONPATH

armandomontanez commented 4 months ago

Just hit this as well, I noticed the PR that supposedly fixed this is approved but stalled. Any updates?

tpudlik commented 2 months ago

Is there a broader bug tracking the issue of Python import paths in bzlmod? rules_python is just a particular special case. Any Bazel project workspace(name = "foo") whose users imported Python libraries from it as import foo.some.module will see these users broken when switching to bzlmod.

As pointed out in https://github.com/bazelbuild/bazel/issues/2636#issuecomment-1718863405, this also seems to make it infeasible to ever flip the --experimental_python_import_all_repositories flag. Is the plan to finally give up on that otherwise-desirable migration (in which case we should update and close that issue), or do we hope to come up with a solution?

@Wyverald @meteorcloudy

meteorcloudy commented 2 months ago

My personal opinion is to abandon this usage since this is Bazel specific usage. I believe for most python projects, they have a top level directory matching its pip package name. But @rickeylev should make the final decision ;)

groodt commented 2 months ago

The comment from https://github.com/bazelbuild/rules_python/issues/1679#issuecomment-2243450876 is related, but also adjacent to the original issue. The original issue relates to the "runfiles" library specifically. This is a very narrow and special library that is jointly included as part of rules_python, but also published independently as a package on PyPI and that's when we discovered this particular issue that the Python "runfiles" module that comes with rules_python will no longer work with bzlmod under the old import paths.

The original issue does relate to general user space usage of all Python modules with Bazel. Those indeed should not be using workspace name as import prefixes. That's never been a documented or recommended approach.

matts1 commented 2 months ago

Those indeed should not be using workspace name as import prefixes. That's never been a documented or recommended approach.

IIRC (I last looked at this around a year ago), there wasn't any documented or recommended approach, and people used the workspace name because if I import foo.bar.baz python looks for the path foo/bar/baz, and the bazel python launcher added the external directory containing the repository foo, which meant that it just worked.

To the best of my knowledge, there's no import path that will work other than that one, unless you want to screw around with sitecustomize.py like I did (but I also never dug too deep into this, so I may just not be aware of the right way to do it).

groodt commented 2 months ago

It's probably a documentation issue, that wouldn't surprise me. Our examples are not great, but they do demonstrate the intended usage (and always have, while they've existed):

e.g. https://github.com/bazelbuild/rules_python/blob/59bb4a88783cca738394cb3cb91bc8243df5e18d/examples/multi_python_versions/tests/my_lib_test.py#L18

matts1 commented 2 months ago

In that example, libs.my_lib is defined in //libs/my_lib though. What we're asking is how to import a file in @foo//libs/my_lib.

Before repo mapping, I believe that import foo.libs.my_lib would work, but now that the directory is named _main~~foo~foo or something along those lines, it won't work.

groodt commented 2 months ago

Got it. If you can please raise a different issue for that?

This specific issue is about the runfiles library that previously was encouraged to be used from inside rules_python. Your issue is about general inter-repository code sharing under bzlmod.

matts1 commented 2 months ago

I'll go ahead and do that, but FWIW, I think that workspace name as prefixes is the correct way to go, and that if we fix that, we get this for free.

groodt commented 2 months ago

I'll go ahead and do that, but FWIW, I think that workspace name as prefixes is the correct way to go, and that if we fix that, we get this for free.

In some ways yes, but in others ways no. The runfiles library is a special case and also has easy alternatives right now. It already has a solution for bzlmod in so far as people can depend on the third-party dependency through the third-party language dependency management rules.

So either through a dependency specifier in that resolves to a locked version in PyPI e.g. runfiles ~=0.30

Or as a direct dependency url specifier such e.g. runfiles @ https://files.pythonhosted.org/packages/de/a4/7d6b131e92aa1e589e97d82b54fa87c1c9162c650c8128faa3800d96b8e2/bazel_runfiles-0.34.0-py3-none-any.whl#sha256=7df2573bfda9d95f4c352907333f016d0bc5e0a3c67101a933eacc4fb001786e

The reason for this issue being created, was because it can be seen as a backwards incompatible change with the integrated runfiles library that used to come with the core language rules of rules_python and was documented usage.

There's never been documented usage for how to use or resolve python source dependencies across bazel respositories to my knowledge.

rickeylev commented 2 months ago

Yeah, using the Bazel repo name as the Python top-level import name was always a bad idea. Even prior to bzlmod, because repos names were decided by the end user, you never had a strong, reliable, top-level Python name for a library made available through that mechanism. The best analogy I can draw is with Java: import rules_java.com.google.bazel.blabla; is obviously non-sensical -- any Java person would immediately scoff at having to prefix their Java imports with rules_java (or whatever the Bazel repo name happened to be).

Unfortunately, using the repo name turned into some sort of mild convention. And bzlmod threw a big wrench into it. Doing the Right Thing (rehoming code to another directoy and using the imports attribute) is a bit of a lift for projects relying on that convention.

A lighter weight, more interim/transnational, solution would be nicer. I'm not sure exactly what that solution would look like, though. It's complicated by the fact that the runfiles root and each repo's root is added to sys.path -- both undesirable behaviors. I think the lighter solution looks something like, if you wanted to make import rules_python Just Work again, then it'd be something like:

I think that would mostly work? The issue with playing games like this with sys.path and a package's sub-module search path (__path__) (or the import system in general) is it can break, or otherwise act weird, in edge cases.

Also, I'm pretty sure the above can also be done on the user side. e.g., if you're @foo, you can add a directory named rules_python, futz with __path__ as above, and cause it to import stuff from @rules_python. If multiple repos are doing that, things might get weird.

matts1 commented 2 months ago

Yeah, using the Bazel repo name as the Python top-level import name was always a bad idea. Even prior to bzlmod, because repos names were decided by the end user, you never had a strong, reliable, top-level Python name for a library made available through that mechanism. The best analogy I can draw is with Java: import rules_java.com.google.bazel.blabla; is obviously non-sensical -- any Java person would immediately scoff at having to prefix their Java imports with rules_java (or whatever the Bazel repo name happened to be).

I'm not sure I buy that it was a bad idea. Namespacing is what prevents conflicts. If I depend on the py_library targets @foo//lib and @bar//lib, then if we didn't include the repo name we'd be using import lib for both, which is both a conflict, but also makes it almost impossible to work out which file you're attempting to import.

I, for one, would much prefer writing import rules_python.runfiles over import runfiles, as it adds clarity about where I'm getting this "runfiles" from, in addition to preventing conflicts.

FWIW, in ChromeOS, our main repo was called cros, and every time you wanted to import something in the main repo we'd write import cros.foo.bar to reference @@//foo/bar. It worked very well, and there was never any confusion about where the imports came from. It also meant that if we ever exported our module, it would just work and others would just be able to import our libraries, since cros would always be repo-mapped to our repo from itself.

armandomontanez commented 2 months ago

I feel like part of the problem here is that it feels like rules_python should be exposing runfiles as rules_python.runfiles--regardless of the underlying technical issues at hand. Admittedly maybe that's just due to my naive belief that overly generic names are problematic.

Generally, it's probably wisest for projects to create well-formed python packages, which unfortunately means storing the package in a subdirectory that provides a project/package namespace. In Pigweed, we almost never relied on the import paths provided by rules_python. Even when we did, the only exceptions were for some tests. I reluctantly feel like that's the most robust way to structure a project. Our Python libraries were well-formed python packages before we started embracing Bazel, so this change didn't impact us much (besides the confusion of rules_python.runfiles moving).

I would like to echo, though, that this is a pretty jarring behavioral change that seems to warrant some sort of migration path. It's surprising to me that this issue wasn't immediately closed with "this is intentional behavior, we flipped the default value of X flag with plans to remove the option entirely in version YY"