bazelbuild / rules_python

Bazel Python Rules
https://rules-python.readthedocs.io
Apache License 2.0

Packages shadowed by other modules are no longer importable under bzlmod #1174

Open aschleck opened 1 year ago

aschleck commented 1 year ago

🐞 bug report

### Affected Rule

The issue is caused by the rule: `py_binary`/`py_library` under bzlmod.

### Is this a regression?

Yes, this works under the WORKSPACE model. It is only bzlmod that is broken.

### Description

When two Bazel workspaces (with one depending on the other) both define packages with the same name (`common`, `core`, `helpers`, `private`, `util`, ...), the outer one will shadow the inner one. Without bzlmod this has always been solvable by using absolute imports with the workspace name as the prefix. However, under bzlmod these absolute imports no longer work: when a module is used as the outer repository the prefix is `_main`, but when a module is used as a dependency repository it is ``. And even if that were fine, there are further problems caused by `local_path_override`, because it suffixes `~override` onto the folders in the runfiles directory.

From my perspective the best solution would be to make the generated `py_binary` wrapper script read the generated `_repo_mapping` file and set up the path so that absolute imports keep working the way they always have. One way to make that happen would be to have the wrapper script create a temporary directory, create symlinks with the correct names pointing to the real folders, and then add the temporary directory to the PYTHONPATH (see the illustrative sketch at the end of this issue description). I can imagine there is probably some cleaner way that I am not thinking of.

## 🔬 Minimal Reproduction

I put an example here: https://github.com/aschleck/bzlmod-shadow-python-problem (with another writeup in `README.md`). This repository illustrates three situations:

* [shadowed_with_workspace](https://github.com/aschleck/bzlmod-shadow-python-problem/tree/master/shadowed_with_workspace) shows a workspace depending on another workspace, where both define a package named `core`. In this case the `outer` workspace imports `dep` absolutely, but `dep`'s import of itself fails because it is shadowed.
* [with_workspace](https://github.com/aschleck/bzlmod-shadow-python-problem/tree/master/with_workspace) illustrates the obvious fix: `dep` can always import itself with its workspace name as a prefix (like `import dep.core`). Another solution would be to use relative imports, but those get unwieldy and annoying in large repos.
* [with_bzlmod](https://github.com/aschleck/bzlmod-shadow-python-problem/tree/master/with_bzlmod) fails because `rules_python` does not respect the `_repo_mapping` file for imports. The absolute import of `dep` from `outer` fails because the folder in the runfiles is actually `dep~override`, since `local_path_override` was used. Even if it were named `dep`, the code in `dep` itself could no longer assume that `import dep` refers to itself (because when its rules are run directly it is `_main`, so it would need to do `import _main.core`, not `import dep.core`).

## 🔥 Exception or Error

The error is what you'd expect: the import fails.

~/bzlmod-shadow-python-problem/with_bzlmod/outer $ bazel run //core
INFO: Analyzed target //core:core (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
Target //core:core up-to-date:
  bazel-bin/core/core
INFO: Elapsed time: 0.212s, Critical Path: 0.00s
INFO: 1 process: 1 internal.
INFO: Build completed successfully, 1 total action
INFO: Running command line: bazel-bin/core/core
imported outer.core.outer
Traceback (most recent call last):
  File "/home/april/.cache/bazel/_bazel_april/1d28658223745047b82756f6e8e88a66/execroot/_main/bazel-out/k8-fastbuild/bin/core/core.runfiles/_main/core/outer.py", line 3, in 
    import dep.core.interface
ModuleNotFoundError: No module named 'dep'
## 🌍 Your Environment

**Operating System:**

DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.2 LTS"

**Output of `bazel version`:**

Bazelisk version: v1.14.0
Build label: 6.1.1
Build target: bazel-out/k8-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Wed Mar 15 15:44:56 2023 (1678895096)
Build timestamp: 1678895096
Build timestamp as int: 1678895096

**Rules_python version:**

0.20.0

**Anything else relevant?**

Thank you for maintaining these rules! It's been amazing to see how well this all works on the whole.
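
For illustration, here is a minimal sketch of the wrapper idea described above. It assumes the runfiles `_repo_mapping` manifest consists of comma-separated `source_canonical,apparent_name,target_canonical` lines; the helper name and overall layout are hypothetical and not part of rules_python.

```python
# Hypothetical helper a py_binary wrapper could run before user code: build a
# temp dir of symlinks named after apparent repo names, pointing at the
# canonical repo directories in the runfiles tree, and put it on sys.path.
import os
import sys
import tempfile


def add_repo_mapping_shims(runfiles_dir):
    mapping_file = os.path.join(runfiles_dir, "_repo_mapping")
    if not os.path.exists(mapping_file):
        return
    shim_dir = tempfile.mkdtemp(prefix="repo_shims_")
    with open(mapping_file) as manifest:
        for line in manifest:
            line = line.strip()
            if not line:
                continue
            # Assumed format: source_canonical,apparent_name,target_canonical.
            # A real implementation would also filter rows by the source repo
            # that is doing the importing.
            _source, apparent, canonical = line.split(",")
            target = os.path.join(runfiles_dir, canonical)
            link = os.path.join(shim_dir, apparent)
            if apparent and os.path.isdir(target) and not os.path.exists(link):
                os.symlink(target, link)
    # Appended last so the shims never shadow modules found the normal way.
    sys.path.append(shim_dir)
```

Appending rather than prepending keeps the existing resolution order intact, so the shims only matter for imports that would otherwise fail.
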
chrislovecnm commented 1 year ago

> create symlinks with the correct names pointing to the real folders

I have not tried it, but I have had many problems with symlinks on Windows, so we may not want to go down this path.

fmeum commented 1 year ago

Directory symlinks aka junctions have been available on Windows for a long time and are relied upon by Bazel quite heavily. They could work here.

A simple "solution" that is recommended for Bazel-aware Python packages is to move everything into a top-level directory with the desired name. Since repo roots are on the Python path, this should allow existing absolute imports to continue to work.

Wyverald commented 1 year ago

@rickeylev in case you're not aware of this. We discussed this problem around a year ago, and I think your recommendation was to set up a PyPI-esque import dir merging all needed repos (and forgoing repo names altogether). Could that be adapted to help in @aschleck's case? Is this process documented?

cc @meteorcloudy

rickeylev commented 1 year ago

Not having the repo name in the import name is ideal, yes.

That said, requiring that in order to switch to bzlmod is a rather big hurdle.

Having the repo mapping available during analysis would open a few more options.

Hmmm. There might be another way out, too. Not 100% sure on this.

Basically, a user creates a directory with the desired import name. In that directory, they create an `__init__.py` file. They set the `imports` attribute of the target to `..` (they can nest an additional directory to avoid additional pollution). Within the `__init__.py` there are a few options. It could add to `__path__`, which will make imports of submodules search the added paths. Another option is to manually load and exec the original higher-level file and replace itself in `sys.modules`. This has the advantage of avoiding a temp dir with symlinks. It is also something a library can do, so it doesn't require the binary to do anything special.
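
As a minimal sketch of the `__path__` option (the directory names and the relative path are placeholders; this is not something rules_python does today):

```python
# Hypothetical dep/__init__.py shim. The user-created "dep" directory only
# exists to own the import name; the real submodules (core/, util/, ...) live
# elsewhere, and the relative path below is a placeholder for that location.
import os

_real_code_dir = os.path.normpath(
    os.path.join(os.path.dirname(__file__), "..", ".."))

if os.path.isdir(_real_code_dir):
    # After this, "import dep.core.interface" also searches _real_code_dir
    # for a "core" package; no temp dir or symlinks are needed.
    __path__.append(_real_code_dir)
```
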

Symlinks would also work. They'll require additional changes to the startup script, plus cleanup logic, though a user's main could also be written to do the equivalent before running the real main module code.

github-actions[bot] commented 10 months ago

This issue has been automatically marked as stale because it has not had any activity for 180 days. It will be closed if no further activity occurs in 30 days. Collaborators can add an assignee to keep this open indefinitely. Thanks for your contributions to rules_python!

aschleck commented 10 months ago

I no longer work at the company where this matters, but it is still a thing.

github-actions[bot] commented 3 months ago

This issue has been automatically marked as stale because it has not had any activity for 180 days. It will be closed if no further activity occurs in 30 days. Collaborators can add an assignee to keep this open indefinitely. Thanks for your contributions to rules_python!

aschleck commented 3 months ago

Still a thing as far as I know

rickeylev commented 1 month ago

Another idea: add something to the end of `sys.path` as a last-chance way to handle these. By adding to the end, we minimize interference with other import dirs. We could generate a dir with shim files (using `__path__`), or install an import hook.
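
A rough sketch of the import-hook variant is below. The runfiles directory lookup and the apparent-to-canonical name mapping are placeholders (borrowing `dep`/`dep~override` from the example above), not an existing rules_python mechanism.

```python
# "Last chance" finder appended to sys.meta_path: it only fires after every
# other finder has failed, and resolves top-level imports against repo
# directories in the runfiles tree.
import importlib.util
import os
import sys


class LastChanceRepoFinder:
    """Maps apparent repo names onto canonical repo directories in runfiles."""

    def __init__(self, runfiles_dir, apparent_to_canonical):
        self._runfiles_dir = runfiles_dir
        self._mapping = apparent_to_canonical

    def find_spec(self, fullname, path=None, target=None):
        top = fullname.partition(".")[0]
        canonical = self._mapping.get(top)
        if canonical is None or fullname != top:
            return None
        init_py = os.path.join(self._runfiles_dir, canonical, "__init__.py")
        if not os.path.exists(init_py):
            return None
        # Submodules (e.g. dep.core.interface) are then found through the
        # package's submodule search locations.
        return importlib.util.spec_from_file_location(
            fullname, init_py,
            submodule_search_locations=[os.path.dirname(init_py)])


# Appended, not prepended, so every normal import path is consulted first.
sys.meta_path.append(
    LastChanceRepoFinder(
        os.environ.get("RUNFILES_DIR", "."),
        {"dep": "dep~override"},  # placeholder mapping for illustration
    )
)
```
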

Also, in case it wasn't clear: the main thing preventing progress on this issue is available time. There are several ideas, but someone has to try one to see how well it works or doesn't.