Deprecate automatic creation of empty `__init__.py` files

limdor commented 5 years ago

Description of the problem / feature request:

Bazel is creating init.py files automatically. This is an issue in several situations (see https://github.com/bazelbuild/rules_python/issues/55).

This behavior can be suppressed using: https://docs.bazel.build/versions/master/be/python.html#py_test.legacy_create_init

But in case that you have a large code base with a lot of python rules this is not a good solution. It would mean that this flag needs to be set in every single rule in case that you have a folder in the root of the workspace with the same name as a python package. When this happens, it is not so easy to figure out what is going on. For modern Python developers init.py files are like source, these are not generated files. Thus, is not intuitive that the error could be because some init files are automatically generated.

Feature requests: what underlying problem are you trying to solve with this feature?

In a first step: There should be an option to decide the behavior of legacy_create_init globally. In a second step: The default option should be not to generate init.py files by default.

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Same like https://github.com/bazelbuild/rules_python/issues/55 or any setup that at the root of the workspace you have a folder called like a builtin python module (time, logging, platform, queue, etc.)

What operating system are you running Bazel on?

Ubuntu 18.04

What's the output of `bazel info release`?

release 0.22.0

brandjon commented 5 years ago

The end state shouldn't be a global option but rather removing the legacy create init behavior altogether. This might look like the following.

Go through an incompatible change migration to set the default value of legacy_create_init to false.
Then go through an incompatible change migration to force the value to false for everyone, effectively making the attribute a no-op.
Then go through an incompatible change migration to delete the redundant attribute.

brandjon commented 5 years ago

The first step of the above is implemented in @Faqa's PR #9271, which is in the process of being merged. It introduces an incompatible change flag tracked by #10076.

Steps 2 and 3 above can come after that. They can actually be merged into a single step. But that'll come down the line, since I don't anticipate flipping the first flag for a while. (We need to see how difficult the migration is, and if it requires new tooling, that'll be hard to prioritize.)

We can use this bug to track the overall deprecation effort.

brandjon commented 5 years ago

A concern came up in internal review of #9271: How will a user know that a failure at execution time (from missing __init__.py) is due to flipping this flag?

We already have some logic in the stub script to give users a more informative error message when a failure may be due to a different incompatible change. We can adopt a similar strategy for this change, so that when the incompatible flag is enabled, the exit code is 1, and the stderr matches ^(ModuleNotFoundError|ImportError):, we print an additional message linking to the incompatible change docs. (The second incompatible change, to remove legacy_create_init altogether, won't need any special diagnostic because it'll be a straightforward analysis-time error.)

The downside of this approach is that it means Bazel injects stderr spam into failing py_binarys. But since it would only trigger when legacy_create_init is set to the default value, auto, the user could opt out of the spam on a per-target basis by explicitly setting this attr to true or false -- which works until the second incompatible change to delete the attr. We'd remove the special message from the stub script once the first incompatible flag is flipped.

scele commented 4 years ago

@brandjon Curious, are there some existing best practices for writing BUILD files when automatic __init__.py file generation is switched off?

Suppose that I have the following setup:

foo/
   __init__.py
   BUILD
   lib/
      __init__.py
      lib.py
      BUILD
   bin/
      __init__.py
      bin.py
      BUILD

With automatic __init__.py file generation, there need not be checked-in __init__.py files, and the BUILD files could look like this:

# foo/lib/BUILD
py_library(
    name = "lib",
    srcs = ["lib.py"],
)

# foo/bin/BUILD
py_library(
    name = "bin",
    srcs = ["bin.py"],
    deps = [
        "//foo/lib",
    ],
)

With manual __init__.py file generation, I guess I'd have to set up something like this:

# foo/BUILD
py_library(
    name = "init",
    srcs = ["__init__.py"],
)

# foo/lib/BUILD
py_library(
    name = "init",
    srcs = ["__init__.py"],
)
py_library(
    name = "lib",
    srcs = ["lib.py"],
    deps = [":init"],
)

# foo/bin/BUILD
py_library(
    name = "init",
    srcs = ["__init__.py"],
)
py_library(
    name = "bin",
    srcs = ["bin.py"],
    deps = [
        ":init",
        "//foo/lib",
        "//foo:init",
    ],
)

This is unfortunately pretty verbose, but somewhat doable for the most part I guess.. To me, the most annoying thing seems to be that we have to explicitly add dependencies to the top-level //foo:init target, in order to have foo/__init__.py show up in runfiles and make import foo.<something> work at runtime. It does not seem to scale well when the repository becomes larger: if I have a target //a/b/c/d:lib, it would need to depend on all parent __init__.py files, like //a:init, //a/b:init, //a/b/c:init. Encouraging a pattern where child libs mechanically depend on their parent packages could also be seen problematic - usually it's a good thing for unit test cacheability if leaf directories are independent.

Or maybe I'm missing something - are there existing bazel code bases that have switched automatic __init__.py generation off?

brandjon commented 4 years ago

There's no recommended best practice for what dependency structure to use for __init__.py files. You could have it be included in the srcs of multiple py_librarys in the same package, or factor it into its own target and add it to deps. For parent init files you would have to depend on the parent packages.

limdor commented 4 years ago

I will take the liberty to answer, I would disagree on having the init.py files in seperate dependencies. In this example, as you said with automatic __init__.py it would look something like this, but the issue that I have with automatic __init__.py is that several are not needed.

foo/
   __init__.py   <--- Not needed
   BUILD
   lib/
      __init__.py
      lib.py
      BUILD
   bin/
      __init__.py   <--- Not needed
      bin.py
      BUILD

An init file is to indicate that is a module, for libraries this should be part of a library, I would not recommend to remove it from the srcs of the library and put it separate.

# foo/lib/BUILD
py_library(
    name = "lib",
    srcs = [
        "__init__.py",  <-- This should be part of the library, it is indicating what is a module
        "lib.py",
    ],
    imports = ["."],
)

The imports "." is needed to import from this library, I think that this is unfortunate but probably relates to be able to import always from the root of the workspace. If it would be a big repository with a lot of different languages mix I don't think that scales always starting your import from the workspace. Here you can see the example that you put how is best to my eyes but it will depend a lot from the project and this is just my personal opinion. https://github.com/limdor/bazel-examples/tree/master/python

I would love to extend the example with more complex situations, I am also preparing one example to see how someone could integrate pylint into bazel. I am sure that all together we will find the best way for each situation.

chickenandpork commented 4 years ago

Likely a naive idea: Is it possible to create a merging py_library variant that properly merges two libraries with clashing namespace?

For example, with google-cloud-datastore and google-cloud-bigquery, the proper result of a single dependency that offers google.cloud.* based on these two dependencies can then be a dependency of other targets. You'd have a BUILD something like:

py_test(
  name = "test_ns",
  srcs = [
    "test_ns.py",
  ],
  deps = [
      ":merged_google_cloud",
  ],
)

py_merged_library(      # new explicit merge action
  name = "merged_google_cloud",
  deps = [
    requirement("google-cloud-datastore"),
    requirement("google-cloud-bigquery"),
  ],
)

This magical new py_merged_library essentially pip-installs both dependencies, and repacks a single wheel.

limdor commented 3 years ago

@brandjon I just realized that we missed the oportunity to do the default flip on Bazel 4. What is the status? Are there any blockers to switch the default?

aaliddell commented 3 years ago

Could this be added to the changes for Bazel 5?

brandjon commented 3 years ago

This is not currently being worked on.

I don't think we have plans to prioritize Python rules work within Bazel, though we are looking into migrating them to Starlark (as with many other native rule sets). It's conceivable that migration might require removing edge cases and strange behaviors like this one.

Irvenae commented 2 years ago

+1 for resolving this.

We had a major issue with this and after a long search found a hack https://github.com/dillon-giacoppo/rules_python_external/issues/38

EricCousineau-TRI commented 1 year ago

I've posted a related issue, #17691 - if explicit py_library(*, imports) could have precedence over implicitly created imports, then it may alleviate pain points related to implicitly created __init__.py files.

tpudlik commented 1 year ago

I noticed rules_python now sets this flag in its own .bazelrc. @rickeylev is the maintainers' view that everyone should be building with --incompatible_default_to_explicit_init_py, and we just haven't gotten around to flipping the flag at the Bazel level?

If my project flips this flag (which is very convenient for us), are we pre-adopting the future, or heading down a blind alley?

rickeylev commented 12 months ago

Yeah, the flag will be flipped -- eventually. It a source of confusing behavior and bugs. I'm not sure when it'll be flipped, but it'll be after the rules_python Starlark implementation is activated (which is waiting for Bazel 7 to be released).

martis42 commented 4 months ago

@rickeylev By now the rules_python Starlark implementation exists (thanks a lot for your efforts!) and Bazel 7 is half a year old. Do you believe we can flip this flag now?

bazelbuild / bazel