bazelbuild / rules_python

Bazel Python Rules
https://rules-python.readthedocs.io
Apache License 2.0
530 stars 539 forks source link

__init__.pyi files are ignored by add_pkgutil_style_namespace_pkg_init #381

Closed jake-arkinstall closed 3 years ago

jake-arkinstall commented 3 years ago

When trying to use the pip package ciso8601, I found that my project saw the module as empty. On further inspection, I found that this package had provided an __init__.pyi file (see https://github.com/closeio/ciso8601/tree/master/ciso8601), but no __init__.py. I'm not very familiar with how the module system treats __init__.pyi files, but it seems to work fine in isolation.

However, an __init__.py provided by add_pkgutil_style_namespace_pkg_init was taking precedence.

When I removed the __init__.py, my project could see the ciso8601 module as intended. The problem also goes away when enable_implicit_namespace_pkgs = True is provided to pip_install, but this may not generalize well to other packages (nothing has broken yet, but I can only presume the functionality is there for a reason).

The issue, at least for this purpose, is that add_pkgutil_style_namespace_pkg_init is only looking for a __init__.py. My suggestion is to also check for an __init__.pyi before creating the new __init__.py.

thundergolfer commented 3 years ago

Thanks for your issue report. I'm not familiar with whether a __init__.pyi is a valid init file, but looks like something that should be investigated.

thundergolfer commented 3 years ago

I've created a reproduction repository: https://github.com/thundergolfer-playground/rules_python-issue-381

When I removed the __init__.py, my project could see the ciso8601 module as intended

I've confirmed the same. The added __init__.py is certainly breaking this package. The package resolution process is as follows:

During import processing, the import machinery will continue to iterate over each directory in the parent path as it does in Python 3.2. While looking for a module or package named "foo", for each directory in the parent path:

  • If <directory>/foo/__init__.py is found, a regular package is imported and returned.
  • If not, but <directory>/foo.{py,pyc,so,pyd} is found, a module is imported and returned. The exact list of extension varies by platform and whether the -O flag is specified. The list here is representative.
  • If not, but <directory>/foo is found and is a directory, it is recorded and the scan continues with the next directory in the parent path.
  • Otherwise the scan continues with the next directory in the parent path.

source

So the added __init__.py causes import ciso8601 to match bazel-bin/reproduce_bug/main.runfiles/pypi/pypi__ciso8601/ as a regular package, and it has no modules in it so you get something like AttributeError: module 'ciso8601' has no attribute 'parse_datetime'.

I think that the crux of the issue is that the namespace package code in the pip_install repo rule is mistaken about what a 'non empty' directory is from Python's perspective.

The contents of ciso8601/ are:

ciso8601/
        __init__.pyi
       py.typed

Neither of these files is a Python module file and so the folder should be considered empty and not a namespace package. Then no __init__.py would be found and the import resolution code would then look for bazel-bin/reproduce_bug/main.runfiles/pypi/pypi__ciso8601/ciso8601.{py,pyc,so,pyd} and find the ciso8601.<platform tag>.so file (eg. ciso8601.cpython-39-darwin.so).

The importing of the .so file is what happens when we hack-ily delete the __init__.py file. We can see this by doing the following:

# having hackily deleted the ciso8601/__init__.py file...
strace -ff -o trace_output bazel run //reproduce_bug:main
grep ciso trace_output*

Snippet of output:

...
trace_output.9101:stat("/home/vscode/.cache/bazel/_bazel_vscode/584402666701bf83c9c2e932dede8c12/execroot/rules_python_demo/bazel-out/k8-fastbuild/bin/reproduce_bug/main.runfiles/pypi/pypi__ciso8601", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
trace_output.9101:stat("/home/vscode/.cache/bazel/_bazel_vscode/584402666701bf83c9c2e932dede8c12/execroot/rules_python_demo/bazel-out/k8-fastbuild/bin/reproduce_bug/main.runfiles/pypi/pypi__ciso8601/ciso8601/__init__.cpython-39-x86_64-linux-gnu.so", 0x7fffe2305d80) = -1 ENOENT (No such file or directory)
trace_output.9101:stat("/home/vscode/.cache/bazel/_bazel_vscode/584402666701bf83c9c2e932dede8c12/execroot/rules_python_demo/bazel-out/k8-fastbuild/bin/reproduce_bug/main.runfiles/pypi/pypi__ciso8601/ciso8601/__init__.abi3.so", 0x7fffe2305d80) = -1 ENOENT (No such file or directory)
trace_output.9101:stat("/home/vscode/.cache/bazel/_bazel_vscode/584402666701bf83c9c2e932dede8c12/execroot/rules_python_demo/bazel-out/k8-fastbuild/bin/reproduce_bug/main.runfiles/pypi/pypi__ciso8601/ciso8601/__init__.so", 0x7fffe2305d80) = -1 ENOENT (No such file or directory)
trace_output.9101:stat("/home/vscode/.cache/bazel/_bazel_vscode/584402666701bf83c9c2e932dede8c12/execroot/rules_python_demo/bazel-out/k8-fastbuild/bin/reproduce_bug/main.runfiles/pypi/pypi__ciso8601/ciso8601/__init__.py", 0x7fffe2305d80) = -1 ENOENT (No such file or directory)
trace_output.9101:stat("/home/vscode/.cache/bazel/_bazel_vscode/584402666701bf83c9c2e932dede8c12/execroot/rules_python_demo/bazel-out/k8-fastbuild/bin/reproduce_bug/main.runfiles/pypi/pypi__ciso8601/ciso8601/__init__.pyc", 0x7fffe2305d80) = -1 ENOENT (No such file or directory)
trace_output.9101:stat("/home/vscode/.cache/bazel/_bazel_vscode/584402666701bf83c9c2e932dede8c12/execroot/rules_python_demo/bazel-out/k8-fastbuild/bin/reproduce_bug/main.runfiles/pypi/pypi__ciso8601/ciso8601", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
trace_output.9101:stat("/home/vscode/.cache/bazel/_bazel_vscode/584402666701bf83c9c2e932dede8c12/execroot/rules_python_demo/bazel-out/k8-fastbuild/bin/reproduce_bug/main.runfiles/pypi/pypi__ciso8601/ciso8601.cpython-39-x86_64-linux-gnu.so", {st_mode=S_IFREG|0755, st_size=21280, ...}) = 0
trace_output.9101:openat(AT_FDCWD, "/home/vscode/.cache/bazel/_bazel_vscode/584402666701bf83c9c2e932dede8c12/execroot/rules_python_demo/bazel-out/k8-fastbuild/bin/reproduce_bug/main.runfiles/pypi/pypi__ciso8601/ciso8601.cpython-39-x86_64-linux-gnu.so", O_RDONLY|O_CLOEXEC) = 3
...

The last two lines of the snippet show the python process finding and opening the .so file, as we want.

Whitelist or Blacklist?

So only some files should be considered as relevant to determining whether a directory is non-empty from the namespace packaging perspective.

We could have a blacklist and say .pyi files and py.typed files should be ignored.

Or we could have a whitelist and only certain file extensions are included, like {py,pyc,so,pyd} but I think Windows uses dll as well.

At the moment I'd lean towards whitelist because a blacklist will be more likely to keep throwing up edge cases that require expanding the blacklist.

Refs

thundergolfer commented 3 years ago

Began fixing this here: https://github.com/bazelbuild/rules_python/tree/jonathon--issue-381-may2021

thundergolfer commented 3 years ago

Fixed on https://github.com/thundergolfer-playground/rules_python-issue-381 with latest commit on ☝️ above branch:

rules_python_version = "c361e789a246c81649dc8d9700d29b8a32a3d00f"   # jonathon--issue-381-may2021
http_archive(
    name = "rules_python",
    url = "https://github.com/bazelbuild/rules_python/archive/{version}.tar.gz".format(version = rules_python_version),
    strip_prefix = "rules_python-{version}".format(version = rules_python_version),
    sha256 = "a6f339763303be086c7c51a38bc214c020d15709b0b82e4b1bedcf4a31e9bd6a",
)

Will do a bit more acceptance testing before throwing up the PR, but this is close to resolved I think (finally).