Closed jake-arkinstall closed 3 years ago
Thanks for your issue report. I'm not familiar with whether a __init__.pyi
is a valid init file, but looks like something that should be investigated.
I've created a reproduction repository: https://github.com/thundergolfer-playground/rules_python-issue-381
When I removed the
__init__.py
, my project could see theciso8601
module as intended
I've confirmed the same. The added __init__.py
is certainly breaking this package. The package resolution process is as follows:
During import processing, the import machinery will continue to iterate over each directory in the parent path as it does in Python 3.2. While looking for a module or package named "foo", for each directory in the parent path:
- If
<directory>/foo/__init__.py
is found, a regular package is imported and returned.- If not, but
<directory>/foo.{py,pyc,so,pyd}
is found, a module is imported and returned. The exact list of extension varies by platform and whether the -O flag is specified. The list here is representative.- If not, but
<directory>/foo
is found and is a directory, it is recorded and the scan continues with the next directory in the parent path.- Otherwise the scan continues with the next directory in the parent path.
So the added __init__.py
causes import ciso8601
to match bazel-bin/reproduce_bug/main.runfiles/pypi/pypi__ciso8601/
as a regular package, and it has no modules in it so you get something like AttributeError: module 'ciso8601' has no attribute 'parse_datetime'
.
I think that the crux of the issue is that the namespace package code in the pip_install
repo rule is mistaken about what a 'non empty' directory is from Python's perspective.
The contents of ciso8601/
are:
ciso8601/
__init__.pyi
py.typed
Neither of these files is a Python module file and so the folder should be considered empty and not a namespace package. Then no __init__.py
would be found and the import resolution code would then look for bazel-bin/reproduce_bug/main.runfiles/pypi/pypi__ciso8601/ciso8601.{py,pyc,so,pyd}
and find the ciso8601.<platform tag>.so
file (eg. ciso8601.cpython-39-darwin.so
).
The importing of the .so
file is what happens when we hack-ily delete the __init__.py
file. We can see this by doing the following:
# having hackily deleted the ciso8601/__init__.py file...
strace -ff -o trace_output bazel run //reproduce_bug:main
grep ciso trace_output*
Snippet of output:
...
trace_output.9101:stat("/home/vscode/.cache/bazel/_bazel_vscode/584402666701bf83c9c2e932dede8c12/execroot/rules_python_demo/bazel-out/k8-fastbuild/bin/reproduce_bug/main.runfiles/pypi/pypi__ciso8601", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
trace_output.9101:stat("/home/vscode/.cache/bazel/_bazel_vscode/584402666701bf83c9c2e932dede8c12/execroot/rules_python_demo/bazel-out/k8-fastbuild/bin/reproduce_bug/main.runfiles/pypi/pypi__ciso8601/ciso8601/__init__.cpython-39-x86_64-linux-gnu.so", 0x7fffe2305d80) = -1 ENOENT (No such file or directory)
trace_output.9101:stat("/home/vscode/.cache/bazel/_bazel_vscode/584402666701bf83c9c2e932dede8c12/execroot/rules_python_demo/bazel-out/k8-fastbuild/bin/reproduce_bug/main.runfiles/pypi/pypi__ciso8601/ciso8601/__init__.abi3.so", 0x7fffe2305d80) = -1 ENOENT (No such file or directory)
trace_output.9101:stat("/home/vscode/.cache/bazel/_bazel_vscode/584402666701bf83c9c2e932dede8c12/execroot/rules_python_demo/bazel-out/k8-fastbuild/bin/reproduce_bug/main.runfiles/pypi/pypi__ciso8601/ciso8601/__init__.so", 0x7fffe2305d80) = -1 ENOENT (No such file or directory)
trace_output.9101:stat("/home/vscode/.cache/bazel/_bazel_vscode/584402666701bf83c9c2e932dede8c12/execroot/rules_python_demo/bazel-out/k8-fastbuild/bin/reproduce_bug/main.runfiles/pypi/pypi__ciso8601/ciso8601/__init__.py", 0x7fffe2305d80) = -1 ENOENT (No such file or directory)
trace_output.9101:stat("/home/vscode/.cache/bazel/_bazel_vscode/584402666701bf83c9c2e932dede8c12/execroot/rules_python_demo/bazel-out/k8-fastbuild/bin/reproduce_bug/main.runfiles/pypi/pypi__ciso8601/ciso8601/__init__.pyc", 0x7fffe2305d80) = -1 ENOENT (No such file or directory)
trace_output.9101:stat("/home/vscode/.cache/bazel/_bazel_vscode/584402666701bf83c9c2e932dede8c12/execroot/rules_python_demo/bazel-out/k8-fastbuild/bin/reproduce_bug/main.runfiles/pypi/pypi__ciso8601/ciso8601", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
trace_output.9101:stat("/home/vscode/.cache/bazel/_bazel_vscode/584402666701bf83c9c2e932dede8c12/execroot/rules_python_demo/bazel-out/k8-fastbuild/bin/reproduce_bug/main.runfiles/pypi/pypi__ciso8601/ciso8601.cpython-39-x86_64-linux-gnu.so", {st_mode=S_IFREG|0755, st_size=21280, ...}) = 0
trace_output.9101:openat(AT_FDCWD, "/home/vscode/.cache/bazel/_bazel_vscode/584402666701bf83c9c2e932dede8c12/execroot/rules_python_demo/bazel-out/k8-fastbuild/bin/reproduce_bug/main.runfiles/pypi/pypi__ciso8601/ciso8601.cpython-39-x86_64-linux-gnu.so", O_RDONLY|O_CLOEXEC) = 3
...
The last two lines of the snippet show the python process finding and opening the .so
file, as we want.
So only some files should be considered as relevant to determining whether a directory is non-empty from the namespace packaging perspective.
We could have a blacklist and say .pyi
files and py.typed
files should be ignored.
Or we could have a whitelist and only certain file extensions are included, like {py,pyc,so,pyd}
but I think Windows uses dll
as well.
At the moment I'd lean towards whitelist because a blacklist will be more likely to keep throwing up edge cases that require expanding the blacklist.
Began fixing this here: https://github.com/bazelbuild/rules_python/tree/jonathon--issue-381-may2021
Fixed on https://github.com/thundergolfer-playground/rules_python-issue-381 with latest commit on ☝️ above branch:
rules_python_version = "c361e789a246c81649dc8d9700d29b8a32a3d00f" # jonathon--issue-381-may2021
http_archive(
name = "rules_python",
url = "https://github.com/bazelbuild/rules_python/archive/{version}.tar.gz".format(version = rules_python_version),
strip_prefix = "rules_python-{version}".format(version = rules_python_version),
sha256 = "a6f339763303be086c7c51a38bc214c020d15709b0b82e4b1bedcf4a31e9bd6a",
)
Will do a bit more acceptance testing before throwing up the PR, but this is close to resolved I think (finally).
When trying to use the pip package ciso8601, I found that my project saw the module as empty. On further inspection, I found that this package had provided an
__init__.pyi
file (see https://github.com/closeio/ciso8601/tree/master/ciso8601), but no__init__.py
. I'm not very familiar with how the module system treats__init__.pyi
files, but it seems to work fine in isolation.However, an
__init__.py
provided by add_pkgutil_style_namespace_pkg_init was taking precedence.When I removed the
__init__.py
, my project could see the ciso8601 module as intended. The problem also goes away whenenable_implicit_namespace_pkgs = True
is provided topip_install
, but this may not generalize well to other packages (nothing has broken yet, but I can only presume the functionality is there for a reason).The issue, at least for this purpose, is that
add_pkgutil_style_namespace_pkg_init
is only looking for a__init__.py
. My suggestion is to also check for an__init__.pyi
before creating the new__init__.py
.