qingyouzhao opened this issue 2 months ago
This bug was mentioned in a recent PR, so I'll repost what I said there.
The basic gist is that pyc files have been problematic for build determinism and correctness.
Usually, pyc files are created by the runtime when the corresponding module is first imported. This is problematic for Bazel in two ways:
- … a `foo.pyc.$TIMESTAMP` file is created, then atomically moved to its final location. While that is happening, another Python process or Bazel process can be reading the file. On Windows, this results in an error because an open file is deleted.

All that said, this behavior is somewhat surprising and perplexing. By default, individual files are symlinked, and they're in a sandbox, so it's not clear how pycs were getting created in the underlying repo directory. But all the CI issues went away after ignoring pyc files.
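The write pattern behind those temporary `foo.pyc.$TIMESTAMP` files (write to a sibling temp file, then move it into place) can be sketched as follows. This is a hypothetical helper for illustration, not CPython's own code; the timestamp suffix is an assumption that simply follows the naming described above.

```python
import os
import tempfile
import time
from pathlib import Path


def atomic_write(path: Path, data: bytes) -> None:
    """Hypothetical helper: write `data` to `path` via a temp file plus rename."""
    # Sibling temp file named like "foo.pyc.<nanosecond timestamp>".
    tmp = path.with_name(f"{path.name}.{time.time_ns()}")
    try:
        with open(tmp, "wb") as f:
            f.write(data)
        # os.replace swaps the new file into place in one step; on POSIX,
        # readers see either the old file or the new one. On Windows, replacing
        # a file that another process holds open typically fails, which is the
        # race described above.
        os.replace(tmp, path)
    except BaseException:
        # Don't leave a stray temp file behind if anything goes wrong.
        try:
            os.remove(tmp)
        except FileNotFoundError:
            pass
        raise


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = Path(d, "foo.pyc")
        atomic_write(target, b"first")
        atomic_write(target, b"second")
        print(target.read_bytes(), sorted(p.name for p in Path(d).iterdir()))
```

While the temp file exists, a directory glob can pick it up, and by the time the file is read it may already have been renamed away.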
@rickeylev Thanks for the explanation. I might not fully grasp the details, but I get your point.
What I am curious about is:
- Can we only ignore `*.pyc.*` to handle `foo.pyc.$TIMESTAMP` cases while keeping `foo.pyc` included?
- Can Bazel differentiate runtime-created pyc files from package-installed pyc files?
Not really. In both cases they are "source" files, so to Bazel a file looks the same whether it was installed from the package, created by a user, or created by some other process.
The closest we can get is to prevent the interpreter from writing the files, e.g. by making everything read-only. Windows strikes here again, though, because its security model allows an admin user to ignore such read-only attributes.
Alternatively, we could force the interpreter not to write bytecode (`PYTHONDONTWRITEBYTECODE`, or equivalently the `-B` flag). That might be doable, but setting interpreter flags/env has been trickier than expected; this is the sort of thing where we'd have to try it to flush out edge cases.
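The effect of `PYTHONDONTWRITEBYTECODE` is easy to check in isolation. The sketch below uses only the standard library: it imports a throwaway module in a fresh interpreter, with and without the variable, and reports whether a `__pycache__` directory appeared. The module name `example_mod` is made up, and this says nothing about how rules_python would actually propagate the setting to the interpreters it launches.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def pycache_created(dont_write_bytecode: bool) -> bool:
    """Import a throwaway module in a fresh interpreter and report whether
    a __pycache__ directory was written next to it."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "example_mod.py").write_text("VALUE = 42\n")
        env = dict(os.environ, PYTHONPATH=tmp)
        # Clear settings inherited from the caller that would skew the result.
        env.pop("PYTHONDONTWRITEBYTECODE", None)
        env.pop("PYTHONPYCACHEPREFIX", None)
        if dont_write_bytecode:
            env["PYTHONDONTWRITEBYTECODE"] = "1"
        subprocess.run(
            [sys.executable, "-c", "import example_mod"],
            cwd=tmp,
            env=env,
            check=True,
        )
        return Path(tmp, "__pycache__").is_dir()


if __name__ == "__main__":
    print("default:", pycache_created(False))
    print("PYTHONDONTWRITEBYTECODE=1:", pycache_created(True))
```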
> Can we only ignore `*.pyc.*` to handle `foo.pyc.$TIMESTAMP` cases while keeping `foo.pyc` included?
Yes, but it doesn't prevent the deleting-open-file race condition I mentioned. This only affects Windows (only Windows gives an error about deleting an open file). It'd be acceptable to have Windows ignore pyc files, I suppose; our Windows support is borderline and best-effort. The main thing is preventing the build errors that occur, where the only thing you can do is re-run Bazel and hope things interleave successfully.
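A quick way to see what the narrower pattern would keep and drop is to match base file names with `fnmatch.fnmatchcase`. This is only a stand-in for Bazel's `glob` (assuming a `**/*.pyc.*` exclude behaves the same on a single path component); it is not how rules_python evaluates the pattern.

```python
from fnmatch import fnmatchcase

# Proposed narrower exclusion: names with something after ".pyc", such as the
# interpreter's temporary foo.pyc.$TIMESTAMP files.
EXCLUDE_PATTERN = "*.pyc.*"


def kept(filename: str) -> bool:
    """Return True if a base file name survives the narrower exclusion."""
    return not fnmatchcase(filename, EXCLUDE_PATTERN)


if __name__ == "__main__":
    for name in ["foo.py", "foo.pyi", "foo.pyc", "foo.pyc.1700000000"]:
        print(name, kept(name))
```

`foo.pyc` survives while the temporary file is dropped, but as noted above, this does nothing about `foo.pyc` itself being replaced while another process has it open.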
Also, to clarify:
> All that said, this behavior is somewhat surprising and perplexing. By default, individual files are symlinked, and they're in a sandbox, so it's not clear how pycs were getting created in the underlying repo directory.
Part of me suspects this undesirable behavior is limited to (a) just the interpreter and its stdlib, and/or (b) local, non-sandboxed execution of some sort. That's just a theory; if it could be shown to be correct, maybe we could think of some more solutions.
Two more thoughts.
Precompiling is a built-in feature of the rules now. Setting `precompile="enabled"` on the targets makes Bazel perform its own precompilation of py sources. We should change the generated pypi code to enable precompiling; we just haven't gotten around to trying that yet.
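As background on why build-time precompilation can help determinism: a pyc header normally embeds the source's mtime, but PEP 552 hash-based pycs embed a hash of the source instead. The sketch below uses the standard library `py_compile` module to produce such a pyc and then decodes its 16-byte header. It is an illustration of the pyc format under that assumption, not rules_python's actual precompile implementation.

```python
import importlib.util
import py_compile
import tempfile
from pathlib import Path


def precompile(src: Path, out: Path) -> bytes:
    """Compile `src` to `out` as an unchecked hash-based pyc; return its bytes."""
    py_compile.compile(
        str(src),
        cfile=str(out),
        doraise=True,
        invalidation_mode=py_compile.PycInvalidationMode.UNCHECKED_HASH,
    )
    return out.read_bytes()


def describe_header(pyc: bytes, source: bytes) -> dict:
    # PEP 552 header: 4-byte magic, 4-byte flags, then 8 bytes holding either
    # mtime + size (timestamp pycs) or the source hash (hash-based pycs).
    flags = int.from_bytes(pyc[4:8], "little")
    return {
        "magic_ok": pyc[:4] == importlib.util.MAGIC_NUMBER,
        "hash_based": bool(flags & 0b01),
        "check_source": bool(flags & 0b10),
        "hash_matches": pyc[8:16] == importlib.util.source_hash(source),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp, "mod.py")
        src.write_bytes(b"VALUE = 42\n")
        print(describe_header(precompile(src, Path(tmp, "mod.pyc")), src.read_bytes()))
```

Because the header depends only on the source bytes and the interpreter's magic number, touching the source file's mtime does not change it.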
If the pypi package is a pyc-only package (rare, but supported), then I think it's reasonable for `py_library` to accept `.pyc` files directly in `srcs`. A pyc-only library wouldn't have any of these issues, as there isn't any py source file to try to generate a new pyc file from.
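The claim that a pyc-only library gives the interpreter nothing to regenerate can be checked with the standard library alone. The sketch builds a hypothetical pyc-only `example_package` (sources compiled with `py_compile`, then deleted), imports it in a fresh interpreter, and reports the imported value and whether a `__pycache__` directory appeared. The package layout is an assumption made for illustration.

```python
import os
import py_compile
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Tuple


def import_pyc_only_package() -> Tuple[str, bool]:
    """Build a throwaway pyc-only package, import it in a fresh interpreter,
    and return (imported value, whether __pycache__ was created)."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp, "example_package")
        pkg.mkdir()
        for name, body in [("__init__", ""), ("foo", "VALUE = 'from pyc'\n")]:
            src = pkg / f"{name}.py"
            src.write_text(body)
            # Compile to the legacy location next to the source...
            py_compile.compile(str(src), cfile=str(pkg / f"{name}.pyc"), doraise=True)
            # ...then delete the source so only the .pyc ships.
            src.unlink()
        env = dict(os.environ, PYTHONPATH=tmp)
        env.pop("PYTHONDONTWRITEBYTECODE", None)
        out = subprocess.run(
            [sys.executable, "-c", "from example_package import foo; print(foo.VALUE)"],
            cwd=tmp,
            env=env,
            capture_output=True,
            text=True,
            check=True,
        )
        return out.stdout.strip(), (pkg / "__pycache__").exists()


if __name__ == "__main__":
    print(import_pyc_only_package())
```

The import succeeds and no `__pycache__` is written, since a sourceless import never writes bytecode.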
> The main thing is preventing the build errors that occur

In my use case, I am okay with the build errors because the package that I am concerned with is a pyc-only library with pyi files for tooling. I am happy with any special-case solution that just works with pure pyc packages.
I think this issue is something outside contributors could help with: the `whl_library_targets` macro generates the `py_library` targets that are consumed downstream. There are a few options here:
- A `whl_mods`-based solution, which modifies the included/excluded srcs or something similar.
- … `RECORD` file and then include that as part of the `py_library`.

I think I would prefer the second approach: include the `.pyc` files if they are in the `RECORD` file, because that means they are part of the `whl` distribution. Both solutions should first be gated by an env variable or a feature flag so that we can roll the change out slowly.
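The `RECORD`-based approach can be sketched as follows. A wheel's `RECORD` file is a CSV whose rows are `path,hash,size`. The hypothetical helper `pyc_files_from_record` picks out the `.pyc` entries, and its `enabled` parameter stands in for the env variable or feature flag that would gate the rollout. This is not the `whl_library_targets` code. The sample `RECORD` contents are made up for illustration.

```python
import csv
import io
from typing import List


def pyc_files_from_record(record_text: str, enabled: bool) -> List[str]:
    """Hypothetical helper: list the .pyc files a wheel's RECORD says it ships.

    `enabled` stands in for the env var / feature flag gating the rollout;
    when it is off, no .pyc files are added (today's behavior).
    """
    if not enabled:
        return []
    pycs = []
    # RECORD is a CSV file: path, hash, size (hash and size may be empty).
    for row in csv.reader(io.StringIO(record_text)):
        if row and row[0].endswith(".pyc"):
            pycs.append(row[0])
    return sorted(pycs)


# Made-up RECORD contents for a pyc-only package, for illustration only.
SAMPLE_RECORD = """\
example_package/__init__.pyc,sha256=AAAA,100
example_package/foo.pyc,sha256=BBBB,200
example_package/foo.pyi,sha256=CCCC,50
example_package-1.0.dist-info/METADATA,sha256=DDDD,300
example_package-1.0.dist-info/RECORD,,
"""

if __name__ == "__main__":
    print(pyc_files_from_record(SAMPLE_RECORD, enabled=True))
```

Files the interpreter writes later, including temporary `foo.pyc.$TIMESTAMP` files, are not listed in the `RECORD` that ships inside the wheel, so this selection keeps only pycs that are genuinely part of the distribution.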
Bug report
Affected Rule
https://github.com/bazelbuild/rules_python/blob/main/docs/pypi-dependencies.md
Specifically, the `py_library` with `name = "pkg"` generated by these rules.
Is this a regression?
I don't know.
Description
I couldn't import a module from a pip package that ships its Python modules as `.pyi` and `.pyc` files. For example, I have an `example_package` that ships with the following file structure:
According to the flow chart in https://peps.python.org/pep-3147/#flow-chart, importing `foo` from `example_package` should succeed. But the generated `py_library` specifically excludes `"**/*.pyc"` from `data`. If I manually remove the `"**/*.pyc"` exclusion, the import statement works.
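The PEP 3147 rule the report relies on can be checked directly: when the `.py` source is absent, the interpreter ignores pycs under `__pycache__/` and only uses a `foo.pyc` sitting where `foo.py` would be. The sketch below (standard library only, with a made-up module `foo`) tries both layouts in a fresh interpreter. It also shows why excluding `**/*.pyc` breaks a pyc-only package: the only importable file is the one being excluded.

```python
import importlib.util
import os
import py_compile
import subprocess
import sys
import tempfile
from pathlib import Path


def try_import(layout: str) -> bool:
    """Return True if `import foo` succeeds when only a pyc exists.

    layout="legacy": foo.pyc next to where foo.py would be.
    layout="pycache": foo.<tag>.pyc only under __pycache__/.
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp, "foo.py")
        src.write_text("VALUE = 1\n")
        if layout == "legacy":
            target = Path(tmp, "foo.pyc")
        else:
            target = Path(importlib.util.cache_from_source(str(src)))
        py_compile.compile(str(src), cfile=str(target), doraise=True)
        src.unlink()  # ship only the compiled file
        env = dict(os.environ, PYTHONPATH=tmp)
        result = subprocess.run(
            [sys.executable, "-c", "import foo"],
            cwd=tmp,
            env=env,
            capture_output=True,
        )
        return result.returncode == 0


if __name__ == "__main__":
    print("legacy foo.pyc:", try_import("legacy"))
    print("__pycache__ only:", try_import("pycache"))
```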
Minimal Reproduction
`from example_package import foo` should work, but it fails with a `ModuleNotFoundError`.
Exception or Error
Your Environment
Operating System:
Output of `bazel version`:
Rules_python version:
Anything else relevant?