aspect-build / rules_py

More compatible Bazel rules for running Python tools and building Python projects
Apache License 2.0
85 stars 25 forks source link

[Bug]: Possible race condition in py_binary bash wrapper #373

Open liningpan opened 3 months ago

liningpan commented 3 months ago

What happened?

When calling the same py_binary target in parallel, the bash wrapper is racy when setting up the virtual environment. For example, we want to run a py_binary target in parallel as runfiles.

Both the old pure bash wrapper and the new one using rattler are removing the existing environment and creating a new one. This step is fundamentally unsafe and we will run into time-of-check to time-of-use issues everywhere.

We need the synchronization mechanism here (fd-lock) as well as a way to ensure the virtual environment is only created once.

Version

Development (host) and target OS/architectures: Linux Centos 7 x86

Output of bazel --version: 6.5.0

Version of the Aspect rules, or other relevant rules from your WORKSPACE or MODULE.bazel file: 0.6.0 (last version compatible w/ centos 7); I'm pretty sure this will affect the latest version as well.

Language(s) and/or frameworks involved: Python 3.11

How to reproduce

Call `rules_py` `py_binary` target in parallel.

Any other information?

No response

liningpan commented 2 months ago

@alexeagle Friendly ping here. Is this a known issue and if there would ever be a fix? I would if the best solution is to switch back to rules_python.

mattem commented 2 months ago

if there would ever be a fix?

Currently my time for OSS support is extremely limited, and generally done in some "free" time, so it may be a while before I am able to spend focused time on this.

If you'd like to discuss a design to move forwards on a fix and submit a PR, I'd be more than happy to have that conversation and work with you through the review. You can also consider sponsoring a bug / feature request via OpenCollective.

liningpan commented 2 months ago

Hi Matt! Thanks for the response. I'm happy to work on a PR. When I say "if there would ever be a fix", I meant that given the requirement of setting up a virtual environment and updating the virtual environment on each Bazel invocation, it seems like something fundamental to how rules_py is designed to work.

One possible solution I can think of is adding fd-lock and checksum the pth file.

alexeagle commented 1 month ago

Is it possible that we just don't re-use the same output folder? Could we trivially add a numbered parent folder in the path, similar to Bazel's sandbox? Maybe use the PID of the process to avoid collisions? Or would that result in eventual resource leak where the disk fills up?

Or, could we easily read an environment variable like VENV_LOCATION and use that as the parent folder for rattler to layout the venv? Then it's up to users to avoid having this sort of collisions.

voxeljorge commented 1 week ago

We have run into this one actually, but a generally ok solution seems to be to just run the program once to force the venv generation.

An alternative here might be to just expose a target that builds the venv the app will use without actually running the app. Users can then be responsible for the preflight.

Separating multiple output directories seems excessive given they would all be the same exact content, and would only be written once each no?