dgrnbrg closed this issue 2 years ago.
Hello, I have dug into the code more, and I see that behind the scenes, this system produces the exact `@$PIP_IMPORT_NAME//:requirements.bzl` that I'd like to commit to my outer repository. Currently, I can only find this file by manually trawling through the Bazel cache, since I'm not aware of any way to access artifacts from repository workspaces.
How should this be handled?
Two more options I can think of:

1. Lock a version number in your requirements.txt using `==`: https://pip.readthedocs.io/en/1.1/requirements.html#freezing-requirements. This has the benefit of locking you to a specific version of a package, but not the resolved whl, which may differ across host systems.
2. Alternatively, there's the concept of resolved files: https://blog.bazel.build/2018/09/28/first-class-resolved-file.html and https://blog.bazel.build/2018/07/09/bazel-sync-and-resolved-file.html. This will pin the result of your resolved workspace rules and optionally lock you to an exact source tree hash. See the sketch after this list.
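A sketch of that resolved-file flow, using the flag documented in the linked posts (it was experimental at the time and may have changed since):

```
# Run all repository rules and record the result in a Starlark file:
bazel sync --experimental_repository_resolved_file=resolved.bzl
# Commit resolved.bzl so later builds can reuse the pinned resolution.
```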
I just tried using the resolved files, which seem like what my company is looking for. However, when I run `bazel sync`, Bazel seems to get into a bad state where it can't resolve dependencies, and it won't build anything again until I do `bazel clean --expunge`. The sync seems to fail with this error:
```
Collecting matplotlib==3.0.2 (from -r /home/dgrnbrg/3rdparty/py_research_requirements.txt (line 8))
Could not find a version that satisfies the requirement matplotlib==3.0.2 (from -r /home/dgrnbrg/3rdparty/py_research_requirements.txt (line 8)) (from versions: 0.86, 0.86.1, 0.86.2, 0.91.0, 0.91.1, 1.0.1, 1.1.0, 1.1.1, 1.2.0, 1.2.1, 1.3.0, 1.3.1, 1.4.0, 1.4.1rc1, 1.4.1, 1.4.2, 1.4.3, 1.5.0, 1.5.1, 1.5.2, 1.5.3, 2.0.0b1, 2.0.0b2, 2.0.0b3, 2.0.0b4, 2.0.0rc1, 2.0.0rc2, 2.0.0, 2.0.1, 2.0.2, 2.1.0rc1, 2.1.0, 2.1.1, 2.1.2, 2.2.0rc1, 2.2.0, 2.2.2, 2.2.3, 2.2.4)
```
Of course, matplotlib version 3.0.2 exists in real life, so I'm a bit confused. I think this might be related to the fact that I need https://github.com/darrengarvey/rules_python in order to have python 3 compatibility, which is critical for our use cases.
Do you have any suggestions or ideas on how to proceed?
How are you running build and sync (i.e. what args and env vars for each) and what's in your .bazelrc? IIRC, matplotlib 3.0.0 is not supported by Python 2, and it's possible that when running `sync` it's falling back to Python 2, hence why you aren't seeing any of the 3.0.x versions.
Yes, I think that's the case. I use the afore-linked rules_python, which supports overriding the python version used for builds, so that we can have a working python 3 environment. My bazelrc includes:
```
build --python_path=/usr/bin/python3.6
test --python_path=/usr/bin/python3.6
run --python_path=/usr/bin/python3.6
build --action_env=BAZEL_PYTHON=/usr/bin/python3.6
test --action_env=BAZEL_PYTHON=/usr/bin/python3.6
run --action_env=BAZEL_PYTHON=/usr/bin/python3.6
```
I noticed that the max version of matplotlib discovered above is the max version supported by Python 2, so I think you're correct that `sync` is falling back to Python 2.
Ah, so when you're running `sync`, the `BAZEL_PYTHON` env var is perhaps not set, so it is falling back to `python` due to changes in that repo.
Eventually rules_python should have proper pip_import support for python 3, although various PRs have been proposing this for almost a year now: #158 #82.
Also, on another topic: in bazelrc, any args set for `build` are inherited by `test` and `run`, so you shouldn't need to have them written three times.
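For example, the six lines quoted above collapse to:

```
build --python_path=/usr/bin/python3.6
build --action_env=BAZEL_PYTHON=/usr/bin/python3.6
# test and run inherit flags set for build, so no separate lines are needed.
```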
Interesting; I'll try dropping the args set for `test` and `run`. I wasn't able to use the `--python_path` or `--action_env` settings for `sync` in .bazelrc, since it appears they're not supported in that context.
This is frustrating, since the net result of all this is that I cannot lock down my pip dependencies. (Locking versions in requirements.txt is something we've tried, but it seems that sometimes deps-of-deps bump versions in a way that pip thinks is compatible but that breaks our code.)
I just wanted to add a comment regarding freezing versions in requirements.txt. The issue is that you need to specify the transitive dependencies in requirements.txt as well; otherwise it gives the false impression that they are frozen. Then, when you update the version of a direct dependency, you also need to update the transitive dependencies: some may appear, some may disappear, and some may change version.
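To illustrate (real package names, but the scenario and versions are only an example):

```
# requirements.txt -- only the direct dependency is pinned:
requests==2.22.0
# requests pulls in urllib3, chardet, idna, and certifi, none of which
# are pinned here, so installs at different times can resolve different
# versions of them. A truly frozen file must also list the transitive deps:
# urllib3==1.25.8
# chardet==3.0.4
# idna==2.8
# certifi==2019.11.28
```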
I'd like to mention pipenv and its Pipfile w/ Pipfile.lock, which I've been using extensively to great effect in my Python projects. This solves the problem of requirements.txt not distinguishing between transitive and top-level dependencies.
I haven't yet tried to evaluate pipenv or even Pipfile support in Bazel; however, many of the problems you're describing have good solutions with a Pipfile, which can specify the Python version, separate regular and dev dependencies, and has a known lock format.
Perhaps language-specific lock files aren't ideal in Bazel, and I'd understand that. But worth mentioning, I think.
Edit, found: https://github.com/bazelbuild/rules_python/issues/72 https://github.com/bazelbuild/rules_python/issues/171
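For readers unfamiliar with the format, a minimal Pipfile sketch (package names and versions are illustrative):

```
[[source]]
name = "pypi"
url = "https://pypi.org/simple"
verify_ssl = true

[packages]
requests = "==2.22.0"

[dev-packages]
pytest = "*"

[requires]
python_version = "3.6"
```

Running `pipenv lock` against this produces a Pipfile.lock that pins the entire transitive closure with hashes.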
Another lightweight, reliable way to "resolve" a top-level requirements.in file to a transitively closed requirements.txt file is to use `pip-compile` from pip-tools.
I wrote a little glue to make a pip-compile workflow easy-ish in a Bazel workspace:
https://github.com/whilp/world/blob/b4b01e019b0dc1888cc3c81e9ab5a242e9820717/requirements.in#L1 https://github.com/whilp/world/blob/b4b01e019b0dc1888cc3c81e9ab5a242e9820717/requirements.txt#L1 https://github.com/whilp/world/blob/b4b01e019b0dc1888cc3c81e9ab5a242e9820717/requirements/compile.py#L8
Here, I add deps (pinned to versions) in requirements.in, then run the `requirements:compile` target as needed to update requirements.txt.
This kinda works (at least, doesn't break) with automated bumps from renovatebot:
https://github.com/whilp/world/pull/434
(But I haven't gotten those bumps to update requirements.txt as well, as they should.)
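For reference, the non-Bazel equivalent of that compile step is pip-tools' CLI (flags as documented by pip-tools):

```
# requirements.in lists direct deps only, e.g.:
#   matplotlib==3.0.2
# pip-compile resolves the transitive closure and writes a fully pinned
# requirements.txt, optionally with hashes for every package file:
pip-compile --generate-hashes --output-file requirements.txt requirements.in
```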
@willstott101 we originally used pipenv to produce our transitively closed and pinned requirements.txt file, but had problems and then moved to using `pip-compile`, as @groodt mentions above.
The problem with pipenv was that we wanted to be able to "re-lock" a lock file deterministically, both to check platform consistency (OSX, Linux) and to validate that one of our repo users hadn't made a mistake when changing deps; unfortunately, pipenv couldn't deterministically re-lock.
These pipenv issues seem directly related to the issues we experienced:
`pip-compile` has, for us, proved better at consistently producing transitively closed lock files.
@whilp thanks for posting your solution. @alexeagle at Robinhood has a small Starlark-based `pip-compile` integration which I've seen, and it's much nicer than my company's bash script. He's planning to commit it here.
This issue has been automatically marked as stale because it has not had any activity for 180 days. It will be closed if no further activity occurs in 30 days. Collaborators can add an assignee to keep this open indefinitely. Thanks for your contributions to rules_python!
Still relevant
For those interested in this issue, could you check out https://github.com/bazelbuild/rules_python/blob/ef4d735216a3782b7c33543d82b891fe3a86e3f3/python/pip_install/requirements.bzl#L6 and provide feedback on whether it adequately addresses the issue? It's not Bazel-native, but it does easily produce a "compiled", transitively closed list of dependencies with hashes of the package files.
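For anyone evaluating it, usage looks roughly like the sketch below, based on the linked file and the rules_python docs; the load path and attribute names may differ by version:

```
# BUILD.bazel
load("@rules_python//python:pip.bzl", "compile_pip_requirements")

compile_pip_requirements(
    name = "requirements",
    requirements_in = "requirements.in",
    requirements_txt = "requirements_lock.txt",
)
```

Per the docs, `bazel run //:requirements.update` then regenerates the lock file, and a generated test target fails if the committed lock is stale.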
I'm curious about `compile_pip_requirements`, @thundergolfer: how does it ensure that two people running the rule will end up with the same packages (in particular when dealing with transitive dependencies that don't specify the version completely)?
> 2 people running the rule

The pip-compile program is definitely vulnerable to differences in the machine that runs it (e.g. OS, Python version). In practice, we have a CI check running to ensure there's agreement between local and CI, and we work to "lock down" the development environment such that it's the same across machines.
> in particular when dealing with transitive dependencies that don't specify the version completely
I believe pip-tools looks at the existing transitively locked requirements.txt file and will use the versions specified within if they are compatible. It does not rely on deps to specify things strictly with `==`. The resolver will arrive at the same transitive set.
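That behavior can also be steered explicitly (flags from pip-tools' CLI):

```
pip-compile requirements.in                            # keep existing compatible pins
pip-compile --upgrade requirements.in                  # re-resolve everything to latest
pip-compile --upgrade-package urllib3 requirements.in  # bump a single package
```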
This issue has been automatically marked as stale because it has not had any activity for 180 days. It will be closed if no further activity occurs in 30 days. Collaborators can add an assignee to keep this open indefinitely. Thanks for your contributions to rules_python!
This issue was automatically closed because it went 30 days without a reply since it was labeled "Can Close?"
Hello, it seems that every time I have a clean build, these rules use pip to install the requirements.txt again. Because of this, it's possible for builds to get different dependency versions over time. Is there a way to check in the output of the pip import, so that I can ensure my build is completely repeatable? Ideally, this would support a model like https://github.com/johnynek/bazel-deps, in which I can run some command on my requirements.txt to generate .bzl files and other files (I'd be happy to check in the binaries or check in a virtual BUILD hierarchy that validates sha256sums of the deps). This repeatability is a key feature of bazel, and a worry for me using rules_python.
Is there a path to accomplish this? Would you accept a PR to add this?