bazelbuild / rules_python

Bazel Python Rules
https://rules-python.readthedocs.io
Apache License 2.0

Ability to lock down dependencies for reproducibility? #140

Closed dgrnbrg closed 2 years ago

dgrnbrg commented 5 years ago

Hello, it seems that every time I have a clean build, these rules use pip to install the requirements.txt again. Because of this, it's possible for builds to get different dependency versions over time. Is there a way to check in the output of the pip import, so that I can ensure my build is completely repeatable? Ideally, this would support a model like https://github.com/johnynek/bazel-deps, in which I can run some command on my requirements.txt to generate .bzl files and other files (I'd be happy to check in the binaries or check in a virtual BUILD hierarchy that validates sha256sums of the deps). This repeatability is a key feature of bazel, and a worry for me using rules_python.

Is there a path to accomplish this? Would you accept a PR to add this?

dgrnbrg commented 5 years ago

Hello, I have dug into the code more, and I see that behind the scenes, this system produces the exact @$PIP_IMPORT_NAME//:requirements.bzl that I'd like to commit to my outer repository. Currently, the only way I can find this file is by manually trawling through the bazel cache, since I'm not aware of any way to access artifacts from repository workspaces.

How should this be handled?
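
For reference, a rough sketch of the workflow in question, with a hypothetical repository name my_deps (the workspace name and exact load paths may differ between rules_python versions):

# WORKSPACE (sketch)
load("@rules_python//python:pip.bzl", "pip_import")

pip_import(
    name = "my_deps",                      # hypothetical name
    requirements = "//:requirements.txt",  # re-resolved by pip on every clean build
)

# The generated @my_deps//:requirements.bzl is the file I'd like to commit:
load("@my_deps//:requirements.bzl", "pip_install")
pip_install()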

aaliddell commented 5 years ago

Two more options I can think of:

dgrnbrg commented 5 years ago

I just tried using the resolved files, which seem like what my company is looking for. However, when I run bazel sync, bazel seems to get into a bad state where it can't resolve dependencies, and it won't build anything again until I do bazel clean --expunge. The sync seems to fail with this error:

Collecting matplotlib==3.0.2 (from -r /home/dgrnbrg/3rdparty/py_research_requirements.txt (line 8))
Could not find a version that satisfies the requirement matplotlib==3.0.2 (from -r /home/dgrnbrg/3rdparty/py_research_requirements.txt (line 8)) (from versions: 0.86, 0.86.1, 0.86.2, 0.91.0, 0.91.1, 1.0.1, 1.1.0, 1.1.1, 1.2.0, 1.2.1, 1.3.0, 1.3.1, 1.4.0, 1.4.1rc1, 1.4.1, 1.4.2, 1.4.3, 1.5.0, 1.5.1, 1.5.2, 1.5.3, 2.0.0b1, 2.0.0b2, 2.0.0b3, 2.0.0b4, 2.0.0rc1, 2.0.0rc2, 2.0.0, 2.0.1, 2.0.2, 2.1.0rc1, 2.1.0, 2.1.1, 2.1.2, 2.2.0rc1, 2.2.0, 2.2.2, 2.2.3, 2.2.4)

Of course, matplotlib version 3.0.2 exists in real life, so I'm a bit confused. I think this might be related to the fact that I need https://github.com/darrengarvey/rules_python in order to have python 3 compatibility, which is critical for our use cases.

Do you have any suggestions or ideas on how to proceed?

aaliddell commented 5 years ago

How are you running build and sync (i.e. what args and env vars for each), and what's in your .bazelrc? IIRC, matplotlib 3.x doesn't support Python 2, and it's possible that when running sync it's falling back to Python 2, which would explain why you aren't seeing any of the 3.0.x versions.

dgrnbrg commented 5 years ago

Yes, I think that's the case. I use the rules_python fork linked above, which supports overriding the Python version used for builds, so that we can have a working Python 3 environment. My .bazelrc includes:

build --python_path=/usr/bin/python3.6
test --python_path=/usr/bin/python3.6
run --python_path=/usr/bin/python3.6

build --action_env=BAZEL_PYTHON=/usr/bin/python3.6
test --action_env=BAZEL_PYTHON=/usr/bin/python3.6
run --action_env=BAZEL_PYTHON=/usr/bin/python3.6

I noticed that the highest matplotlib version listed above is the last one to support Python 2, so I think you're correct that sync is falling back to Python 2.

aaliddell commented 5 years ago

Ah, so when you're running sync the BAZEL_PYTHON env var is perhaps not set, so it is falling back to the default Python (Python 2) due to changes in that repo.
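
One way to check that (a sketch, assuming the fork reads BAZEL_PYTHON from the client environment during repository rule evaluation) is to export the variable in the shell before syncing, rather than relying on --action_env:

export BAZEL_PYTHON=/usr/bin/python3.6   # visible to repository rules during sync
bazel sync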

Eventually rules_python should have proper pip_import support for python 3, although various PRs have been proposing this for almost a year now: #158 #82.

Also, on another topic: in bazelrc, any args set for build are inherited by test and run, so you shouldn't need to have them written three times.
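
For example, the .bazelrc above reduces to:

# build flags are inherited by test and run, so each flag only needs one line
build --python_path=/usr/bin/python3.6
build --action_env=BAZEL_PYTHON=/usr/bin/python3.6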

dgrnbrg commented 5 years ago

Interesting--I'll try dropping the args set for test and run. I wasn't able to use the --python_path or --action_env settings for sync in .bazelrc, since they don't appear to be supported in that context.

This is frustrating, since the net result of all this is that I cannot lock down my pip dependencies. (Pinning versions in requirements.txt is something we've tried, but sometimes deps-of-deps bump to versions that pip considers compatible yet break our code.)

limdor commented 5 years ago

I just wanted to add a comment regarding freezing versions in requirements.txt. The issue is that you need to pin the transitive dependencies in requirements.txt as well; if you don't, it gives the false impression that they are frozen. Then, when you update a direct dependency, you also want to update the transitive dependencies: some may appear, some may disappear, and some may change version.
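
To illustrate with hypothetical pins: the first file only looks frozen, because matplotlib's transitive dependencies can still drift; the second pins the whole closure (versions here are illustrative).

# requirements.txt, direct pin only -- transitive deps can still drift:
matplotlib==3.0.2

# requirements.txt, fully frozen -- transitive closure pinned too:
matplotlib==3.0.2
cycler==0.10.0
kiwisolver==1.0.1
numpy==1.16.2
pyparsing==2.3.1
python-dateutil==2.8.0
six==1.12.0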

willstott101 commented 4 years ago

I'd like to mention pipenv and its Pipfile with Pipfile.lock, which I've been using extensively to great effect in my Python projects. This solves the problem of requirements.txt not distinguishing between transitive and top-level dependencies.

I haven't yet tried to evaluate pipenv or even Pipfile support in Bazel; however, many of the problems you're describing have good solutions with a Pipfile, which can specify the Python version, separate dependencies from dev dependencies, and has a known lock format.

Perhaps language-specific lock files aren't ideal in Bazel, and I'd understand that, but I think it's worth mentioning.
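
A minimal Pipfile sketch of what the format expresses (package names and versions are only examples):

[[source]]
name = "pypi"
url = "https://pypi.org/simple"
verify_ssl = true

[requires]
python_version = "3.6"       # interpreter version is part of the spec

[packages]
matplotlib = "==3.0.2"       # runtime dependency (example)

[dev-packages]
pytest = "*"                 # dev-only dependency (example)

Running pipenv lock then produces the fully resolved, hashed Pipfile.lock.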

Edit, found: https://github.com/bazelbuild/rules_python/issues/72 https://github.com/bazelbuild/rules_python/issues/171

groodt commented 4 years ago

Another lightweight, reliable way to "resolve" a top-level requirements.in file into a transitively closed requirements.txt file is to use pip-tools' pip-compile.
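
A sketch of that workflow (--generate-hashes is optional, but gives the sha256 pinning asked for at the top of this issue):

# requirements.in lists only the top-level deps, e.g. matplotlib==3.0.2
pip-compile --generate-hashes --output-file requirements.txt requirements.in
# requirements.txt now holds the pinned, hashed transitive closure and can be
# committed and pointed at from pip_import.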

whilp commented 4 years ago

I wrote a little glue to make a pip-compile workflow easy-ish in a bazel workspace:

https://github.com/whilp/world/blob/b4b01e019b0dc1888cc3c81e9ab5a242e9820717/requirements.in#L1 https://github.com/whilp/world/blob/b4b01e019b0dc1888cc3c81e9ab5a242e9820717/requirements.txt#L1 https://github.com/whilp/world/blob/b4b01e019b0dc1888cc3c81e9ab5a242e9820717/requirements/compile.py#L8

Here, I add deps (pinned to versions) in requirements.in, then run bazel run requirements:compile as needed to update requirements.txt.

This kinda works (at least, doesn't break) with automated bumps from renovatebot:

https://github.com/whilp/world/pull/434

(But I haven't gotten those bumps to update requirements.txt as well, as they should.)

thundergolfer commented 4 years ago

@willstott101 we originally used pipenv to produce our transitively closed and pinned requirements.txt file, but had problems and then moved to using pip-tools compile, as @groodt mentions above.

The problem with pipenv was that we wanted to be able to 're-lock' a lock-file deterministically to check platform-consistency (OSX, Linux) and validate that one of our repo users hadn't made a mistake when changing deps, but unfortunately pipenv couldn't deterministically re-lock.

These pipenv issues seem directly related to the issues we experienced:

pip-tools compile has for us proved better at consistently producing transitively-closed lock files.


@whilp thanks for posting your solution. @alexeagle at Robinhood has a small Starlark-based pip-tools compile integration which I've seen, and it's much nicer than my company's bash script. He's planning to commit it here.

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had any activity for 180 days. It will be closed if no further activity occurs in 30 days. Collaborators can add an assignee to keep this open indefinitely. Thanks for your contributions to rules_python!

leoluk commented 3 years ago

Still relevant

thundergolfer commented 3 years ago

For those interested in this issue, could you check out https://github.com/bazelbuild/rules_python/blob/ef4d735216a3782b7c33543d82b891fe3a86e3f3/python/pip_install/requirements.bzl#L6 and provide feedback on whether it adequately addresses the issue? It's not Bazel native, but it does easily produce a 'compiled', transitively closed list of dependencies with hashes of the package files.
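
For reference, a rough sketch of wiring that macro into a BUILD file (the load label comes from the linked file and the attribute names are my best reading of it, so treat both as approximate):

load("@rules_python//python/pip_install:requirements.bzl", "compile_pip_requirements")

compile_pip_requirements(
    name = "requirements",
    requirements_in = "requirements.in",    # top-level deps
    requirements_txt = "requirements.txt",  # generated, hash-pinned lock file
)

As far as I can tell this also defines a requirements.update run target to regenerate the lock file and a test that fails when the lock file is stale.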

thekyz commented 3 years ago

I'm curious about compile_pip_requirements @thundergolfer: how does it ensure that 2 people running the rule will end up with the same packages (in particular when dealing with transitive dependencies that don't specify the version completely)?

thundergolfer commented 3 years ago

2 people running the rule

The pip-compile program is definitely vulnerable to differences in the machine that runs it (e.g. OS, Python version). In practice we have a CI check to ensure there's agreement between local and CI, and we work to 'lock down' the development environment so that it's the same across machines.

in particular when dealing with transitive dependencies that don't specify the version completely

I believe pip-tools looks at the existing transitively locked requirements.txt file and reuses the versions specified within if they are compatible. It does not rely on deps specifying things strictly with ==; the resolver will still arrive at the same transitive set.
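
Concretely, pip-compile only moves pins when asked to:

pip-compile requirements.in                               # keep existing compatible pins
pip-compile --upgrade requirements.in                     # re-resolve everything to latest
pip-compile --upgrade-package matplotlib requirements.in  # bump a single package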

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had any activity for 180 days. It will be closed if no further activity occurs in 30 days. Collaborators can add an assignee to keep this open indefinitely. Thanks for your contributions to rules_python!

github-actions[bot] commented 2 years ago

This issue was automatically closed because it went 30 days without a reply since it was labeled "Can Close?"