Avoid merge conflicts in MODULE.bazel.lock

psalaberria002 commented 4 days ago

The digest of the requirements.txt file seems to be added to the lock file, which triggers merge conflicts.

    "@@rules_python+//python/extensions:pip.bzl%pip": {
      "general": {
        "bzlTransitiveDigest": "Iwo5aX1NCLf2xFr7Cq4NNxpYMRA7qIh3/uaa/Kk4myg=",
        "usagesDigest": "A5I0AW9kWKM1sMWDOt7FRtMoXFL0LONjpXqb43RHBrw=",
        "recordedFileInputs": {
          "@@grpc+//requirements.bazel.txt": "95a27c3f9a46b8114d464c70ba93cda18cfe8c02004db81028f9306b2691701e",
          "@@//requirements.txt": "29c6501451ce48549f1dac1b73e0cd42ee924a3b7649ed0080254de4bbcebb7f"
        },

Is this necessary?
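For anyone who wants to check which files an extension has recorded, here is a minimal Python sketch that lists the `recordedFileInputs` entries in a lock file. The top-level `moduleExtensions` key is an assumption about the lock file layout (it is not shown in the snippet above), and `recorded_file_inputs` is a hypothetical helper, not part of Bazel or rules_python. The inline JSON is a trimmed copy of the snippet above so that the example runs without a real workspace.

```python
import json

# Trimmed-down lock file content. The "moduleExtensions" wrapper is an
# assumption about the file layout; the nested keys and values are copied
# from the snippet quoted in this issue.
LOCK_TEXT = """
{
  "moduleExtensions": {
    "@@rules_python+//python/extensions:pip.bzl%pip": {
      "general": {
        "bzlTransitiveDigest": "Iwo5aX1NCLf2xFr7Cq4NNxpYMRA7qIh3/uaa/Kk4myg=",
        "usagesDigest": "A5I0AW9kWKM1sMWDOt7FRtMoXFL0LONjpXqb43RHBrw=",
        "recordedFileInputs": {
          "@@grpc+//requirements.bazel.txt": "95a27c3f9a46b8114d464c70ba93cda18cfe8c02004db81028f9306b2691701e",
          "@@//requirements.txt": "29c6501451ce48549f1dac1b73e0cd42ee924a3b7649ed0080254de4bbcebb7f"
        }
      }
    }
  }
}
"""


def recorded_file_inputs(lock):
    """Map each extension id to the file digests recorded for it."""
    found = {}
    for ext_id, variants in lock.get("moduleExtensions", {}).items():
        merged = {}
        # Each extension may have several variants (here only "general").
        for variant in variants.values():
            merged.update(variant.get("recordedFileInputs", {}))
        if merged:
            found[ext_id] = merged
    return found


inputs = recorded_file_inputs(json.loads(LOCK_TEXT))
for ext_id, files in inputs.items():
    for label, digest in files.items():
        print(ext_id, label, digest[:12])
```

To inspect a real workspace, replace `LOCK_TEXT` with the contents of `MODULE.bazel.lock`. Every label this prints is a line in the lock file that changes whenever the corresponding file changes. That is why two branches that both touch requirements.txt conflict on the same line.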

aignas commented 4 days ago

Below is my previous analysis of why the extension has to stay non-reproducible and why all of its contents end up in the main lock file. For the narrower question of the requirements.txt hash in the lock file, please see my next comment.

If at least one module that the root module depends on (directly or transitively) uses pip.parse with the experimental_index_url flag, then we cannot mark the extension as reproducible, because we call out to the internet to get the URLs of the packages to download with the bazel-downloader. This is a limitation of bzlmod that I am not sure we can get around.
@fmeum's suggestion about information in the lock files that already exists elsewhere on the file system is, I think, a good one, but in the rules_python case the information in the lock file is incomplete if we want to use the bazel-downloader. We could call the PyPI index from each whl_library repository rule to avoid doing that in the module extension context, but from my understanding bzlmod extensions were actually designed for this use case: look up the artifacts in some registries and construct the spoke repos.

In terms of what rules_python can do, we have the following options:

Create a separate extension that always writes things into the lock file - then users who don't depend on experimental_index_url can avoid any Python dependencies in the lock files. That does not really solve the problem, though; it only limits the space where the problem manifests itself. See #2278 for reasons for not going with that solution.
Move the fetching of the PyPI index data elsewhere and pass the results around as labels. This is only doable from bazel 7.4 onwards, because only then can we construct valid labels of the spoke repos within the extension evaluation scope. However, this somewhat complicates the design of the PyPI extension: instead of having a simple hub-spoke pattern, we need to fetch the PyPI metadata lazily, and we cannot write that into the MODULE.bazel.lock file, meaning that each user (not only the one who updates the requirements files) needs to fetch the metadata from PyPI. I tried this design a long time ago and it is not a great solution to the problem.
Create a rules_python lock file. This means that users would need to be able to generate a requirements.txt file on demand from the rules_python lock file format or have 2 lock files in the version control. Having 2 lock file formats is what we have today. Having a rules_python specific lock file format might be a good idea, but then rules_uv usage could become more complicated because users would need to somehow convert the generated requirements.txt file into a rules_python lockfile format. So in the end we would end up with 2 lock file formats in the version control system. Again, not a great solution.
Do not support requirements.txt - only support uv.lock, pdm.lock and poetry.lock files for cross platform builds instead of going through the problem of calling PyPI and getting the metadata ourselves. We could have an attribute for each format in the pip extension.
Ask bazel to support a way to mark specific repos created via extensions as reproducible, so that they do not need to have their parameters recorded in the lock file, but they and their dependencies are still tracked in the lock file. Right now the bazel-skylib helper provides an all-or-nothing solution and that makes it problematic in our use case.

In general, I don't see good alternatives here except for changes in bazel or dropping requirements.txt as the lock file format for which we support cross-platform builds.

@fmeum are there any plans to let bzlmod extensions control what goes into the lock file?

EDIT: Given that this proposal has already been implemented, it does not seem that there is any other option. ~than to split the extension into 2~

EDIT2: updated based on Richard's comment.

aignas commented 4 days ago

Just looking at the requirements.txt lock file hash in the MODULE.bazel.lock file - right now it is recorded because we read the requirements.txt lock file with mctx.path/read. If the requirements.txt file changes, then bzlmod knows that it needs to re-evaluate the extension.
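The re-evaluation check described above can be pictured with a small Python sketch. It assumes the recorded value is the hex SHA-256 digest of the file contents; the 64-character hex strings in the lock file suggest this, but nothing in this thread confirms it. `sha256_hex` and `stale_inputs` are hypothetical helpers, and the requirement pins are made-up sample data. This illustrates the idea only and is not how bazel implements it.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of raw file contents."""
    return hashlib.sha256(data).hexdigest()


def stale_inputs(recorded, current_contents):
    """Return labels whose current digest differs from the recorded one.

    recorded:          label -> digest stored in the lock file
    current_contents:  label -> current bytes of that file
    """
    stale = []
    for label, digest in recorded.items():
        content = current_contents.get(label)
        # A missing file or a changed digest both count as stale.
        if content is None or sha256_hex(content) != digest:
            stale.append(label)
    return sorted(stale)


# Made-up file contents, for illustration only.
old = b"requests==2.31.0\n"
new = b"requests==2.32.0\n"
recorded = {"@@//requirements.txt": sha256_hex(old)}

print(stale_inputs(recorded, {"@@//requirements.txt": old}))  # unchanged file
print(stale_inputs(recorded, {"@@//requirements.txt": new}))  # edited file
```

An empty result means the extension result in the lock file can be reused. Any stale label means the extension has to be re-evaluated and the recorded digest rewritten.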

I am not sure if there is anything better we can do - @fmeum, what would be the proposed behaviour here? If we pass things as labels in the rule, are we OK to pass watch = False when reading the files?

rickeylev commented 4 days ago

EDIT: Looking that this proposal has been already implemented, it does not seem that there is any other option than to split the extension into 2.

but earlier in your post you say that having a second extension just reduces the problem, not avoids it?

aignas commented 4 days ago

FYI, I just did a quick experiment to see whether mctx.read(watch = 'no') would remove the requirements file hashes, and it seems that it does not:

$ diff MODULE.bazel.lock{before,}
162c162
<         "bzlTransitiveDigest": "vHJJUty2FcJdIyZ/BT+BKmemuqJ4FOvi9k8DzEIbpQU=",
---
>         "bzlTransitiveDigest": "45+MBqepr0RThPGA9Ls9A3aC2xq8tQMQYvXNS4+yn7g=",
aignas@panda ~/src/github/aignas/rules_python exp/lock
$ gd
diff --git a/python/private/pypi/parse_requirements.bzl b/python/private/pypi/parse_requirements.bzl
index 133ed18d..bb4fe659 100644
--- a/python/private/pypi/parse_requirements.bzl
+++ b/python/private/pypi/parse_requirements.bzl
@@ -93,7 +93,7 @@ def parse_requirements(
     for file, plats in requirements_by_platform.items():
         if logger:
             logger.debug(lambda: "Using {} for {}".format(file, plats))
-        contents = ctx.read(file)
+        contents = ctx.read(file, watch = 'no')

         # Parse the requirements file directly in starlark to get the information
         # needed for the whl_library declarations later.

This suggests to me that bazel itself is adding the hashes based on the inputs to the tag classes.
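To repeat this kind of experiment without reading raw diffs, here is a rough Python sketch that reports which fields of an extension entry differ between two lock file snapshots. `changed_fields` is a hypothetical helper. The two `bzlTransitiveDigest` values come from the diff above, while the `recordedFileInputs` value is a placeholder, since the diff does not show it.

```python
def changed_fields(before, after, prefix=""):
    """Return the paths of leaf values that differ between two dicts."""
    changed = []
    for key in sorted(set(before) | set(after)):
        path = prefix + key
        old, new = before.get(key), after.get(key)
        if isinstance(old, dict) and isinstance(new, dict):
            # Recurse into nested objects such as recordedFileInputs.
            changed.extend(changed_fields(old, new, path + " > "))
        elif old != new:
            changed.append(path)
    return changed


# Extension entry before and after switching to watch = 'no'.
before = {
    "bzlTransitiveDigest": "vHJJUty2FcJdIyZ/BT+BKmemuqJ4FOvi9k8DzEIbpQU=",
    # Placeholder digest: the real value is not shown in the diff above.
    "recordedFileInputs": {"@@//requirements.txt": "placeholder-digest"},
}
after = {
    "bzlTransitiveDigest": "45+MBqepr0RThPGA9Ls9A3aC2xq8tQMQYvXNS4+yn7g=",
    "recordedFileInputs": {"@@//requirements.txt": "placeholder-digest"},
}

print(changed_fields(before, after))
```

With these inputs only `bzlTransitiveDigest` is reported, matching the diff: the .bzl source changed, but the recorded file hashes stayed in place.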

bazelbuild / rules_python

Avoid merge conflicts in MODULE.bazel.lock #2434