Open aaronmondal opened 4 months ago
+1 it's really painful to have to encode new files as patches
We could add a remote_files
attribute to http_archive
that takes a dict from repo-relative paths to URLs and downloads the files from these URLs to the given path (for simplicity, assuming that the original file didn't exist). We would need an additional remote_files_integrity
dict to store the hashes.
Then registries could start populating this attribute in their source.json
files, but they would only be compatible with Bazel versions with support for the new attributes. I need to check what an old Bazel version would do with such a source.json
- hopefully it would fail rather than silently dropping the attributes. But we could always have a bazel_compatibility
requirement for such modules.
@meteorcloudy @Wyverald @SalmaSamy What do you think?
SGTM, PR welcome. And we could backport the change to 6.x and 7.x.
I am interested in working on this. I think it would be a big QoL improvement.
@fmeum if could point me to some source files or maybe a PR that did something similar in changes to sources.json to mimic I can try that.
I want to add not only is it annoying to submit patch files but the BUILD files aren't searchable on GitHub search using the starlark language filter. That would help expose more samples.
Okay here is my research:
Looks like (1) can be starlark only. (2) and (3) need to be changed for bzlmod but I'm not 100% certain of their innerworkings.
@fzakaria That's highly appreciated! Your research looks solid, most of this can indeed be done in Starlark and I can also help with the Java parts.
For starters, I would suggest to focus on the Starlark part and augment http_archive
with two new attributes:
remote_file_urls
of type string_list_dict
, mapping a relative path of a file to its source URLs, thus supporting mirrors. Remote patches currently don't support mirrors. @meteorcloudy Was that intentional?remote_file_integrity
of type string_dict
, mapping the same relative path to the integrity value.I think that attribute structure is going to be simpler than a full-blown overlay JSON, with a very similar experience: In the BCR, we could just have a files
sibling directory of patches
with files neatly laid out according to their relative path. The usual tooling (//tools:update_integrity
in the BCR) could then add the integrity values to source.json
in whatever format we want.
When implementing the download logic for the remote files, I would suggest to keep the following in mind:
block
parameter of download. This can't be cherry-picked into Bazel 6 though...
segments and also verify that the absolute path obtained by resolving the relative one against the repository directory still lies under the repository directory. As far as I know download_and_extract
can't create symlinks to absolute paths that are directories, but we should also doublecheck that this doesn't allow malicious archives to place files anywhere on the system.@fmeum thank you for the advice -- I was able to start https://github.com/bazelbuild/bazel/pull/22155 It is working. I finished 90% -- I can add remaining tests and finish it but I wanted some early feedback.
@fmeum sent me next steps I am capturing here:
You will have to extend this file: https://cs.opensource.google/bazel/bazel/+/master:src/main/java/com/google/devtools/build/lib/bazel/bzlmod/IndexRegistry.java;l=131;drc=a87a8db23885535232ace3f64ac45e595c95d2f5
In particular the getRepoSpec method. You can mostly mimic the logic for patches, I think.
Once that lands, we can update the tooling in the bazel-central-registry repo.
While @fzakaria is close to getting overlay support into Bazel, I noticed something that could make our lives easier today: It's possible to reuse patches from other module versions, which could be used to simplify reviews of C++ modules. See https://github.com/bazelbuild/bazel-central-registry/pull/2024/files#diff-90fa618704d4146385f96570f2033f70b09f59ae5c630825ea0dde357520ebd5R5 for an example.
Final piece is up for PR #2046 2046
At the moment all modules are overlaid with patches and those patches are hashed and tracked in
source.json
.Many modules probably just need a
MODULE.bazel
file and some buildfiles. I think it would be nice UX improvement if those files could be put into the BCR directly and have some mechanism to "automatically" overlay these Bazel-specific files.Since we already have
MODULE.bazel
in the BCR without patches, it might be viable to have some sort of overlay mechanism, maybe even anoverlays
directory that adds Starlark files directly on top of the external repository.Such an approach would be somewhat similar to how it feels like to work in
nixpkgs
: Have the build overlays as regular source code and only use patches for things that are "true" functionality-changing patches.I believe this could make it nicer to browse the BCR and get inspiration. It would also make it easier to spot functional changes and would potentially reduce the friction of having to generate patch files all the time.