bazelbuild / bazel

a fast, scalable, multi-language and extensible build system
https://bazel.build
Apache License 2.0

Downloader config rewrite rules should be able to use specified hashes #23502

Open gholms opened 2 weeks ago

gholms commented 2 weeks ago

Description of the feature request:

Frustrating as it may be, some upstreams don't use version-based file names for their archives. Instead they update archives in place, which changes their hashes and breaks existing builds. To cope with this, we mirror third-party dependencies using content-based paths, such as third-party/<sha256>/bazel-skylib-1.7.1.tar.gz. We deliberately avoid a simple read-through cache that fetches new artifacts from the internet on demand, because we want builds that change dependencies to break so that we can review and document those changes.
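A minimal Python sketch of the content-addressed layout described above, assuming the mirror key is the hex SHA-256 of the archive bytes and the file name is the last path segment of the original URL. The `mirror_path` helper is hypothetical and the payloads are placeholder bytes, not real archives. It shows why in-place upstream updates don't clobber the mirror: different bytes under the same URL land at different paths.

```python
import hashlib
from pathlib import PurePosixPath
from urllib.parse import urlparse


def mirror_path(url: str, data: bytes, prefix: str = "third-party") -> str:
    """Build a content-addressed path: <prefix>/<hex sha256 of data>/<file name from url>."""
    digest = hashlib.sha256(data).hexdigest()
    # Keep the upstream file name so the mirror stays human-readable.
    filename = PurePosixPath(urlparse(url).path).name
    return f"{prefix}/{digest}/{filename}"


url = "https://github.com/bazelbuild/bazel-skylib/releases/download/1.7.1/bazel-skylib-1.7.1.tar.gz"
# Two different payloads served from the same URL, as happens when an
# upstream replaces an archive in place (placeholder bytes, not real archives).
before = mirror_path(url, b"original archive bytes")
after = mirror_path(url, b"replaced archive bytes")
print(before)
print(after)
```

Because the digest is part of the path, a dependency change shows up as a new mirror entry that has to be added deliberately, rather than as a silent overwrite.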

At present, a downloader config file can rewrite URLs using pattern matching, but a rewrite rule only sees the URL, which carries just part of the information passed to module_ctx.download. In particular, a hash-based path cannot be constructed with a rewrite rule, because the original URL rarely contains the hash. We would like rewrite rules to be able to request the expected file hash (sha256, etc.) in a chosen encoding (hex, base64) from Bazel, so that such paths can be built seamlessly.
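To illustrate the request, here is a Python sketch of what a hash-aware rewrite might do. Everything in it is an assumption: the `{sha256}` placeholder is hypothetical (it is the feature being asked for, not something current downloader configs support), the rule is loosely modeled as a regex plus a replacement using `$1`-style capture groups, and `sri_to_hex`, `apply_rewrite`, and mirror.example.com are illustrative names. It normalizes an expected hash given either as hex or as a base64 SRI `integrity` value, then splices it into the mirror URL.

```python
import base64
import re
from typing import Optional


def sri_to_hex(integrity: str) -> str:
    """Convert an SRI value such as 'sha256-<base64>' into a hex SHA-256 digest."""
    algo, sep, b64 = integrity.partition("-")
    if not sep or algo != "sha256":
        raise ValueError(f"expected a sha256 SRI value, got {integrity!r}")
    return base64.b64decode(b64).hex()


def apply_rewrite(url: str, pattern: str, template: str, sha256_hex: str) -> Optional[str]:
    """Apply one hypothetical hash-aware rewrite rule; return None if it doesn't match."""
    match = re.fullmatch(pattern, url)
    if match is None:
        return None  # The rule does not apply to this URL.
    # Hypothetical placeholder: fill in the expected hash first...
    result = template.replace("{sha256}", sha256_hex)
    # ...then substitute $1, $2, ... with the regex capture groups.
    return re.sub(r"\$(\d+)", lambda g: match.group(int(g.group(1))) or "", result)


# An all-zero digest supplied in SRI (base64) form, normalized to hex.
digest = sri_to_hex("sha256-" + base64.b64encode(bytes(32)).decode())
rewritten = apply_rewrite(
    "https://github.com/bazelbuild/bazel-skylib/releases/download/1.7.1/bazel-skylib-1.7.1.tar.gz",
    r"https://github\.com/[^/]+/[^/]+/releases/download/[^/]+/(.*)",
    "https://mirror.example.com/third-party/{sha256}/$1",
    digest,
)
print(rewritten)
```

Normalizing everything to hex before substitution is just one design choice; a real implementation could instead let each rule pick the hash type and encoding, as requested above.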

Clearly, this will fail whenever the hash a rewrite rule asks for isn't the one provided by the repository rule or module. However, that is a self-inflicted problem, and the mirror owner and the author of the downloader config are responsible for resolving it themselves.

Which category does this issue belong to?

External Dependency

What underlying problem are you trying to solve with this feature?

In our current, pre-bzlmod setup, we use a dedicated macro for third-party archives that takes a SHA256 hash and the original file name and constructs the corresponding mirror URL from them. Continuing this practice with bzlmod would mean hosting a modified copy of the entire central registry, because there is no other way to do it without undoing most of the convenient dependency-resolution benefits that bzlmod is supposed to provide.

Exposing all of the information available to the download function (or at least the hashes) to rewrite rules, on the other hand, would make using the stock BCR fairly painless, because we could handle everything with rewrite rules. We would only have to update our mirror when dependencies change, and our internal registry could then be limited to the modules that aren't on the BCR.

Which operating system are you running Bazel on?

Ubuntu 20.04

What is the output of bazel info release?

release 7.3.1

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse HEAD ?

No response

Have you found anything relevant by searching the web?

No response

Any other information, logs, or outputs that you want to share?

Our Bazel Slack conversation, in case you're curious: https://bazelbuild.slack.com/archives/C014RARENH0/p1725058457719319

meteorcloudy commented 1 week ago

A PR will be very welcome for this feature!