bazel-contrib / SIG-rules-authors

Governance and admin for the rules authors Special Interest Group
https://bazel-contrib.github.io/SIG-rules-authors/
Apache License 2.0

Should rulesets distribute a pre-built artifact rather than rely on GitHub source/release archive #11

Closed: alexeagle closed this issue 2 years ago

alexeagle commented 3 years ago

Rules ought to distribute an artifact that doesn't contain references to development-time dependencies, and omits testing code and examples.

This also means the distribution artifact itself can be broken if files are accidentally left out of it.

In addition, rules ought to integration-test against all supported bazel versions. So there should be some bazel-in-bazel test that consumes the HEAD distribution artifact and tests that the examples work.
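
As a rough sketch of what such a bazel-in-bazel test could look like (every target, path, and repository name here is hypothetical, not taken from any existing ruleset):

$ # Build the distribution artifact and unpack it into a scratch directory.
$ bazel build //distro:release_tarball
$ mkdir -p /tmp/ruleset_dist
$ tar -xzf bazel-bin/distro/release_tarball.tar.gz -C /tmp/ruleset_dist
$ # Run an example workspace against the unpacked artifact instead of the source repo.
$ cd examples/basic
$ bazel test //... --override_repository=my_ruleset=/tmp/ruleset_dist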

Right now there are a few existing approaches: rules_nodejs and rules_python have a built-in integration test runner, and rules_go has a special go_bazel_test rule.

alexeagle commented 3 years ago

See https://docs.google.com/document/d/1s_8AihGXbYujNWU_VjKNYKQb8_QGGGq7iKwAVQgjbn0/edit?usp=sharing for discussion around the requirements for testing against multiple Bazel versions.

aherrmann commented 3 years ago

Rules ought to distribute an artifact that doesn't contain references to development-time dependencies, and omits testing code and examples.

Could you motivate this? It is not clear to me why this should be mandated.

If the motivation is that users of a rule set should not depend on dev-dependencies of that rule set, then this can be achieved without a dedicated distribution artifact. E.g. in rules_haskell, dev-dependencies are pulled in only in rules_haskell's own WORKSPACE file, while regular dependencies are pulled in by the rules_haskell_dependencies macro that users are meant to call as well. Also, the upcoming Bazel modules mechanism has a notion of dev-dependencies IIRC.
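
For illustration, a minimal sketch of that pattern; the macro and repository names are invented, not rules_haskell's actual ones:

# deps.bzl -- regular dependencies; users call this macro from their own WORKSPACE.
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

def my_ruleset_dependencies():
    # Runtime dependency that every consumer of the ruleset needs.
    http_archive(
        name = "some_runtime_dep",
        sha256 = "0000000000000000000000000000000000000000000000000000000000000000",
        urls = ["https://example.com/some_runtime_dep.tar.gz"],
    )

def my_ruleset_dev_dependencies():
    # Test-only dependency, referenced from the ruleset's own WORKSPACE but
    # never from my_ruleset_dependencies(), so consumers don't pull it in.
    http_archive(
        name = "some_test_framework",
        sha256 = "1111111111111111111111111111111111111111111111111111111111111111",
        urls = ["https://example.com/some_test_framework.tar.gz"],
    )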

I think it is a plus that Bazel rule sets can be imported directly from their source repository at any commit without needing to generate a distribution artifact first. This makes it very easy to pull in a non-release commit of a rule set that contains a needed fix. If rule sets are only intended to be used from distribution artifacts, then this use-case is no longer necessarily supported, as a rule set may depend on generated files that are only included in the distribution artifact.

Either way, I don't think this should be mandated without the required tooling being available. See below.


Regarding bazel-in-bazel tests: I agree that this would be useful to have. We have looked into this for rules_haskell and, in this context, explored a Gazelle extension to generate filegroup targets capturing all files required to run the rule set. (The same would be useful for generating distribution artifacts.)

We based our efforts on Gazelle's test_filegroup. However, we found it to be lacking for our use-case. Issues that come to mind are that it does not respect .gitignore or .bazelignore files, leading to invalid file inclusions of e.g. embedded workspaces for integration testing or user local configuration files like .bazelrc.local. Or that it assumes that every directory is a Bazel package, which is not a valid assumption and breaks labels like //my/pkg:src/some/source/file.

It would be great to have general purpose versions of test_filegroup and go_bazel_test available for any rule set to use. I'd view this as a prerequisite for this recommendation.

alexeagle commented 3 years ago

Mostly the pre-built distribution artifact is required to get a stable checksum. If you rely on GitHub's generated .tgz source archives, you get a breakage when GitHub makes OS updates on their servers that create those archives. It's also handy to avoid someone building @some_ruleset//... and breaking because there's a load statement there from a dev dependency. I agree that it's desirable that the distro artifact is same-shaped as the source repo (generally just a subset of files) so that it's easy to opt-in to a HEAD dependency. We made that mistake with rules_nodejs and are working to undo that.
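
To make that failure mode concrete, a hypothetical example (all names invented):

# tests/BUILD.bazel inside @some_ruleset -- fine when building inside the
# ruleset's own workspace, where @dev_only_framework is defined:
load("@dev_only_framework//:defs.bzl", "framework_test")

# A consumer running `bazel build @some_ruleset//...` evaluates this load
# statement and fails, because their workspace never fetched @dev_only_framework.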

ittaiz commented 3 years ago

Hi πŸ‘‹πŸ½, Few more thoughts on my end:

  1. Having bazel-in-bazel tests is super valuable for internal rules authors as well. I've had this need a few times.
  2. Just in case someone doesn't know, there's the bazel integration testing repo. I've failed to keep it alive, but I think a lot of the concepts there are valuable.
  3. +1 for being able to use commits from GitHub. In practice we had only one issue with the checksums in 5-6 years, across more builds than I can count.

alexeagle commented 3 years ago

@ittaiz what do you think about the SIG contributing or owning the current integration test repo in bazelbuild org?

ittaiz commented 3 years ago

I'd be happy to add contributors and even hand over ownership if you feel that's important.

aherrmann commented 2 years ago

Mostly the pre-built distribution artifact is required to get a stable checksum. If you rely on GitHub's generated .tgz source archives, you get a breakage when GitHub makes OS updates on their servers that create those archives.

Is this still true? I haven't found official GitHub documentation stating that the archives are reproducible, but I have found this reproducible-builds thread pointing out that GitHub uses git archive and that git archive is designed to be reproducible.

Just as a quick test, I compared the GH archive to a git archive created locally for rules_haskell.

$ curl -L https://github.com/tweag/rules_haskell/archive/455d9e6e8212f0bb73cd6e5437b0f5ce093e44be.tar.gz|sha256sum -
6841e554566d0c326beac84442dd776c49fac7d6059fef4728e75ae37c8e92cc  -
$ git clone https://github.com/tweag/rules_haskell; cd rules_haskell; git archive --format=tar --prefix=rules_haskell-455d9e6e8212f0bb73cd6e5437b0f5ce093e44be/ 455d9e6e8212f0bb73cd6e5437b0f5ce093e44be | gzip > tarball.tgz; sha256sum tarball.tgz
6841e554566d0c326beac84442dd776c49fac7d6059fef4728e75ae37c8e92cc  tarball.tgz

As you can see, the SHA256 is identical. This suggests that the archive is generated reproducibly.

Anecdotally, the only instance where I encountered issues with a changing commit hash in the last couple years was https://github.com/kubernetes/kubernetes/issues/99376. In this case the change was due to a problematic .gitattributes configuration.

alexeagle commented 2 years ago

@aherrmann I've followed this guidance ever since Jay Conrod made a big deal out of it in rules_go and bazel_gazelle. https://github.com/bazelbuild/rules_go/issues/2340 suggests maybe some GitHub URLs are reliable and some are not?

There is yet another reason I think rules should build their own distribution archive, which is that you can calculate your own checksum to produce the WORKSPACE snippet in the release process before shipping the commits to GitHub.
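
A sketch of what that release step might look like; the ruleset name, version, and URL are all hypothetical:

$ git archive --format=tar.gz --prefix=my_ruleset-1.2.3/ v1.2.3 > my_ruleset-1.2.3.tar.gz
$ SHA=$(sha256sum my_ruleset-1.2.3.tar.gz | awk '{print $1}')
$ # Emit the WORKSPACE snippet for the release notes:
$ cat <<EOF
http_archive(
    name = "my_ruleset",
    sha256 = "$SHA",
    strip_prefix = "my_ruleset-1.2.3",
    urls = ["https://github.com/example/my_ruleset/releases/download/v1.2.3/my_ruleset-1.2.3.tar.gz"],
)
EOF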

aherrmann commented 2 years ago

@aherrmann I've followed this guidance ever since Jay Conrod made a big deal out of it in rules_go and bazel_gazelle. bazelbuild/rules_go#2340 suggests maybe some GitHub URLs are reliable and some are not?

Thanks for the pointer, I dug into this a little. I've attached the details at the end; in short: I don't think this was a case of the GitHub-generated source archive changing. Instead, it looks to me as though this was a mixup between the SHA of the GitHub-generated source archive and that of the release artifact. So, I don't think this is evidence to support the claim that GitHub source archives are non-reproducible.

There is yet another reason I think rules should build their own distribution archive, which is that you can calculate your own checksum to produce the WORKSPACE snippet in the release process before shipping the commits to GitHub.

The same can be achieved using git archive --format=tar.gz --prefix=$NAME-$TAG/ $TAG | sha256sum when using source archives.

To be clear, I'm not saying one should not use release artifacts. But, I am saying that I don't see why it should be mandated that everyone use them without a good technical reason to motivate that mandate. I haven't seen such a reason, yet. As mentioned above, there are upsides to the source archive approach and costs to the release artifact approach.


Details:

If we take a look at the changes in the PR we see

--- a/multirun/deps.bzl
+++ b/multirun/deps.bzl
@@ -4,7 +4,7 @@ def multirun_dependencies():
     _maybe(
         http_archive,
         name = "bazel_skylib",
-        sha256 = "2ef429f5d7ce7111263289644d233707dba35e39696377ebab8b0bc701f7818e",
+        sha256 = "2ea8a5ed2b448baf4a6855d3ce049c4c452a6470b1efd1504fdb7c1c134d220a",
         strip_prefix = "bazel-skylib-0.8.0",
         urls = ["https://github.com/bazelbuild/bazel-skylib/archive/0.8.0.tar.gz"],
     )

The 0.8.0 release has a release artifact and of course the generated source archive. If we look at the SHAs of each of these we find

$ curl -L https://github.com/bazelbuild/bazel-skylib/releases/download/0.8.0/bazel-skylib.0.8.0.tar.gz|sha256sum -
2ef429f5d7ce7111263289644d233707dba35e39696377ebab8b0bc701f7818e  -
$ curl -L https://github.com/bazelbuild/bazel-skylib/archive/refs/tags/0.8.0.tar.gz|sha256sum -
2ea8a5ed2b448baf4a6855d3ce049c4c452a6470b1efd1504fdb7c1c134d220a  -

I.e. the old hash was the hash of the release artifact and the new hash is the hash of the generated source archive.

If we compare the contents of these two archives we find

$ curl -L https://github.com/bazelbuild/bazel-skylib/releases/download/0.8.0/bazel-skylib.0.8.0.tar.gz|tar ztv|head -n 5
...
drwxrwxr-x root/root         0 2019-03-20 18:13 .bazelci/
-rw-rw-r-- root/root      2348 2019-03-20 18:13 .bazelci/presubmit.yml
-rw-rw-r-- root/root         9 2019-03-20 18:13 .gitignore
-rw-rw-r-- root/root       308 2019-03-20 18:13 AUTHORS
-rw-rw-r-- root/root      1002 2019-03-20 18:13 BUILD

$ curl -L https://github.com/bazelbuild/bazel-skylib/archive/refs/tags/0.8.0.tar.gz|tar ztv|head -n 5
...
drwxrwxr-x root/root         0 2019-03-20 18:13 bazel-skylib-0.8.0/
drwxrwxr-x root/root         0 2019-03-20 18:13 bazel-skylib-0.8.0/.bazelci/
-rw-rw-r-- root/root      2348 2019-03-20 18:13 bazel-skylib-0.8.0/.bazelci/presubmit.yml
-rw-rw-r-- root/root         9 2019-03-20 18:13 bazel-skylib-0.8.0/.gitignore
-rw-rw-r-- root/root       308 2019-03-20 18:13 bazel-skylib-0.8.0/AUTHORS

I.e. the release artifact has no prefix, while the generated source archive does have the standard <repo>-<rev> prefix.

The change is from Jan 2020; I'm pretty sure GitHub-generated source archives had the <repo>-<rev> prefixes at that time as well. So, it looks like the old hash was never that of a GitHub-generated source archive, but that of the release artifact. It seems the issue here was most likely not that the generated source archive changed, but that the wrong hash was written into multirun/deps.bzl before.

For reference, I can produce an equivalent to the Github generated source archive with the same hash on my machine today:

$ git archive --format=tar.gz --prefix=bazel-skylib-0.8.0/ 0.8.0 | sha256sum
2ea8a5ed2b448baf4a6855d3ce049c4c452a6470b1efd1504fdb7c1c134d220a  -

If I try to reproduce the release artifact, I get a different hash than that of the release artifact uploaded on GitHub:

$ git archive --format=tar.gz 0.8.0 | sha256sum
a04a79bca280f759ec2339c035e19d1f249616c38a352f9fdb8837a7c0ea2f7c  -

But, comparing this generated prefix-less tarball to the release tarball I find

$ curl -L https://github.com/bazelbuild/bazel-skylib/releases/download/0.8.0/bazel-skylib.0.8.0.tar.gz > released.tar.gz
$ git archive --format=tar.gz 0.8.0 > generated.tar.gz
$ diffoscope released.tar.gz generated.tar.gz
--- released.tar.gz
+++ generated.tar.gz
β”œβ”€β”€ filetype from file(1)
β”‚ @@ -1 +1 @@
β”‚ -gzip compressed data, last modified: Wed Mar 20 18:02:49 2019, max compression
β”‚ +gzip compressed data, from Unix
β”‚   --- released.tar
β”œβ”€β”€ +++ generated.tar
β”‚ β”œβ”€β”€ filetype from file(1)
β”‚ β”‚ @@ -1 +1 @@
β”‚ β”‚ -POSIX tar archive (GNU)
β”‚ β”‚ +POSIX tar archive

So, the difference comes down to the release artifact containing slightly different headers including a timestamp.

alexeagle commented 2 years ago

Great discussion. I think this issue ended up conflating two things. We agree that we need bazel-in-bazel integration testing of rules, let's move that to a new issue since the bulk of discussion here was about the release archive and that's just one motivation for bazel-in-bazel testing.

alexeagle commented 2 years ago

I've updated all my repos, as well as the rules-template, to reflect that GitHub produces a stable SHA for the artifacts it serves.

fmeum commented 2 years ago

Sorry to revive this closed issue, but I just encountered a situation in which the SHA of a GitHub-provided archive changed over time and thus ended up breaking the build.

Over at https://github.com/CodeIntelligenceTesting/jazzer, we use the following dependency on abseil-cpp:

    maybe(
        http_archive,
        name = "com_google_absl",
        sha256 = "5e1cbf25bf501f8e37866000a6052d02dbdd7b19a5b592251c59a4c9aa5c71ae",
        strip_prefix = "abseil-cpp-f2dbd918d8d08529800eb72f23bd2829f92104a4",
        url = "https://github.com/abseil/abseil-cpp/archive/f2dbd918d8d08529800eb72f23bd2829f92104a4.zip",
    )

An hour ago, CI runs started to fail with this error:

ERROR: /home/runner/work/jazzer/jazzer/driver/BUILD.bazel:21:11: //driver:fuzzed_data_provider depends on @com_google_absl//absl/strings:str_format in repository @com_google_absl which failed to fetch. no such package '@com_google_absl//absl/strings': java.io.IOException: Error downloading [https://github.com/abseil/abseil-cpp/archive/f2dbd918d8d08529800eb72f23bd2829f92104a4.zip] to /home/runner/.cache/bazel/_bazel_runner/6bc610921f14939de4c55cf170d55a62/external/com_google_absl/temp17765729958342005876/f2dbd918d8d08529800eb72f23bd2829f92104a4.zip: Checksum was 70203fec1c4823d4fe689f1c413bc7a0e6b4556dbd55b5ac40fc8862bacc0dcb but wanted 5e1cbf25bf501f8e37866000a6052d02dbdd7b19a5b592251c59a4c9aa5c71ae

I attached both the ZIP file that can currently be obtained from https://github.com/abseil/abseil-cpp/archive/f2dbd918d8d08529800eb72f23bd2829f92104a4.zip (abseil-cpp-f2dbd918d8d08529800eb72f23bd2829f92104a4.github-new.zip) and the ZIP file that was previously generated by GitHub and that I obtained from my local repository cache (abseil-cpp-f2dbd918d8d08529800eb72f23bd2829f92104a4.github-old.zip).

Running diffoscope on these files shows that the hour of the mtimes changed:

...
β”‚β”„ Archive contents identical but files differ, possibly due to different compression levels. Falling back to binary comparison.
β”œβ”€β”€ zipinfo -v {}
β”‚ @@ -28,15 +28,15 @@
β”‚    file system or operating system of origin:      MS-DOS, OS/2 or NT FAT
β”‚    version of encoding software:                   0.0
β”‚    minimum file system compatibility required:     MS-DOS, OS/2 or NT FAT
β”‚    minimum software version required to extract:   1.0
β”‚    compression method:                             none (stored)
β”‚    file security status:                           not encrypted
β”‚    extended local header:                          no
β”‚ -  file last modified on (DOS date/time):          2021 Nov 11 00:09:50
β”‚ +  file last modified on (DOS date/time):          2021 Nov 11 08:09:50
β”‚    file last modified on (UT extra field modtime): 2021 Nov 11 08:09:50 local
β”‚    file last modified on (UT extra field modtime): 2021 Nov 11 08:09:50 UTC
β”‚    32-bit CRC value (hex):                         00000000
β”‚    compressed size:                                0 bytes
β”‚    uncompressed size:                              0 bytes
β”‚    length of filename:                             52 characters
β”‚    length of extra field:                          9 bytes
...

@aherrmann Do you have an idea how this could happen and whether tar.gz would not have been prone to this?

fmeum commented 2 years ago

Looks like the change has been rolled back, so this might have been an honest bug.

brentleyjones commented 2 years ago

And they said that they would ensure the checksum doesn't change in the future. So I think this might even strengthen the case that we can rely on the checksum.

fmeum commented 2 years ago

@brentleyjones That's great to know. Could you point me to the place where they confirmed that?

brentleyjones commented 2 years ago

So not as strong a guarantee as I originally read it to be, but it seems the rollback was related to the checksum change: https://twitter.com/tgummerer/status/1488493440103030787

fmeum commented 2 years ago

There is https://twitter.com/tgummerer/status/1488493481874055173 though, so depending on archives for individual commits is unsafe.

brentleyjones commented 2 years ago

Yikes πŸ˜•

alexeagle commented 2 years ago

I think we have to push hard and escalate (like Ulf did) to point out that GH is running a package repo and the world relies on it for supply-chain safety...

alexeagle commented 2 years ago

/cc @tgummerer

aherrmann commented 2 years ago

@aherrmann Do you have an idea how this could happen and whether tar.gz would not have been prone to this?

We've seen the same issue on some zip dependencies but not on tar.gz dependencies. It would be good to get clarification on this from GitHub as was requested here.

fmeum commented 2 years ago

After a fruitful exchange with GitHub support staff, I was able to confirm the following (quoting with their permission):

I checked with our team and they confirmed that we can expect the checksums for repository release archives, found at /archive/refs/tags/$tag, to be stable going forward. That cannot be said, however, for repository code download archives found at archive/v6.0.4.

It's totally understandable that users have come to expect a stable and consistent checksum value for these archives, which would be the case most of the time. However, it is not meant to be reliable or a way to distribute software releases, and nothing in the software stack is made to try to produce consistent archives. This is no different from creating a tarball locally and trying to verify it with the hash of the tarball someone created on their own machine.

If you had only a tag with no associated release, you should still expect to have a consistent checksum for the archives at /archive/refs/tags/$tag.

In summary: It is safe to reference archives of any kind via the /refs/tags endpoint, everything else enjoys no guarantees.

aherrmann commented 2 years ago

@fmeum Thank you for getting in touch with GitHub support and sharing the outcome here.

In summary: It is safe to reference archives of any kind via the /refs/tags endpoint, everything else enjoys no guarantees.

That's great to hear! I think in terms of this issue this means that it can be closed again. The artifacts under /archive/refs/tags/$tag are generated by GitHub and don't have to be pre-built.

That cannot be said, however, for repository code download archives found at archive/v6.0.4.

It's good to know that this distinction exists, I assume the same holds for /archive/$commit.

alexeagle commented 2 years ago

@aherrmann @fmeum we have a new problem with GitHub release archives - they don't give any metrics.

https://github.com/bazelbuild/bazel_metrics was just posted, but e.g. when rules_python changed to follow this guidance in 0.6, the usage numbers went to zero, as you can see from this third-party analyzer: https://hanadigital.github.io/grev/?user=bazelbuild&repo=rules_python

Any ideas?

fmeum commented 2 years ago

I didn't know about this feature, but I can understand why auto-generated release archives (which are naturally source archives) are exempt from this.

It seems that rulesets will have to choose between a very simple release setup and one with statistics. Maybe the recommendation could be to upload the auto-generated archive as a release artifact? That way, there would only ever be one hash regardless of how users choose to reference the artifact.
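
For instance, a release job could do something like the following (repository and tag hypothetical), so that the uploaded asset is byte-for-byte identical to the auto-generated archive:

$ curl -L https://github.com/example/my_ruleset/archive/refs/tags/1.2.3.tar.gz > my_ruleset-1.2.3.tar.gz
$ sha256sum my_ruleset-1.2.3.tar.gz
$ gh release create 1.2.3 my_ruleset-1.2.3.tar.gz --title "1.2.3"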

aherrmann commented 2 years ago

It seems that rulesets will have to choose between a very simple release setup and one with statistics.

Yes, that seems to be the choice right now.

I don't know what GitHub's reasons or constraints are around this feature. But it would certainly be very useful to rule authors to provide metrics for the auto-generated archive, at least for tagged releases. Arguably, the lack thereof, and the fact that auto-generated release archives exist automatically for every release tag, make metrics obtained from equivalent dedicated release artifacts generally unreliable: how could we as rule authors know whether the majority of our users choose the auto-generated archive or the dedicated one?

alexeagle commented 2 years ago

GitHub does have traffic numbers, e.g. for rules_python: [screenshot of the rules_python traffic graph, 2022-06-15]. We should figure out whether downloading the source archive counts as a "clone" (I'd hope so).

I think those would be just as useful as artifact downloads. Either way the absolute numbers can't be trusted, but they're comparable across rulesets.

brentleyjones commented 1 year ago

Guess GitHub's guarantee didn't mean much? https://twitter.com/thesayynn/status/1620129657977987073

I think the only safe way from here on out is to attach your own release archives to a release.

bk2204 commented 1 year ago

Hey,

I'm one of the engineers in the Git Systems org at GitHub. I think there's been a misinterpretation of what we guarantee as far as stability.

If you generate a release for a particular tag, and you upload your own assets, such as a tarball or binaries, we'll guarantee those don't change. However, the automated "Source code (tar.gz)" and "Source code (zip)" links, as well as any automated archives we generate, aren't guaranteed to be stable. That's because Git doesn't guarantee stability here and we rely on Git to generate those archives on the fly, so as we upgrade, things may change.

If you need a stable source code archive, please generate a release and upload your own archive as part of this process, and then you can reference those with stable hashes.

To give you an example as to what's stable and what's not, if you look at the latest Git LFS release at https://github.com/git-lfs/git-lfs/releases/tag/v3.3.0, all of the Assets entries except the two "Source code" links at the bottom are guaranteed to be stable (since those two are autogenerated). You'll notice we ship our own stable tarball and signed hashes as part of the assets, and that works.

I apologize for the confusion here, and hopefully this clarifies things.

brentleyjones commented 1 year ago

@bk2204 Okay, but the whole build system world is broken right now: Bazel, Homebrew, anything that does checksum hashing. This needs to be reverted, and proper comms and a deprecation period need to be communicated so all of these systems can fix their "broken assumptions".

jfirebaugh commented 1 year ago

@bk2204 This seems to be a change in policy from what engineers/support staff at GitHub have previously communicated:

https://github.com/bazel-contrib/SIG-rules-authors/issues/11#issuecomment-1029861300

Are you saying this policy has changed, and we can no longer rely on checksum stability for /archive/refs/tags/$tag URLs?

vivek-ng commented 1 year ago

@bk2204 I'm sorry, but this is unacceptable. Please realize that whatever upgrade you did internally is a backward-incompatible change for your end users. Please quote one official document where GitHub clearly communicated their checksum guarantee. There is a whole ecosystem of build systems that are broken because of this change.

BillyONeal commented 1 year ago

vcpkg and conan are also probably broken

bk2204 commented 1 year ago

Are you saying this policy has changed, and we can no longer rely on checksum stability for /archive/refs/tags/$tag URLs?

I'm saying that policy has never been correct and we've never guaranteed stable checksums for archives, just like Git has never guaranteed that. I apologize that things are broken here and that there hasn't been clearer communication in the past on this, but our policy hasn't changed in over 4 years.

kentonv commented 1 year ago

@bk2204 Your position is completely clear and, in isolation, totally reasonable. But, practically speaking, an enormous number of builds are broken right now, and an enormous number of historical commits will never be capable of building again unless the hashes go back to the way they were. Is there any possibility that GitHub could admit defeat to Hyrum's Law here?

bk2204 commented 1 year ago

I don't believe this is going to be reverted at the moment, and a Changelog post has just shown up. Again, my apologies for this communication not showing up sooner.

I will mention that the thing that has changed is the compression, since Git has switched from using gzip to an internal call to zlib. Thus, in the interim, if you can depend on the checksum of the uncompressed tarball, that will not have changed here. Of course, that's not a good idea long term (since, again, they're not guaranteed), but it may help you fix the immediate problem temporarily until you can change to using release assets.
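
For example, the checksum of the uncompressed tarball can be computed like this (URL hypothetical):

$ curl -L https://github.com/example/repo/archive/refs/tags/1.2.3.tar.gz | gzip -dc | sha256sum

Note that Bazel's http_archive verifies the hash of the downloaded (compressed) bytes, so this interim workaround only helps tools that can hash the decompressed stream.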

ericriff commented 1 year ago

Indeed Conan is broken because of this. +1 on reverting the change if possible

mathstuf commented 1 year ago

FWIW, Spack is also likely to have a problem here.

SuperElectron commented 1 year ago

Facing the same issue; our production Docker builds are failing.

cmazakas commented 1 year ago

I don't believe this is going to be reverted at the moment, and a Changelog post has just shown up. Again, my apologies for this communication not showing up sooner.

Doubling down after breaking like every build in the world isn't a good look. You should probably get your manager to come do some PR work here.

luispadron commented 1 year ago

Again, my apologies for this communication not showing up sooner.

This is the main issue, and why it should be reverted IMO. Literally no one informed anyone of this change (regardless of whether it was guaranteed). The communication has been pretty confusing on this to begin with, seeing as the same issue happened last year.

Everyone who depended on this is completely broken with no simple way to move forward and unblock builds.

timsutton commented 1 year ago

I don't believe this is going to be reverted at the moment, and a Changelog post has just shown up. Again, my apologies for this communication not showing up sooner.

Is Homebrew going to need to update the thousands of formula source files that can no longer build from source? (There seem to be several thousand formulas that rely on hashes from GitHub-hosted tar.gz archives.)

willeccles commented 1 year ago

Homebrew is not the only one. Other package managers' package templates frequently rely on GitHub archives. I agree that GitHub's position here makes sense, but on the other hand, releasing the blog post after ruining everyone's builds is not acceptable IMO. Prior warning would have been ideal.

mathstuf commented 1 year ago

releasing the blog post after ruining everyone's builds is not acceptable IMO. Prior warning would have been ideal.

It has always been this way. They broke things back in the early 2010s too (and more than once) for a similar reason: git changed how git archive works by default.

wyattanderson commented 1 year ago

The collective amount of human effort it will take to break glass, recover broken build systems that are impacted by this change, and republish artifacts across entire software ecosystems could probably cure cancer. Please consider reverting this change as soon as possible. It's fine to announce and plan a migration with a hard deadline, but the disruption this change has caused is massive.

mathstuf commented 1 year ago

Looking at the git change, it could probably be a git -c … archive change. However, all of this reliance on /archive/ means that the default can probably never change…

jfirebaugh commented 1 year ago

I think the responses here strongly indicate that it needs to be a product requirement that all URLs linked on release pages have stable contents. For supply-chain security, there must be a way to guarantee that release artifacts have not been tampered with. That includes both manually-uploaded and autogenerated artifacts. If the underlying implementation (i.e. git) doesn't guarantee stability, you need to put a caching layer in front that does.

willeccles commented 1 year ago

I think it's unreasonable to say that you can expect release assets to remain consistent and then list archives as if they were release assets while handling them entirely differently. This is misleading.

kentonv commented 1 year ago

@bk2204 Millions of dollars of damage is being done here. It's understandable that you didn't expect people to depend on hashes being stable, but now that you know and understand how widely this assumption is relied upon, the prudent thing to do would be to roll back and re-evaluate. You might just have to keep the old code around in a container that runs whenever older commits are requested -- you can use the new code on commits with newer timestamps.

schmidt-sebastian commented 1 year ago

Every build relying on Bazel is broken as well, and we cannot even fix the build on our own since our dependencies have to be fixed first.