aspect-build / bazel-lib

Common useful functions for writing BUILD files and Starlark macros/rules
https://docs.aspect.build/rules/aspect_bazel_lib
Apache License 2.0
130 stars 78 forks source link

[FR]: use sha256 instead of sha386 for integrity? #837

Closed hunshcn closed 1 month ago

hunshcn commented 2 months ago

What is the current behavior?

now jq/yq binary use sha386 integrity

But mainstream bazel remote cache server can only cache the assets with sha256/blake3 integrity

Describe the feature

use sha256 integrity

hunshcn commented 1 month ago

@alexeagle @gregmagolan

Will this be accepted? If I take a pr.

alexeagle commented 1 month ago

I don't think we care which hash algorithm is used, we just prefer to mirror whatever data is already available rather than download all the binaries and compute checksums ourselves.

alexeagle commented 1 month ago

Is there an issue for whatever system is failing to understand other hash algorithms? If we change this, it's a workaround for someone else's bug, so I'd at least like to know when they've fixed it.

hunshcn commented 1 month ago

Is there an issue for whatever system is failing to understand other hash algorithms? If we change this, it's a workaround for someone else's bug, so I'd at least like to know when they've fixed it.

see https://github.com/buildbuddy-io/buildbuddy/pull/6911 https://github.com/buchgr/bazel-remote/blob/fea980bd8f528ccfc38919fb709e9e47d27a1622/server/grpc_asset.go#L79

alexeagle commented 1 month ago

Oh I see, this is just for the remote asset API. I don't think it helps much to change integrity format on a couple binaries here - the problem is much more widespread (i.e. https://github.com/aspect-build/toolchains_protoc/blob/main/protoc/private/versions.bzl#L54 )

sluongng commented 1 month ago

yeah this is for Remote Asset API.

Traditionally the API has always been SHA256 based. Meaning that if you use it with some exotic hasher like SHA512 (rules_js) or SHA386 (jq, protoc etc…), then you get no cache hit from the Remote Cache and your download would always reach upstream URL. That’s functional, but not ideal for folks who rely on Remote Asset API for some speed up.

We recently implemented the changes in the Remote Asset API + Bazel that enables to send over a separate “storage” hasher in addition to the previous “content verify” hasher. This allows us to receive a download request with SHA512 verification, but BLAKE3 storage and download. There would still be no cache hit on this flow, but at least now, you can change the hasher Bazel would use to download/store the blob.

The remaining question is: how do we facilitate for cache hits for cases where the integrity in rctx.download is neither SHA256 or BLAKE3 (the 2 popular hash method Bazel supports)?