The round 16 MiB limit was curious, so I poked around, and I was able to work around this error by increasing the gRPC message size limit in buildbarn; it looks like:
diff --git a/buck/nix/bb/docker/config/common.libsonnet b/buck/nix/bb/docker/config/common.libsonnet
--- a/buck/nix/bb/docker/config/common.libsonnet
+++ b/buck/nix/bb/docker/config/common.libsonnet
@@ -38,7 +38,7 @@
},
browserUrl: 'http://localhost:7984',
httpListenAddress: ':80',
- maximumMessageSizeBytes: 16 * 1024 * 1024,
+ maximumMessageSizeBytes: 64 * 1024 * 1024,
global: {
diagnosticsHttpServer: {
listenAddress: ':9980',
Surely there's a better way in the reapi to download large action cache blobs incrementally without bloating the individual gRPC message limit, though...
Hmm, I will double-check, but I don't recall the CAS gRPC API exposing range reads, so it's possible that raising this limit is indeed the right approach.
I tripped up on something else now too: the message decoding size limit. It looks like it's trying to download the entirety of the output artifact, which is about 160 MB, while the default gRPC decoding limit in tonic is 4 MB(!). A bump to tonic 0.9.0+ is needed to fix that, I think:
Action failed: prelude//toolchains/rust:rust-stable (nix_build rust-stable-rust-stable.nix)
Internal error (stage: remote_upload_error): Remote Execution Error (GRPC-SESSION-ID):
RE: upload: status: ResourceExhausted, message: "grpc: received message larger than max (165245018 vs. 4194304)", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc"} }
Testing a patch for that now, possibly...
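For reference, here's a minimal sketch of what that tonic fix could look like once the client is on tonic 0.9+. The module path and function name are illustrative, not buck2's actual client code:

// Hypothetical sketch: raising tonic's per-message decoding limit on a
// generated REAPI client. tonic 0.9 added max_decoding_message_size();
// the 4 MiB value in the error above is tonic's default.
use tonic::transport::Channel;
// Illustrative path; the real generated module lives wherever
// re_grpc_proto puts its prost/tonic output.
use crate::re::content_addressable_storage_client::ContentAddressableStorageClient;

async fn connect_cas(
    addr: &'static str,
) -> Result<ContentAddressableStorageClient<Channel>, tonic::transport::Error> {
    let channel = Channel::from_static(addr).connect().await?;
    // Accept responses up to 64 MiB instead of the 4 MiB default.
    Ok(ContentAddressableStorageClient::new(channel)
        .max_decoding_message_size(64 * 1024 * 1024))
}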
Ah, it's actually a BuildBarn error, I think. The problem isn't downloading (which the decoding-size bump would fix), it's uploading: https://grep.app/search?q=grpc%3A%20received%20message%20larger%20than%20max
So I think the problem there is actually writing smaller blobs to the store, not reading things out of it...
Ah, right. So the spec says https://github.com/thoughtpolice/reapi-server/blob/eae2e3a51bb8053ae7fe290fb1e9fedb462b651a/protos/build/bazel/remote/execution/v2/remote_execution.proto#L205-L209
// For small file uploads the client should group them together and call
// [BatchUpdateBlobs][build.bazel.remote.execution.v2.ContentAddressableStorage.BatchUpdateBlobs].
//
// For large uploads, the client must use the
// [Write method][google.bytestream.ByteStream.Write] of the ByteStream API.
So it instead needs to write content via google.bytestream.ByteStream, which you can verify exists on BuildBarn:
austin@GANON:~/src/reapi-server$ grpcurl -plaintext '127.0.0.1:8980' describe | grep 'is a service'
build.bazel.remote.execution.v2.ActionCache is a service:
build.bazel.remote.execution.v2.Capabilities is a service:
build.bazel.remote.execution.v2.ContentAddressableStorage is a service:
build.bazel.remote.execution.v2.Execution is a service:
google.bytestream.ByteStream is a service:
grpc.health.v1.Health is a service:
grpc.reflection.v1alpha.ServerReflection is a service:
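For what it's worth, here's a rough sketch (not buck2's actual code) of what a chunked ByteStream Write upload could look like with a tonic-generated client. The module paths, chunk size, and uuid crate are assumptions; the resource-name layout follows the REAPI spec's {instance}/uploads/{uuid}/blobs/{hash}/{size} convention:

// Hypothetical sketch of a chunked upload via google.bytestream.ByteStream/Write.
// ByteStreamClient and WriteRequest are the prost/tonic-generated types; the
// module path here is illustrative.
use futures::stream;
use tonic::transport::Channel;
use crate::bytestream::{byte_stream_client::ByteStreamClient, WriteRequest};

const CHUNK_SIZE: usize = 64 * 1024; // stay well under any per-message limit

async fn upload_blob(
    client: &mut ByteStreamClient<Channel>,
    instance: &str,
    digest_hash: &str,
    data: Vec<u8>,
) -> Result<(), tonic::Status> {
    // REAPI resource name for uploads, e.g. "main/uploads/<uuid>/blobs/<sha256>/<size>".
    let resource_name = format!(
        "{instance}/uploads/{}/blobs/{digest_hash}/{}",
        uuid::Uuid::new_v4(),
        data.len()
    );
    let total_size = data.len() as i64;

    // Client-streaming RPC: one WriteRequest per chunk; only the first message
    // needs the resource name, and the last one sets finish_write.
    let requests: Vec<WriteRequest> = data
        .chunks(CHUNK_SIZE)
        .enumerate()
        .map(|(i, chunk)| WriteRequest {
            resource_name: if i == 0 { resource_name.clone() } else { String::new() },
            write_offset: (i * CHUNK_SIZE) as i64,
            finish_write: (i * CHUNK_SIZE + chunk.len()) as i64 == total_size,
            data: chunk.to_vec(),
        })
        .collect();

    let response = client.write(stream::iter(requests)).await?;
    // The server reports how many bytes it committed; it should match what we sent.
    assert_eq!(response.into_inner().committed_size, total_size);
    Ok(())
}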
It does look like we should use this API, yeah (for reads & writes); the RE bits are arguably MVP-level right now since we do not use them internally.
Is that something you'd want to do yourself, since you've been debugging it? Otherwise I'll try to see if I can find the time.
The code for all this is in remote_execution/oss/re_grpc/src/client.rs. This will also require adding the ByteStream service protobuf definitions at remote_execution/oss/re_grpc_proto/proto/google/bytestream/bytestream.proto.
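If the proto codegen in that crate follows the usual tonic-build pattern (an assumption on my part; the real build wiring may differ), pulling the new file in would look roughly like:

// Hypothetical build.rs sketch for generating the ByteStream client stubs with
// tonic-build; any transitive imports that bytestream.proto declares would need
// to be vendored under the include path as well.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    tonic_build::configure()
        .build_server(false) // only client stubs are needed on the buck2 side
        .compile(
            &["proto/google/bytestream/bytestream.proto"],
            &["proto"], // include root so the google/bytestream/... path resolves
        )?;
    Ok(())
}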
Yeah, I'll try giving it a shot, since the code looks fairly isolated.
I just "solved" this on the buildbarn side by setting maximumReceivedMessageSizeBytes
under the grpc server section (not not the same as maximumMessageSizeBytes).
It's rather slow, so not a total fix, but it's something.
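In libsonnet terms that's roughly the following (a sketch only; the exact limit and surrounding layout depend on your buildbarn deployment):

// Hypothetical sketch: raising the receive limit on the gRPC server itself,
// in addition to the blobstore-level maximumMessageSizeBytes shown earlier.
{
  grpcServers: [{
    listenAddresses: [':8980'],
    maximumReceivedMessageSizeBytes: 256 * 1024 * 1024,
  }],
}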
If you follow along with the saga in https://github.com/thoughtpolice/buck2-nix/issues/12, I have almost got remote execution working on buildbarn with Nix! Except that the tarball I download seems to be a tad too large (formatted for legibility in the GitHub UI):
I can see that the action itself completed successfully in the buildbarn action cache UI, and that the file downloaded (given in the error above, from rust-overlay) is 23508808 bytes in size. I'm not sure how to approach this. Maybe it's a gRPC limit? The CacheCapabilities don't exactly specify anything about this: