I'll do an upgrade next week at the latest. Thank you for reporting.
@moroten, thanks!
Just for reference, this fixes the bug:
diff --git a/pkg/cas/blob_access_directory_fetcher.go b/pkg/cas/blob_access_directory_fetcher.go
index af5a662..292735b 100644
--- a/pkg/cas/blob_access_directory_fetcher.go
+++ b/pkg/cas/blob_access_directory_fetcher.go
@@ -30,6 +30,7 @@ func NewBlobAccessDirectoryFetcher(blobAccess blobstore.BlobAccess, maximumMessa
 		slicer: treeBlobSlicer{
 			maximumMessageSizeBytes: maximumMessageSizeBytes,
 		},
+		maximumMessageSizeBytes: maximumMessageSizeBytes,
 	}
 }
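To spell out why that one-line diff matters: the struct already declares a maximumMessageSizeBytes field, but the constructor never set it, so it kept Go's zero value of 0. Below is a condensed sketch of the situation, with field names taken from the diff above; other parameters and fields of the upstream file are omitted, so treat it as an illustration rather than a verbatim copy.

// Condensed sketch of pkg/cas/blob_access_directory_fetcher.go around the
// fix; only the parts visible in the diff above are shown.
type blobAccessDirectoryFetcher struct {
	blobAccess              blobstore.BlobAccess
	slicer                  treeBlobSlicer
	maximumMessageSizeBytes int // Stayed at Go's zero value (0) before the fix.
}

func NewBlobAccessDirectoryFetcher(blobAccess blobstore.BlobAccess, maximumMessageSizeBytes int) DirectoryFetcher {
	return &blobAccessDirectoryFetcher{
		blobAccess: blobAccess,
		slicer: treeBlobSlicer{
			maximumMessageSizeBytes: maximumMessageSizeBytes,
		},
		// The line added by the diff: without it, every Directory fetch is
		// checked against a 0-byte limit and rejected.
		maximumMessageSizeBytes: maximumMessageSizeBytes,
	}
}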
I now have a PR for bb-storage to upgrade to Bazel 6. I'll do one repo at a time to get them all in sync, so bb-deployments will get there soon.
@moroten, happy new year :)
I've just rebased my fork on top of the bb-remote-execution commit reported; however, I still get the same error. In fact, in bb_worker/cmd.go:102, MaximumTreeSizeBytes is set to zero. What am I missing?
@asartori86 Sorry for the late reply, and happy new year to you too! I'll reopen this issue since you're still having problems.
What the code says is that the tree size is limited to zero, but trees are never cached in the worker: "This process does not read Tree objects." That said, MaximumMessageSizeBytes must be left unset somewhere for you. Do you have a worker configuration that you can share?
Thanks @moroten, here is the relevant part of my worker configuration:
local common = import 'common.libsonnet';

{
  blobstore: common.blobstore,
  browserUrl: common.browserUrl,
  maximumMessageSizeBytes: common.maximumMessageSizeBytes,
  scheduler: { address: 'scheduler:8983' },
  global: common.global,
  buildDirectories: [{
    native: {
      buildDirectoryPath: '/worker/build',
      cacheDirectoryPath: '/worker/cache',
      maximumCacheFileCount: 10000,
      maximumCacheSizeBytes: 1024 * 1024 * 1024,
      cacheReplacementPolicy: 'LEAST_RECENTLY_USED',
    },
    runners: [{
      endpoint: { ... },
      concurrency: 8,
      platform: {
        properties: [ ... ],
      },
      workerId: { ... },
    }],
  }],
  outputUploadConcurrency: 11,
  directoryCache: {
    maximumCount: 1000,
    maximumSizeBytes: 1000 * 1024,
    cacheReplacementPolicy: 'LEAST_RECENTLY_USED',
  },
}
and common.libsonnet looks like this:
{
  blobstore: {
    contentAddressableStorage: {
      shardedMultiGeneration: {
        shards: [
          {
            backend: { grpc: { address: 'storage-0:8981' } },
          },
          {
            backend: { grpc: { address: 'storage-1:8981' } },
          },
        ],
        maxTreeTraversalConcurrency: 4,
        queryIntervalSeconds: 30,
      },
    },
    actionCache: {
      completenessChecking: {
        backend: {
          sharding: {
            hashInitialization: 14897363947481274433,
            shards: [
              {
                backend: { grpc: { address: 'storage-0:8981' } },
                weight: 1,
              },
              {
                backend: { grpc: { address: 'storage-1:8981' } },
                weight: 1,
              },
            ],
          },
        },
        maximumTotalTreeSizeBytes: 64 * 1024 * 1024,
      },
    },
  },
  browserUrl: 'http://localhost:7984',
  httpListenAddress: ':80',
  maximumMessageSizeBytes: 16 * 1024 * 1024,
  global: {
    diagnosticsHttpServer: {
      listenAddress: ':9980',
      enablePrometheus: true,
      enablePprof: true,
      enableActiveSpans: true,
    },
  },
}
"In fact, in bb_worker/cmd.go:102, MaximumTreeSizeBytes is set to zero"
That is correct. bb_worker never reads Tree objects, so it should be completely safe to leave that at zero.
The error message that you shared originally:
3: Failed to obtain input directory ".": Buffer is 158 bytes in size, while a maximum of 0 bytes is permitted
indicates that maximumMessageSizeBytes is set to zero somewhere. Can you make sure that you're running the very latest version of all the container images everywhere?
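To make that failure mode concrete, here is a small self-contained sketch of the kind of size guard that produces such an error. The checkSize function is a hypothetical stand-in, not bb-storage's actual buffer API; only the error text mirrors the message quoted above.

package main

import "fmt"

// checkSize rejects payloads larger than the configured limit, mimicking
// the guard behind the error message quoted in this thread.
func checkSize(sizeBytes, maximumSizeBytes int) error {
	if sizeBytes > maximumSizeBytes {
		return fmt.Errorf("Buffer is %d bytes in size, while a maximum of %d bytes is permitted", sizeBytes, maximumSizeBytes)
	}
	return nil
}

func main() {
	// An int field that is never initialized defaults to 0, so even a
	// tiny 158-byte Directory message fails the check.
	fmt.Println(checkSize(158, 0))
}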
Thanks @EdSchouten, when I rebased I didn't correctly update the function blobAccessDirectoryFetcher.GetDirectory. I had
ToProto(&remoteexecution.Directory{}, int(df.maximumTreeSizeBytes))
instead of
ToProto(&remoteexecution.Directory{}, int(df.slicer.maximumDirectorySizeBytes))
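For context, the surrounding method then looks roughly like this. This is a sketch assuming the usual shape of GetDirectory in that file at the time, not a verbatim copy; only the ToProto argument differs between the broken and fixed versions.

func (df *blobAccessDirectoryFetcher) GetDirectory(ctx context.Context, directoryDigest digest.Digest) (*remoteexecution.Directory, error) {
	m, err := df.blobAccess.Get(ctx, directoryDigest).ToProto(
		&remoteexecution.Directory{},
		// Was int(df.maximumTreeSizeBytes) after the bad rebase; bb_worker
		// passes zero for the tree limit, so every fetch failed.
		int(df.slicer.maximumDirectorySizeBytes),
	)
	if err != nil {
		return nil, err
	}
	return m.(*remoteexecution.Directory), nil
}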
Hi, if I pick the bb-remote-execution version reported in the master branch of this repo (e664853), I get many of these:
3: Failed to obtain input directory ".": Buffer is 158 bytes in size, while a maximum of 0 bytes is permitted
In fact, at that point in time, the cas.blobAccessDirectoryFetcher.maximumMessageSizeBytes member is left uninitialized by the NewBlobAccessDirectoryFetcher function: https://github.com/buildbarn/bb-remote-execution/blob/e664853dff060942398e4a8f4906a2a8738d307b/pkg/cas/blob_access_directory_fetcher.go#L31
I know that it will be fixed by a later commit (https://github.com/buildbarn/bb-remote-execution/commit/85954ef84d8c1bb8d26a73574dd45696e44c8b2a), but I am not sure which commit I should pick so that the combination is stable.