buildbarn / bb-deployments

Example deployments of Buildbarn on various platforms
Apache License 2.0

update remote-execution? #78

Closed asartori86 closed 1 year ago

asartori86 commented 1 year ago

Hi, if I pick the bb-remote-execution version pinned in the master branch of this repo (e664853), I get many errors like this:

3: Failed to obtain input directory ".": Buffer is 158 bytes in size, while a maximum of 0 bytes is permitted

At that commit, the blobAccessDirectoryFetcher.maximumMessageSizeBytes field is left uninitialized by the NewBlobAccessDirectoryFetcher function:

https://github.com/buildbarn/bb-remote-execution/blob/e664853dff060942398e4a8f4906a2a8738d307b/pkg/cas/blob_access_directory_fetcher.go#L31

I know that it is fixed in a later commit (https://github.com/buildbarn/bb-remote-execution/commit/85954ef84d8c1bb8d26a73574dd45696e44c8b2a), but I am not sure which commit I should pick so that the combination is stable.

moroten commented 1 year ago

I'll do an upgrade next week at the latest. Thank you for reporting.

asartori86 commented 1 year ago

@moroten, thanks!

Just for reference, this diff fixes the bug:

diff --git a/pkg/cas/blob_access_directory_fetcher.go b/pkg/cas/blob_access_directory_fetcher.go
index af5a662..292735b 100644
--- a/pkg/cas/blob_access_directory_fetcher.go
+++ b/pkg/cas/blob_access_directory_fetcher.go
@@ -30,6 +30,7 @@ func NewBlobAccessDirectoryFetcher(blobAccess blobstore.BlobAccess, maximumMessa
                slicer: treeBlobSlicer{
                        maximumMessageSizeBytes: maximumMessageSizeBytes,
                },
+               maximumMessageSizeBytes: maximumMessageSizeBytes,
        }
 }
moroten commented 1 year ago

I now have a PR for bb-storage to upgrade to Bazel 6. I'll do one repo at a time to get them all in sync, so bb-deployments will get there soon.

asartori86 commented 1 year ago

@moroten , happy new year :)

I've just rebased my fork on top of the bb-remote-execution commit mentioned above; however, I still get the same error.

In fact, in cmd/bb_worker/main.go:102, MaximumTreeSizeBytes is set to zero:

https://github.com/buildbarn/bb-remote-execution/blob/9de9f273bb78ae29e4a174fcbc3ce5bc6ec502d7/cmd/bb_worker/main.go#L102

What am I missing?

moroten commented 1 year ago

@asartori86 Sorry for the late reply and happy new year to you too! I'll reopen this issue as you still have problems.

So the code does limit the tree size to zero, but trees are never read in the worker: "This process does not read Tree objects." That said, MaximumMessageSizeBytes must be left unset somewhere on your side. Do you have a worker configuration that you can share?

asartori86 commented 1 year ago

Thanks @moroten, here is the relevant part of my worker configuration:

local common = import 'common.libsonnet';

{
  blobstore: common.blobstore,
  browserUrl: common.browserUrl,
  maximumMessageSizeBytes: common.maximumMessageSizeBytes,
  scheduler: { address: 'scheduler:8983' },
  global: common.global,
  buildDirectories: [{
    native: {
      buildDirectoryPath: '/worker/build',
      cacheDirectoryPath: '/worker/cache',
      maximumCacheFileCount: 10000,
      maximumCacheSizeBytes: 1024 * 1024 * 1024,
      cacheReplacementPolicy: 'LEAST_RECENTLY_USED',
    },
    runners: [{
      endpoint: { ... },
      concurrency: 8,
      platform: {
        properties: [ ... ],
      },
      workerId: { ... },
    }],
  }],
  outputUploadConcurrency: 11,
  directoryCache: {
    maximumCount: 1000,
    maximumSizeBytes: 1000 * 1024,
    cacheReplacementPolicy: 'LEAST_RECENTLY_USED',
  },
}

and common.libsonnet looks like this:

{
  blobstore: {
    contentAddressableStorage: {
      shardedMultiGeneration: {
        Shards: [
          { backend: { grpc: { address: 'storage-0:8981' } } },
          { backend: { grpc: { address: 'storage-1:8981' } } },
        ],
        maxTreeTraversalConcurrency: 4,
        queryIntervalSeconds: 30,
      },
    },
    actionCache: {
      completenessChecking: {
        backend: {
          sharding: {
            hashInitialization: 14897363947481274433,
            shards: [
              { backend: { grpc: { address: 'storage-0:8981' } }, weight: 1 },
              { backend: { grpc: { address: 'storage-1:8981' } }, weight: 1 },
            ],
          },
        },
        maximumTotalTreeSizeBytes: 64 * 1024 * 1024,
      },
    },
  },
  browserUrl: 'http://localhost:7984',
  httpListenAddress: ':80',
  maximumMessageSizeBytes: 16 * 1024 * 1024,
  global: {
    diagnosticsHttpServer: {
      listenAddress: ':9980',
      enablePrometheus: true,
      enablePprof: true,
      enableActiveSpans: true,
    },
  },
}
EdSchouten commented 1 year ago

In fact, in cmd/bb_worker/main.go:102, MaximumTreeSizeBytes is set to zero:

https://github.com/buildbarn/bb-remote-execution/blob/9de9f273bb78ae29e4a174fcbc3ce5bc6ec502d7/cmd/bb_worker/main.go#L102

That is correct. bb_worker never reads Tree objects, so it is completely safe to leave that at zero.

The error message that you shared originally:

3: Failed to obtain input directory ".": Buffer is 158 bytes in size, while a maximum of 0 bytes is permitted

indicates that maximumMessageSizeBytes is set to zero somewhere. Can you make sure that you're running the very latest version of all the container images everywhere?

asartori86 commented 1 year ago

Thanks @EdSchouten. When I rebased, I didn't correctly update blobAccessDirectoryFetcher.GetDirectory. I had

ToProto(&remoteexecution.Directory{}, int(df.maximumTreeSizeBytes))

instead of

ToProto(&remoteexecution.Directory{}, int(df.slicer.maximumDirectorySizeBytes))