buildbarn / bb-remote-asset

An implementation of the Remote Asset API
Apache License 2.0

Since Moving to Shared Storage Not Assets Found #27

Closed pseymournutanix closed 2 years ago

pseymournutanix commented 2 years ago

Hello,

When running with a single-pod StatefulSet storage backend, I can no longer find any assets, so cache pulls aren't working:-

2022/02/27 15:01:05 Fetching Directory [urn:fdc:buildstream.build:2020:v1:bst-common/base/2aaa3338e361f55c77eb31c634545891985a4590c23cee4f4cde5e8b4de79d2b] with qualifiers []
2022/02/27 15:01:05 FetchBlob completed for [urn:fdc:buildstream.build:2020:v1:bst-common/base/2aaa3338e361f55c77eb31c634545891985a4590c23cee4f4cde5e8b4de79d2b] with status code 5

The asset configMap looks like this:-

  asset.jsonnet: |
    local common = import 'common.libsonnet';
    {
      fetcher: {
        caching: {
          fetcher: {
            https: {
              allowUpdatesForInstances: [''],
              contentAddressableStorage: common.blobstore.contentAddressableStorage,
            },
            'error': {
              code: 5,
              message: "Asset Not Found",
            }
          }
        }
      },

      assetCache: {
        actionCache: {
          blobstore: common.blobstore,
        },
        blobAccess: {
          assetStore: {
            'local': {
              keyLocationMapOnBlockDevice: {
                file: {
                  path: '/storage/key_location_map',
                  sizeBytes: 1024 * 1024,
                },
              },
              keyLocationMapMaximumGetAttempts: 8,
              keyLocationMapMaximumPutAttempts: 32,
              oldBlocks: 8,
              currentBlocks: 24,
              newBlocks: 3,
              blocksOnBlockDevice: {
                source: {
                  file: {
                    path: '/storage/blocks',
                    sizeBytes: 200 * 1024 * 1024,
                  },
                },
                spareBlocks: 3,
              },

              # Add this chunk if you also want it to be persistent across restarts. If no persistency is needed, just omit this.
              persistent: {
                stateDirectoryPath: '/storage/persistent_state',
                minimumEpochInterval: '5m',
              },
            },
          },
          contentAddressableStorage:
            common.blobstore.contentAddressableStorage,
        },
      },
      grpcServers: [{
        listenAddresses: [':7981'],
        authenticationPolicy: { allow: {} },
      }],
      allowUpdatesForInstances: [''],
      maximumMessageSizeBytes: 16 * 1024 * 1024,
    }

And the common ConfigMap looks like this:-

  common.libsonnet: |
    {
      blobstore: {
        contentAddressableStorage: {
          sharding: {
            hashInitialization: 11946695773637837490,
            shards: [
              {
                backend: {
                  grpc: { address: 'storage-0.storage.buildbarn:7982' },
                },
                weight: 1,
              },
              {
                backend: {
                  grpc: { address: 'storage-1.storage.buildbarn:7982' },
                },
                weight: 1,
              },
              {
                backend: {
                  grpc: { address: 'storage-2.storage.buildbarn:7982' },
                },
                weight: 1,
              },
            ],
          },
        },
        actionCache: {
          completenessChecking: {
            sharding: {
              hashInitialization: 14897363947481274433,
              shards: [
                {
                  backend: {
                    grpc: { address: 'storage-0.storage.buildbarn:7982' },
                  },
                  weight: 1,
                },
                {
                  backend: {
                    grpc: { address: 'storage-1.storage.buildbarn:7982' },
                  },
                  weight: 1,
                },
                {
                  backend: {
                    grpc: { address: 'storage-2.storage.buildbarn:7982' },
                  },
                  weight: 1,
                },
              ],
            },
          },
        },
      },
    }

Have I missed something or done something wrong here? Do I need to use a frontend and reference it somewhere, or is this a bug? Thanks.

EdSchouten commented 2 years ago

I suspect that status code 5 in that error message above stands for NOT_FOUND:

https://grpc.github.io/grpc/core/md_doc_statuscodes.html

It's still impractical that this code doesn't log the full gRPC status object, including the message.
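For reference, here is a small Jsonnet lookup of the canonical gRPC status code names, indexed by their numeric value. It is an illustrative sketch, not part of bb-remote-asset; the names follow the status code page linked above. Note that the 'error' fetcher in the posted asset config is itself set to return code 5.

```jsonnet
// Canonical gRPC status code names; the array index is the numeric code.
local grpcCodeNames = [
  'OK',                   // 0
  'CANCELLED',            // 1
  'UNKNOWN',              // 2
  'INVALID_ARGUMENT',     // 3
  'DEADLINE_EXCEEDED',    // 4
  'NOT_FOUND',            // 5
  'ALREADY_EXISTS',       // 6
  'PERMISSION_DENIED',    // 7
  'RESOURCE_EXHAUSTED',   // 8
  'FAILED_PRECONDITION',  // 9
  'ABORTED',              // 10
  'OUT_OF_RANGE',         // 11
  'UNIMPLEMENTED',        // 12
  'INTERNAL',             // 13
  'UNAVAILABLE',          // 14
  'DATA_LOSS',            // 15
  'UNAUTHENTICATED',      // 16
];

{
  // The numeric codes that appear in the remote-asset logs in this thread.
  code5: grpcCodeNames[5],
  code7: grpcCodeNames[7],
}
```

Evaluating it with the jsonnet CLI yields "NOT_FOUND" for code5 and "PERMISSION_DENIED" for code7.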

Qinusty commented 2 years ago

Hi,

There's some misconfiguration going on here which is likely the cause.

assetCache takes only one of actionCache or blobAccess, as per the proto [1], but your config sets both.
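A sketch of an assetCache that sets a single backend, assuming the blobAccess/assetStore field names from the config posted above and reusing its common.libsonnet. The 'local' store settings are elided in a comment rather than repeated:

```jsonnet
local common = import 'common.libsonnet';

{
  assetCache: {
    // assetCache is a oneof: keep blobAccess and drop the actionCache block
    // (or the reverse), but never set both.
    blobAccess: {
      assetStore: {
        // 'local' is a Jsonnet keyword, so the field name must be quoted.
        'local': {
          // keyLocationMapOnBlockDevice, blocksOnBlockDevice, persistent, etc.
          // unchanged from the original configuration.
        },
      },
      contentAddressableStorage: common.blobstore.contentAddressableStorage,
    },
  },
}
```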

Your fetcher is also configured with multiple oneof fields. Try removing the 'error' block and leaving just the https field/value. Take a look at the README example [2] for how the HTTP fetcher can be configured properly with caching, etc.
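A sketch of the fetcher with only one inner fetcher. It assumes the field is spelled http (the spelling used in the later config in this thread, in place of https) and reuses common.libsonnet from the original post:

```jsonnet
local common = import 'common.libsonnet';

{
  fetcher: {
    caching: {
      // The inner fetcher is a oneof: set exactly one field here. The 'error'
      // block from the original config is dropped so only http remains.
      fetcher: {
        http: {
          allowUpdatesForInstances: [''],
          contentAddressableStorage: common.blobstore.contentAddressableStorage,
        },
      },
    },
  },
}
```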

1: https://github.com/buildbarn/bb-remote-asset/blob/6a6e6f90b379c69884ab2f35170878420fba22c4/pkg/proto/configuration/bb_remote_asset/bb_remote_asset.proto#L41

2: https://github.com/buildbarn/bb-remote-asset#setting-up-the-remote-asset-daemon


pseymournutanix commented 2 years ago

Thanks for the help.

Changing the remote-asset configuration to:-

data:
  asset.jsonnet: |
    local common = import 'common.libsonnet';
    {
      fetcher: {
        caching: {
          fetcher: {
            http: {
              allowUpdatesForInstances: [''],
              contentAddressableStorage: {
                grpc: {
                  address: "frontend.buildbarn:8888"
                },
              }
            }
          }
        }
      },

      assetCache: {
        blobAccess: {
          assetStore: {
...
          contentAddressableStorage:
            common.blobstore.contentAddressableStorage,
        },
      },

Firstly, is the fetcher entry correct as it is, pointing at a frontend storage pod, or should it list all the storage shards?

With this config in place, I now get the following from my BuildStream test build:-

    [00:00:00][2aaa3338][ pull:base.bst                      ] FAILURE Failed to pull artifact 2aaa3338: Failed to pull ref bst-common/base/2aaa3338e361f55c77eb31c634545891985a4590c23cee4f4cde5e8b4de79d2b: <_InactiveRpcError of RPC that terminated with:
        status = StatusCode.PERMISSION_DENIED
        details = "HTTP Fetching of directories is not supported!"
        debug_error_string = "{"created":"@1646040952.964894316","description":"Error received from peer ipv4:10.39.174.131:11002","file":"src/core/lib/surface/call.cc","file_line":1074,"grpc_message":"HTTP Fetching of directories is not supported!","grpc_status":7}"

Now I'm getting status code 7 from the remote-asset pod :)

2022/02/28 09:35:52 Fetching Directory [urn:fdc:buildstream.build:2020:v1:bst-common/base/2aaa3338e361f55c77eb31c634545891985a4590c23cee4f4cde5e8b4de79d2b] with qualifiers []
2022/02/28 09:35:52 FetchBlob completed for [urn:fdc:buildstream.build:2020:v1:bst-common/base/2aaa3338e361f55c77eb31c634545891985a4590c23cee4f4cde5e8b4de79d2b] with status code 7

Any help for me would be appreciated.

Qinusty commented 2 years ago

Ah, of course, this is for BuildStream and not Bazel!

IIRC, BuildStream configurations generally avoid the HTTP fetcher and rely on BuildStream's push functionality instead. If you replace the http block with the 'error' block you had before, does it work?

As for the sharding configuration, if you have access to it (as in this example), I would avoid routing unnecessary data through the frontend. Clients should go through the frontend for cache access.

pseymournutanix commented 2 years ago

Hi,

Yes, if I move the fetcher block back to the original, remove the actionCache block from inside the blobAccess and leave contentAddressableStorage, pushes/pulls seem to work OK. It's not the quickest, so I'm wondering whether it's optimal: approx. 49s for approx. 600 MB of artifacts on a local network. Not too bad, I suppose :)
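Going by this description, the fetcher that works for BuildStream is the caching fetcher wrapping only the 'error' fetcher. A sketch, assuming the field names from the original config in this thread; the behaviour described in the comments is our reading of the caching fetcher, not something confirmed here:

```jsonnet
{
  fetcher: {
    caching: {
      // Single inner fetcher. On a cache miss the request falls through to the
      // error fetcher, which answers NOT_FOUND (gRPC code 5); BuildStream then
      // populates the cache by pushing artifacts itself.
      fetcher: {
        // 'error' is a Jsonnet keyword, so the field name must be quoted.
        'error': {
          code: 5,
          message: 'Asset Not Found',
        },
      },
    },
  },
}
```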

Pipeline
     waiting 5a9cd54bcaa0a9b01e6244ced5361d0d9259e932a8b16fcae5c86296bb9d33e0 base.bst
===============================================================================
[--:--:--][5a9cd54b][ pull:base.bst                      ] START   bst-common/base/5a9cd54b-pull.2368.log
[--:--:--][5a9cd54b][ pull:base.bst                      ] STATUS  Pulling artifact 5a9cd54b <- http://buildbarn-cache.my.domain:11001
[--:--:--][5a9cd54b][ pull:base.bst                      ] INFO    Pulled artifact 5a9cd54b <- http://buildbarn-cache.my.domain:11001
[00:00:47][5a9cd54b][ pull:base.bst                      ] SUCCESS bst-common/base/5a9cd54b-pull.2368.log
[--:--:--][][] START   cache_size/cache_size.2439.log
[--:--:--][][] STATUS  Cache usage recomputed: 559.5M / infinity (0%)
[00:00:00][][] SUCCESS cache_size/cache_size.2439.log
[--:--:--][5a9cd54b][ push:base.bst                      ] START   bst-common/base/5a9cd54b-push.2441.log
[--:--:--][5a9cd54b][ push:base.bst                      ] STATUS  Pushing artifact 5a9cd54b -> http://buildbarn-cache.my.domain:11002
[--:--:--][5a9cd54b][ push:base.bst                      ] INFO    Remote (http://buildbarn-cache.my.domain:11002) already has 5a9cd54b cached
[00:00:00][5a9cd54b][ push:base.bst                      ] SKIPPED Push

But this is certainly progress. Many thanks for the help and pointers.

pseymournutanix commented 2 years ago

Thanks for all the help. This was a configuration error on my part.