buildbarn / bb-storage

Storage daemon, capable of storing data for the Remote Execution protocol
Apache License 2.0

Remove deprecated storage backends #176

Closed EdSchouten closed 9 months ago

EdSchouten commented 1 year ago

Ever since we published ADR #2 which introduced LocalBlobAccess, ShardingBlobAccess and MirroredBlobAccess, the writing has been on the wall that there would no longer be any place for backends such as Redis and SizeDistinguishing.

The main reason I kept these two backends around was that people in the community were still interested in seeing S3 support reappear. Because I have not received any serious proposal from the community to do this in a way that meets the standards for inclusion, I think we've now reached the point where we can assume no work is going to happen in this area in the short term.

Furthermore, the HTTP backend was mainly of use to talk to bazel-remote, but as far as I know, that also supports gRPC perfectly fine nowadays. There is absolutely no reason to prefer HTTP over gRPC. The latter supports bulk existence checks and querying capabilities, while the former does not.

NOTE: My plan is to merge this change on 2023-10-01, assuming no compelling use cases for them are presented.

ulrfa commented 1 year ago

We are one of those connecting Buildbarn to bazel-remote via HTTP. Such a Buildbarn setup works well for us today, and I would be happy to be able to keep using HTTP as well.

Bazel-remote also supports gRPC, but we get higher throughput and lower CPU usage with HTTP than with gRPC. I believe that is because HTTP allows using Linux's sendfile(2) to transfer data directly in kernel space, between file and socket.
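
(For illustration, a minimal Go sketch, not taken from bazel-remote, of the kind of transfer where this applies: with a plain non-TLS connection, a known Content-Length, and a regular file as the source, Go's io.Copy can hand the copy to the kernel via sendfile(2). The /cas/ URL layout and cache directory below are hypothetical.)

```go
package main

import (
	"io"
	"net/http"
	"os"
	"path/filepath"
	"strconv"
)

// serveBlob streams one cached blob back to the client. With no TLS and an
// explicit Content-Length (so the response is not chunked), the copy from a
// regular file to the TCP socket can be done by the kernel via sendfile(2).
func serveBlob(w http.ResponseWriter, r *http.Request) {
	// Hypothetical on-disk layout: one file per CAS blob, named by its hash.
	f, err := os.Open(filepath.Join("/var/cache/cas", filepath.Base(r.URL.Path)))
	if err != nil {
		http.NotFound(w, r)
		return
	}
	defer f.Close()

	fi, err := f.Stat()
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Length", strconv.FormatInt(fi.Size(), 10))
	io.Copy(w, f) // zero-copy path on Linux when the conditions above hold
}

func main() {
	http.HandleFunc("/cas/", serveBlob)
	http.ListenAndServe(":8080", nil)
}
```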

More concurrent TCP sessions make it easier for us to achieve an even balance across multiple network interfaces when using link aggregation. That is not a strict requirement for using HTTP, but a typical HTTP setup uses more concurrent TCP sessions than a typical gRPC setup.

Querying capabilities is not mandatory for our use cases. But I agree that we lack FindMissingBlobs with HTTP. However, the magnitude of that problem depends on the types of builds performed and artifacts produced.

I think one of Buildbarn's main strengths is the great architecture that allows many types of blobstore implementations. What is driving removal of the support for HTTP? Is it expensive to maintain? Are you planning changes to the blobstore interface that would be incompatible with the HTTP implementation? Or something else?

EdSchouten commented 1 year ago

> Bazel-remote also supports gRPC, but we get higher throughput and lower CPU usage with HTTP than with gRPC. I believe that is because HTTP allows using Linux's sendfile(2) to transfer data directly in kernel space, between file and socket.

Though I can well imagine that using sendfile(2) speeds things up, I have to say that at least in the case of bb_storage, CPU usage has never been a limiting factor. It tends to be storage space. For example, on AWS the most storage-heavy / CPU-starved instances are the is4gen.* family. The largest instance type has 32 cores and 30 TB of storage. My observation is that if you're aiming for a cache retention of only 48 hours, you will still only see 5% CPU usage on average. And this is with bb_storage doing SHA-256 sum validation for every read/write request.
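
(For context, that validation amounts to hashing every blob as it streams through the server. A minimal, hypothetical Go sketch of the pattern, not bb_storage's actual code:)

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"strings"
)

// copyAndVerify streams a blob from r to w while computing its SHA-256 sum,
// and reports an error if the sum does not match the digest the client claimed.
func copyAndVerify(w io.Writer, r io.Reader, expectedHex string) error {
	h := sha256.New()
	// TeeReader feeds every copied byte into the hasher as well.
	if _, err := io.Copy(w, io.TeeReader(r, h)); err != nil {
		return err
	}
	if actual := hex.EncodeToString(h.Sum(nil)); actual != expectedHex {
		return fmt.Errorf("digest mismatch: got %s, want %s", actual, expectedHex)
	}
	return nil
}

func main() {
	// SHA-256 of "hello world".
	const digest = "b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9"
	if err := copyAndVerify(io.Discard, strings.NewReader("hello world"), digest); err != nil {
		panic(err)
	}
	fmt.Println("blob verified")
}
```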

What is special about bazel-remote that makes its CPU usage so high, or about your setup that makes it so I/O intensive?

> More concurrent TCP sessions make it easier for us to achieve an even balance across multiple network interfaces when using link aggregation. That is not a strict requirement for using HTTP, but a typical HTTP setup uses more concurrent TCP sessions than a typical gRPC setup.

Even though I'm not denying there are workloads for which it's necessary to open multiple connections to get decent performance, I do doubt whether it's necessary in most distributed setups. Given a sufficiently large number of workers (e.g., n > 10) that each open a gRPC channel to a storage server, won't it already be the case that these connections will get spread out across network interfaces evenly?

> I think one of Buildbarn's main strengths is the great architecture that allows many types of blobstore implementations. What is driving removal of the support for HTTP?
>
> Is it expensive to maintain? Are you planning changes to the blobstore interface that would be incompatible with the HTTP implementation? Or something else?

It's not necessarily expensive to maintain, but over the last couple of years I did need to make changes to it to catch up with API changes to BlobAccess (most recently the overhaul of digest functions and GetFromComposite()). Every time I want to make any changes along those lines, I need to patch up about 15-20 BlobAccess implementations, which is getting rather cumbersome.
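
(For readers unfamiliar with the code base, a rough approximation of what a BlobAccess-style interface looks like. The real interface in bb-storage's blobstore package uses its own digest and buffer types and different signatures; this sketch only illustrates why every interface change fans out to every backend.)

```go
// Illustrative only; types and signatures are simplified stand-ins.
package blobstore

import "context"

// Digest is a stand-in for bb-storage's digest type.
type Digest struct {
	Hash      string
	SizeBytes int64
}

// BlobAccess is the shape of the storage abstraction: every backend
// (local, sharding, mirrored, gRPC, and formerly Redis/HTTP/...) has to
// implement all of these methods, so each change to the interface has to
// be propagated to every implementation.
type BlobAccess interface {
	// Fetch a single blob.
	Get(ctx context.Context, digest Digest) ([]byte, error)
	// Fetch a blob that is a slice of a larger composite object
	// (the GetFromComposite() addition mentioned above).
	GetFromComposite(ctx context.Context, parent, child Digest) ([]byte, error)
	// Store a blob.
	Put(ctx context.Context, digest Digest, data []byte) error
	// Bulk existence check, backing REv2 FindMissingBlobs.
	FindMissing(ctx context.Context, digests []Digest) ([]Digest, error)
}
```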

minor-fixes commented 1 year ago

Our org builds many artifacts, which tend to blow through the main CAS cache in O(days), but some teams would like select artifacts/builds persisted for O(months). Our team has been toying with setting up a PoC that uses bb-storage in front of a GCS bucket, and tooling around bb_copy/bb_replicator in order to allow users to selectively "archive" an entire build's output tree from the main instance to this GCS-backed instance. We could then use a properly-configured bb_clientd to mount it back, adding caching layers either on the server side in front of the bucket, or on the client side (or both).

We would only need to add some out-of-band metadata recording/tagging of archived builds to allow for later discovery and cleanup; maybe we would even (ab)use this storage for distribution of built binaries.

The attractive piece that this change would take away is the ability to get a CAS instance backed by essentially unbounded storage. Do you have recommendations for any alternatives? Is this an inherently flawed plan, even if it is not in a build hot path and is aggressively cached?

mostynb commented 1 year ago

@EdSchouten: would you consider delaying the http backend removal for a bit, while we investigate the perf issues with bazel-remote/grpc?

ulrfa commented 1 year ago

> Though I can well imagine that using sendfile(2) speeds things up, I have to say that at least in the case of bb_storage, CPU usage has never been a limiting factor.

I'm not saying the CPU load is particularly high or a limiting factor with bazel-remote, for any of the protocols. But it consumes less CPU with HTTP than with gRPC, and lower CPU usage gives more headroom, e.g. to compress blobs.

What is more important than CPU is the higher throughput with HTTP. If I remember correctly, we got 30-40% higher throughput with HTTP compared to gRPC when parallelism was limited to <= 64 concurrent blob transfers. The difference then decreased as parallelism increased, and at parallelism 2048 both protocols saturated the network at about the same throughput. I guess that might be explained by lower latency for HTTP, maybe thanks to sendfile(2).

All of this was on a fast local network with 2 x 10 Gbit/s network interfaces between each host, and with traffic from a load generator. I guess the difference might be less noticeable if masked by a slower network. I'm also not sure how it translates to real builds with Buildbarn.

I assume sendfile(2) only helps when transferring data directly between socket and file (i.e. not when compressing on the fly, nor with SSL/TLS).

What I want to say is that there are pros and cons with both HTTP and gRPC and I think they both have a role to play. I guess in theory they could even be used in combination, calling FindMissingBlobs via gRPC and doing file transfers via HTTP.
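
(A hedged sketch of that hybrid idea, assuming bazel-remote's default ports and its HTTP cache layout of PUT/GET /cas/<sha256>; the hostname is made up and error handling is minimal.)

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"net/http"

	remoteexecution "github.com/bazelbuild/remote-apis/build/bazel/remote/execution/v2"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	ctx := context.Background()

	// gRPC channel used only for FindMissingBlobs.
	conn, err := grpc.NewClient("cache.example.com:9092",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		panic(err)
	}
	defer conn.Close()
	cas := remoteexecution.NewContentAddressableStorageClient(conn)

	blob := []byte("hello world")
	digest := &remoteexecution.Digest{
		Hash:      "b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9",
		SizeBytes: int64(len(blob)),
	}

	// Step 1: bulk existence check over gRPC.
	resp, err := cas.FindMissingBlobs(ctx, &remoteexecution.FindMissingBlobsRequest{
		BlobDigests: []*remoteexecution.Digest{digest},
	})
	if err != nil {
		panic(err)
	}

	// Step 2: transfer only the missing blobs over plain HTTP.
	for _, d := range resp.MissingBlobDigests {
		req, err := http.NewRequestWithContext(ctx, http.MethodPut,
			fmt.Sprintf("http://cache.example.com:8080/cas/%s", d.Hash),
			bytes.NewReader(blob))
		if err != nil {
			panic(err)
		}
		if _, err := http.DefaultClient.Do(req); err != nil {
			panic(err)
		}
	}
}
```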

> Even though I'm not denying there are workloads for which it's necessary to open multiple connections to get decent performance, I do doubt whether it's necessary in most distributed setups. Given a sufficiently large number of workers (e.g., n > 10) that each open a gRPC channel to a storage server, won't it already be the case that these connections will get spread out across network interfaces evenly?

I don’t know. I guess:

> It's not necessarily expensive to maintain, but over the last couple of years I did need to make changes to it to catch up with API changes to BlobAccess (most recently the overhaul of digest functions and GetFromComposite()). Every time I want to make any changes along those lines, I need to patch up about 15-20 BlobAccess implementations, which is getting rather cumbersome.

It would have been nice if Buildbarn could have a plugin system for separate BlobAccess implementations that do not fulfill the requirements to be officially supported. E.g. stored in a separate repository like bb-event-service, so that you don't need to bother updating them when you make changes to BlobAccess. But I don't know how to architect such a plugin system, or whether it would make sense from your perspective?

EdSchouten commented 12 months ago

Responding to all messages at once.

@minor-fixes

> Our org builds many artifacts, which tend to blow through the main CAS cache in O(days), but some teams would like select artifacts/builds persisted for O(months). Our team has been toying with setting up a PoC that uses bb-storage in front of a GCS bucket, and tooling around bb_copy/bb_replicator in order to allow users to selectively "archive" an entire build's output tree from the main instance to this GCS-backed instance.

This is perfectly reasonable. I have never been against having an S3/GCS/... backend for the purpose of archiving. It's just that the version we had in the tree was intended to be used as a direct CAS for a remote caching/execution system, which is obviously a bad fit.

@mostynb

> @EdSchouten: would you consider delaying the http backend removal for a bit, while we investigate the perf issues with bazel-remote/grpc?

Sure! Would 2023-11-01 work for you? So remove SizeDistinguishing and Redis on 2023-10-01 as planned, and leave HTTP in the tree for one more month.

@ulrfa

> I assume sendfile(2) only helps when transferring data directly between socket and file (i.e. not when compressing on the fly, nor with SSL/TLS).

Exactly. As soon as you want to do anything more advanced (compression and TLS, as you stated, but also on-the-fly checksum validation), it can no longer be used.

> But it consumes less CPU with HTTP than with gRPC, and lower CPU usage gives more headroom, e.g. to compress blobs.

With other features such as TLS and compression enabled, the difference in performance between HTTP and gRPC ought to be negligible.

> It would have been nice if Buildbarn could have a plugin system for separate BlobAccess implementations that do not fulfill the requirements to be officially supported. E.g. stored in a separate repository like bb-event-service, so that you don't need to bother updating them when you make changes to BlobAccess. But I don't know how to architect such a plugin system, or whether it would make sense from your perspective?

Go doesn't provide a lot of facilities for that. There's the plugin package, but one of the restrictions it has is this:

> Together, these restrictions mean that, in practice, the application and its plugins must all be built together by a single person or component of a system. In that case, it may be simpler for that person or component to generate Go source files that blank-import the desired set of plugins and then compile a static executable in the usual way.

So that doesn't help.
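
(For what it's worth, the alternative the quoted documentation alludes to is compile-time registration via blank imports, roughly like the hypothetical sketch below: each backend registers itself from init(), and the main binary picks its backends by blank-importing them, e.g. `import _ "example.com/extra/httpblobaccess"`. That still requires whoever wants an out-of-tree backend to build their own binary.)

```go
// Hypothetical registry package; names are illustrative, not bb-storage API.
package registry

import "sync"

// Factory builds a storage backend from its (opaque) configuration.
type Factory func(config []byte) (any, error)

var (
	mu        sync.Mutex
	factories = map[string]Factory{}
)

// Register is called from each backend package's init() function.
func Register(name string, f Factory) {
	mu.Lock()
	defer mu.Unlock()
	factories[name] = f
}

// Lookup returns the factory for a configured backend, if that backend
// was linked into the binary via a blank import.
func Lookup(name string) (Factory, bool) {
	mu.Lock()
	defer mu.Unlock()
	f, ok := factories[name]
	return f, ok
}
```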

EdSchouten commented 11 months ago

Redis and SizeDistinguishing have just been removed from the tree. As discussed above, HTTP will be removed on 2023-11-01.