TraceMachina / nativelink

NativeLink is an open source high-performance build cache and remote execution server, compatible with Bazel, Buck2, Reclient, and other RBE-compatible build systems. It offers drastically faster builds, reduced test flakiness, and specialized hardware.
https://nativelink.com
Apache License 2.0
1.12k stars 102 forks

Implement Google Cloud Bucket Store #659

Open steedmicro opened 6 months ago

steedmicro commented 6 months ago

We need to implement a Google Cloud Bucket Store. We can use the Amazon S3 Bucket Store code (nativelink-store/src/s3_store.rs) as a reference, since it implements logic similar to what we need. Unit tests and example config files need to be added as well.

aaronmondal commented 6 months ago

IIRC when @allada and I talked about this a while ago we came to the conclusion that something like this might make sense:

  1. Migrate the current AWS SDK to a 1.x version, as the one we currently use is outdated
  2. Generalize the S3 store to something like a blobstore, i.e. a wrapper store where we can plug in different highly specialized/tuned backend implementations for AWS, GCP, Azure etc.
  3. Implement the actual GCP store and potentially implement one for Azure as well

One major hurdle I can currently see is that these stores can be tricky to test. Ideally we'll want some sort of short lived buckets for integration tests at some point. Technically I'd have a setup for CubeFS in an ad-hoc K8s cluster available, but while that might be useful for some API tests, that setup most likely can't mirror the actual behavior of AWS, GCP etc with respect to things like HTTP/HTTP2 and other not-so-obvious pitfalls.

@steed924 Consider what I mentioned here as a sort of 'wishful thinking out loud' rather than requirements - the current S3 store is not exactly in optimal shape, so anything in this area will probably be better than the current implementation 😅

steedmicro commented 6 months ago

Thanks, I will keep that in mind. +@aaronmondal.

steedmicro commented 6 months ago

FYI, while working on this issue I ran into a problem when trying to evaluate nativelink running with an S3 bucket. I was working on an Ubuntu VPS, had set the AWS credentials correctly, and had also set the environment variables for them. Even after I re-ran the command following bazel clean --expunge, the result was the same.

Running nativelink with bazel run nativelink -- $(pwd)/nativelink-config/examples/s3_backend_with_local_fast_cas.json seems to start the server correctly.

But when I tried to evaluate nativelink with the command bazel test //... --remote_instance_name=main --remote_cache=grpc://127.0.0.1:50051 --remote_executor=grpc://127.0.0.1:50051 --remote_default_exec_properties=cpu_count=1, it raised errors like this.

The error occurs while it tries to load the first chunk from the ActionCache server, which is backed by the S3 bucket.

I hope to hear your opinions, @aaronmondal. Thanks.

[Image: Capture (screenshot of the error output)]

steedmicro commented 6 months ago

FYI, I fixed the bug myself. The cause was that the S3 bucket I was using is in the us-east-1 region. After I updated the AWS_ENDPOINT_URL and AWS_DEFAULT_REGION environment variables for us-east-1, it seemed to work fine. Thanks.
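
For anyone hitting the same error, the fix described above might look roughly like the sketch below. The comment names only the two variables and the region; the endpoint URL shown here is an assumption (the standard regional S3 endpoint for us-east-1), not the value actually used in this thread.

```shell
# Assumption: the bucket lives in us-east-1, as in the comment above.
export AWS_DEFAULT_REGION="us-east-1"

# Assumption: the standard regional S3 endpoint. The thread does not
# give the exact URL that was used.
export AWS_ENDPOINT_URL="https://s3.us-east-1.amazonaws.com"

# Print both values so a region/endpoint mismatch is easy to spot
# before starting the server.
echo "region=${AWS_DEFAULT_REGION} endpoint=${AWS_ENDPOINT_URL}"
```

With these exported in the same shell, the bazel run and bazel test commands from the earlier comment can be re-run unchanged.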