Open dt opened 6 months ago
Hi @dt, please add branch-* labels to identify which branch(es) this C-bug affects.
:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
@itsbilal one thing to note: we've been meaning to update the GCS client library. Perhaps this could help. I have no idea though.
@dt when you ran this test, where was your client located?
Looking through recent changes to the GCS package, the direct connect optimizations may be relevant to DR and disaggregated storage. There appears to be a gRPC client, and using it lets ops bypass one layer of proxies.
https://github.com/googleapis/google-cloud-go/pull/10859/files
I don't recall where/how I ran this originally. I just re-ran this from a MacBook with various vpn/traffic routing tools disabled from the nyc office using ./dev bench -v --stream-output --count 6 --bench-time=1x ./pkg/storage -f BenchmarkObjStorage --timeout 15m -- --test_env=COCKROACH_BENCHMARK_REMOTE_SSTS=1 | tee bench.txt
and got terrible throughput (~3mb/s) from both.
Okay, I'll see if there's a way to run this on roachprod vms in the same cloud region as each of the buckets. That's the only fair way I can think to run this.
Being lazy, I just ran it again on my laptop (and you could claim this is fair-ish since then it isn't directly in either's region, and is what, say, a node in a datacenter in Secaucus might expect if it backed up to s3/gcs in us-east). One thing I noticed is that the test is slightly flawed in that it does not call rh.SetupForCompaction() on the readHandle. This, combined with Copy() passing a mere 256kb buffer, means that until readahead kicks in we're doing some pretty small reads. If I do my own "download" loop over ReadAt+discard with different sized buffers, side-by-side with objstorage.Copy(), we see that reading into a 4mb/8mb/16mb buffer to discard greatly improves the throughput of both SDKs:
GCS/raw_buf=1.0_MiB/size=4.0_KiB 39.06KB/s S3/raw_buf=1.0_MiB/size=4.0_KiB 78.12KB/s
GCS/raw_buf=1.0_MiB/size=64_KiB 722.7KB/s S3/raw_buf=1.0_MiB/size=64_KiB 1.440MB/s
GCS/raw_buf=1.0_MiB/size=1.0_MiB 10.68MB/s S3/raw_buf=1.0_MiB/size=1.0_MiB 24.61MB/s
GCS/raw_buf=1.0_MiB/size=8.0_MiB 8.221MB/s S3/raw_buf=1.0_MiB/size=8.0_MiB 16.75MB/s
GCS/raw_buf=1.0_MiB/size=32_MiB 5.617MB/s S3/raw_buf=1.0_MiB/size=32_MiB 12.75MB/s
GCS/raw_buf=1.0_MiB/size=64_MiB 7.505MB/s S3/raw_buf=1.0_MiB/size=64_MiB 11.80MB/s
GCS/raw_buf=4.0_MiB/size=4.0_KiB 48.83KB/s S3/raw_buf=4.0_MiB/size=4.0_KiB 87.89KB/s
GCS/raw_buf=4.0_MiB/size=64_KiB 752.0KB/s S3/raw_buf=4.0_MiB/size=64_KiB 1.364MB/s
GCS/raw_buf=4.0_MiB/size=1.0_MiB 11.42MB/s S3/raw_buf=4.0_MiB/size=1.0_MiB 22.75MB/s
GCS/raw_buf=4.0_MiB/size=8.0_MiB 24.23MB/s S3/raw_buf=4.0_MiB/size=8.0_MiB 31.59MB/s
GCS/raw_buf=4.0_MiB/size=32_MiB 20.02MB/s S3/raw_buf=4.0_MiB/size=32_MiB 27.06MB/s
GCS/raw_buf=4.0_MiB/size=64_MiB 19.78MB/s S3/raw_buf=4.0_MiB/size=64_MiB 27.16MB/s
GCS/raw_buf=8.0_MiB/size=4.0_KiB 39.06KB/s S3/raw_buf=8.0_MiB/size=4.0_KiB 87.89KB/s
GCS/raw_buf=8.0_MiB/size=64_KiB 703.1KB/s S3/raw_buf=8.0_MiB/size=64_KiB 1.411MB/s
GCS/raw_buf=8.0_MiB/size=1.0_MiB 10.94MB/s S3/raw_buf=8.0_MiB/size=1.0_MiB 22.79MB/s
GCS/raw_buf=8.0_MiB/size=8.0_MiB 28.07MB/s S3/raw_buf=8.0_MiB/size=8.0_MiB 36.17MB/s
GCS/raw_buf=8.0_MiB/size=32_MiB 26.95MB/s S3/raw_buf=8.0_MiB/size=32_MiB 21.695MB/s
GCS/raw_buf=8.0_MiB/size=64_MiB 27.57MB/s S3/raw_buf=8.0_MiB/size=64_MiB 25.77MB/s
GCS/raw_buf=16_MiB/size=4.0_KiB 48.83KB/s S3/raw_buf=16_MiB/size=4.0_KiB 78.12KB/s
GCS/raw_buf=16_MiB/size=64_KiB 830.1KB/s S3/raw_buf=16_MiB/size=64_KiB 1.564MB/s
GCS/raw_buf=16_MiB/size=1.0_MiB 13.17MB/s S3/raw_buf=16_MiB/size=1.0_MiB 21.32MB/s
GCS/raw_buf=16_MiB/size=8.0_MiB 30.45MB/s S3/raw_buf=16_MiB/size=8.0_MiB 41.66MB/s
GCS/raw_buf=16_MiB/size=32_MiB 35.93MB/s S3/raw_buf=16_MiB/size=32_MiB 35.42MB/s
GCS/raw_buf=16_MiB/size=64_MiB 37.62MB/s S3/raw_buf=16_MiB/size=64_MiB 37.41MB/s
GCS/raw_buf=32_MiB/size=4.0_KiB 48.83KB/s S3/raw_buf=32_MiB/size=4.0_KiB 87.89KB/s
GCS/raw_buf=32_MiB/size=64_KiB 800.8KB/s S3/raw_buf=32_MiB/size=64_KiB 1.507MB/s
GCS/raw_buf=32_MiB/size=1.0_MiB 11.37MB/s S3/raw_buf=32_MiB/size=1.0_MiB 23.26MB/s
GCS/raw_buf=32_MiB/size=8.0_MiB 30.24MB/s S3/raw_buf=32_MiB/size=8.0_MiB 40.11MB/s
GCS/raw_buf=32_MiB/size=32_MiB 47.76MB/s S3/raw_buf=32_MiB/size=32_MiB 38.41MB/s
GCS/raw_buf=32_MiB/size=64_MiB 44.62MB/s S3/raw_buf=32_MiB/size=64_MiB 34.47MB/s
Adding a rh.SetupForCompaction() call to the existing bench directly before objstorage.Copy brings the results up to roughly equal, around 25-27mb/s for the 32mb sst:
ObjStorageCopyGCS/objstorageCopy/size=4.0_KiB 9.766Ki ± ∞ ¹ ObjStorageCopyS3/objstorageCopy/size=4.0_KiB 9.766Ki ± ∞ ¹
ObjStorageCopyGCS/objstorageCopy/size=64_KiB 195.3Ki ± ∞ ¹ ObjStorageCopyS3/objstorageCopy/size=64_KiB 244.1Ki ± ∞ ¹
ObjStorageCopyGCS/objstorageCopy/size=1.0_MiB 2.737Mi ± ∞ ¹ ObjStorageCopyS3/objstorageCopy/size=1.0_MiB 3.357Mi ± ∞ ¹
ObjStorageCopyGCS/objstorageCopy/size=8.0_MiB 22.95Mi ± ∞ ¹ ObjStorageCopyS3/objstorageCopy/size=8.0_MiB 25.17Mi ± ∞ ¹
ObjStorageCopyGCS/objstorageCopy/size=32_MiB 27.68Mi ± ∞ ¹ ObjStorageCopyS3/objstorageCopy/size=32_MiB 29.42Mi ± ∞ ¹
ObjStorageCopyGCS/objstorageCopy/size=64_MiB 28.00Mi ± ∞ ¹ ObjStorageCopyS3/objstorageCopy/size=64_MiB 29.77Mi ± ∞ ¹
The new benchmark added in https://github.com/cockroachdb/cockroach/pull/124744 shows that we're much slower at reading bytes from a file on GCS than a file on S3. We should dig into our wrapper, the SDK client settings and the buffer sizes to figure out why the GCS performance is so much worse than the S3 performance:
Jira issue: CRDB-39069
Epic CRDB-40359