cockroachdb / cockroach

CockroachDB - the open source, cloud-native distributed SQL database.
https://www.cockroachlabs.com

*: avoid long non-preemptable function calls #115192

Open dt opened 7 months ago

dt commented 7 months ago

It appears that due to https://github.com/golang/go/issues/64417, we sometimes see long GC pause times, and traces show STW pauses overlapping with block hashing. This has been observed, for example, during backups to S3, where the S3 SDK computes both MD5 and SHA-256 hashes of the 8 MB chunks it buffers before uploading, but it could affect other users of the hashing libraries as well.

We may want to consider patching the hashing functions on our go fork until the issue is resolved upstream.

Jira issue: CRDB-33925

blathers-crl[bot] commented 7 months ago

cc @cockroachdb/disaster-recovery

sumeerbhola commented 7 months ago

wherein the s3 sdk does both md5 and sha256 hashes of the 8mb chunks it buffered before uploading

@dt do you know how long (roughly) the non-preemptable intervals are? < 10ms?

dt commented 7 months ago

how long (roughly) the non-preemptable intervals are? < 10ms?

Typically around or under 10ms, but some traces indicated as much as 40ms.

nvanbenschoten commented 6 months ago

@dt @sumeerbhola what is our current plan with this? Who is owning its resolution? We have at least one customer who is actively waiting on improvements here, now that we've identified what those improvements should be.

sumeerbhola commented 6 months ago

I don't think this should have the AC label, since this is about work that is not preemptible by the goroutine scheduler.

There is a secondary issue that preemptible work should request admission, via the elastic CPU tokens mechanism -- that is already tracked in https://github.com/cockroachdb/cockroach/issues/107770

dt commented 6 months ago

what is our current plan with this

On the DR side, we've determined that we don't have the option to simply avoid hashing -- turning off MD5 checksums in S3 uploads breaks when bucket versioning/locking is enabled -- and the minimum buffer size we can pass to S3 to hash and upload is 5 MB (the default is 8 MB in 23.1, and will be 5 MB in 23.2).

work that is not preemptible by the goroutine scheduler.

I think this issue is tracking making some classes of non-preemptible work actually preemptible, i.e. patching our fork of the stdlib so that the two hashing functions mentioned become preemptible by the scheduler. I believe AC is already maintaining our patched fork of Go (for the grunning patches?).

preemptible work should request admission, via the elastic CPU tokens mechanism

This is on DR's backlog: integrate AC's existing libraries for requesting tokens/pacing into the backup process as it constructs SSTs and writes them into the various cloud SDK uploaders. But that won't help us here: we can write as slowly as we want, but the uploaders will buffer what we write into a chunk before hashing and uploading it, and the minimum chunk size is 5 MB, so that's the unit that gets passed to hashing at once.
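The buffering behavior described above can be sketched generically (purely illustrative, not the actual SDK code): a multipart-style uploader accumulates writes into a fixed-size part buffer and hashes the whole part at once, so pacing the writes does not shrink the unit handed to the hash.

```go
package main

import (
	"crypto/md5"
	"fmt"
)

const partSize = 5 << 20 // illustrative 5 MB minimum part size

// partUploader mimics a multipart uploader: however slowly the
// caller writes, each part is hashed as one 5 MB unit.
type partUploader struct {
	buf []byte
}

func (u *partUploader) Write(p []byte) (int, error) {
	u.buf = append(u.buf, p...)
	for len(u.buf) >= partSize {
		part := u.buf[:partSize]
		sum := md5.Sum(part) // one large, non-preemptible hash call
		fmt.Printf("uploading part, md5 prefix=%x\n", sum[:4])
		u.buf = u.buf[partSize:]
	}
	return len(p), nil
}

func main() {
	u := &partUploader{}
	// Pacing writes down to 1 KB does not change the 5 MB hash unit.
	chunk := make([]byte, 1<<10)
	for i := 0; i < partSize/len(chunk); i++ {
		u.Write(chunk)
	}
}
```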

nicktrav commented 6 months ago

Shall we add a T- label here, so this doesn't fall through the cracks?

blathers-crl[bot] commented 6 months ago

cc @cockroachdb/disaster-recovery

nicktrav commented 5 months ago

Here are some results comparing BenchmarkLatencyWhileHashing between builds on master @ cc4fdffa853. The results look promising.

https://gist.github.com/nicktrav/e1544fb1dc2d1bc6d04a1dd9db52e670

The RHS updates our runtime patch to include the following:

diff --git a/src/crypto/md5/md5.go b/src/crypto/md5/md5.go
index ccee4ea3a9..e15bc6d6d6 100644
--- a/src/crypto/md5/md5.go
+++ b/src/crypto/md5/md5.go
@@ -27,6 +27,10 @@ const Size = 16
 // The blocksize of MD5 in bytes.
 const BlockSize = 64

+// The maximum number of bytes that can be passed to block.
+const maxAsmIters = 1024
+const maxAsmSize = BlockSize * maxAsmIters // 64KiB
+
 const (
        init0 = 0x67452301
        init1 = 0xEFCDAB89
@@ -130,6 +134,11 @@ func (d *digest) Write(p []byte) (nn int, err error) {
        if len(p) >= BlockSize {
                n := len(p) &^ (BlockSize - 1)
                if haveAsm {
+                       for n > maxAsmSize {
+                               block(d, p[:maxAsmSize])
+                               p = p[maxAsmSize:]
+                               n -= maxAsmSize
+                       }
                        block(d, p[:n])
                } else {
                        blockGeneric(d, p[:n])
diff --git a/src/crypto/md5/md5_test.go b/src/crypto/md5/md5_test.go
index 851e7fb10d..e120be3718 100644
--- a/src/crypto/md5/md5_test.go
+++ b/src/crypto/md5/md5_test.go
@@ -120,10 +120,11 @@ func TestGoldenMarshal(t *testing.T) {

 func TestLarge(t *testing.T) {
        const N = 10000
+       const offsets = 4
        ok := "2bb571599a4180e1d542f76904adc3df" // md5sum of "0123456789" * 1000
-       block := make([]byte, 10004)
+       block := make([]byte, N+offsets)
        c := New()
-       for offset := 0; offset < 4; offset++ {
+       for offset := 0; offset < offsets; offset++ {
                for i := 0; i < N; i++ {
                        block[offset+i] = '0' + byte(i%10)
                }
@@ -142,6 +143,31 @@ func TestLarge(t *testing.T) {
        }
 }

+func TestExtraLarge(t *testing.T) {
+       const N = 100000
+       const offsets = 4
+       ok := "13572e9e296cff52b79c52148313c3a5" // md5sum of "0123456789" * 10000
+       block := make([]byte, N+offsets)
+       c := New()
+       for offset := 0; offset < offsets; offset++ {
+               for i := 0; i < N; i++ {
+                       block[offset+i] = '0' + byte(i%10)
+               }
+               for blockSize := 10; blockSize <= N; blockSize *= 10 {
+                       blocks := N / blockSize
+                       b := block[offset : offset+blockSize]
+                       c.Reset()
+                       for i := 0; i < blocks; i++ {
+                               c.Write(b)
+                       }
+                       s := fmt.Sprintf("%x", c.Sum(nil))
+                       if s != ok {
+                               t.Fatalf("md5 TestExtraLarge offset=%d, blockSize=%d = %s want %s", offset, blockSize, s, ok)
+                       }
+               }
+       }
+}
+
 // Tests that blockGeneric (pure Go) and block (in assembly for amd64, 386, arm) match.
 func TestBlockGeneric(t *testing.T) {
        gen, asm := New().(*digest), New().(*digest)
diff --git a/src/crypto/sha256/sha256.go b/src/crypto/sha256/sha256.go
index 2deafbc9fc..567a3c81f9 100644
--- a/src/crypto/sha256/sha256.go
+++ b/src/crypto/sha256/sha256.go
@@ -28,6 +28,10 @@ const Size224 = 28
 // The blocksize of SHA256 and SHA224 in bytes.
 const BlockSize = 64

+// The maximum number of bytes that can be passed to block.
+const maxAsmIters = 1024
+const maxAsmSize = BlockSize * maxAsmIters // 64KiB
+
 const (
        chunk     = 64
        init0     = 0x6A09E667
@@ -191,6 +195,11 @@ func (d *digest) Write(p []byte) (nn int, err error) {
        }
        if len(p) >= chunk {
                n := len(p) &^ (chunk - 1)
+               for n > maxAsmSize {
+                       block(d, p[:maxAsmSize])
+                       p = p[maxAsmSize:]
+                       n -= maxAsmSize
+               }
                block(d, p[:n])
                p = p[n:]
        }
blathers-crl[bot] commented 4 months ago

cc @cockroachdb/disaster-recovery

rharding6373 commented 4 months ago

Nick is OOO, so assigning to @sumeerbhola so it doesn't get lost.