golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
122.92k stars 17.52k forks source link

proposal: crypto/{sha256, md5, ...}: bound work in non-preemptible assembly loops #64417

Open nvanbenschoten opened 9 months ago

nvanbenschoten commented 9 months ago

Proposal Details

As discussed in https://github.com/golang/go/issues/24543#issuecomment-376581064, assembly code is non-preemptible. Non-preemptible goroutines can delay stop-the-world pauses (after all other goroutines have been paused), meaning that they can directly contribute to workload tail latency if timed poorly. This makes libraries which perform unbounded amounts of work in assembly loops risky for use in latency sensitive applications.

One way to mitigate this risk would be to bound the amount of work that any single call into an assembly routine performs on a case-by-case basis.

The crypto libraries provide a handful of examples where this technique could be applied. For example, in crypto/sha256, the following diff limits the non-preemptible section of a sha256 hash to somewhere around 250µs:

diff --git a/src/crypto/sha256/sha256.go b/src/crypto/sha256/sha256.go
index 2deafbc9fc..567a3c81f9 100644
--- a/src/crypto/sha256/sha256.go
+++ b/src/crypto/sha256/sha256.go
@@ -28,6 +28,10 @@ const Size224 = 28
 // The blocksize of SHA256 and SHA224 in bytes.
 const BlockSize = 64

+// The maximum number of iterations performed by a single assembly loop.
+const maxAsmIters = 1024
+const maxAsmSize = maxAsmIters * BlockSize // 64KiB
+
 const (
    chunk     = 64
    init0     = 0x6A09E667
@@ -191,6 +195,11 @@ func (d *digest) Write(p []byte) (nn int, err error) {
    }
    if len(p) >= chunk {
        n := len(p) &^ (chunk - 1)
+       for n > maxAsmSize {
+           block(d, p[:maxAsmSize])
+           p = p[maxAsmSize:]
+           n -= maxAsmSize
+       }
        block(d, p[:n])
        p = p[n:]
    }

We've seen in CockroachDB that this change combined with an almost identical one in crypto/md5 halves database p99 latency during backup uploads to AWS S3.

Alternative 1

One simple alternative is to document this risk and push the responsibility to bound the amount of non-preemptible to the users of these libraries.

However, that's not a great solution, as the direct callers of these hashing libraries are often buried deep inside of other libraries which applications cannot control (examples in the aws sdk: here, here, and here).

Alternative 2

The Safe-points everywhere for non-cooperative goroutine preemption design proposed a method to annotate safe preemption points in assembly code.

By default, the runtime cannot safely preempt assembly code since it won't know what registers contain pointers. As a follow-on to the work on safe-points everywhere, we should audit assembly in the standard library for non-preemptible loops and annotate them with register maps. In most cases this should be trivial since most assembly never constructs a pointer that isn't shadowed by an argument, so it can simply claim there are no pointers in registers. We should also document in the Go assembly guide how to do this for user code.

To my knowledge, this has never been implemented. If it was, it would provide an alternative to this proposal, though it would still require changes around each relevant use of assembly.


cc. @aclements you've thought about this problem more than anyone else. Does this seem like a reasonable proposal?

Jorropo commented 9 months ago

Your code could be simplified but it sounds reasonable. A third alternative could be to have assembly check some magic preemption boolean and return early if it is preempted. Something along the lines of:

func block(state *[8]uint32, buf []byte) (hashed uint)

// ...
    for h := p[:len(p) &^ (chunk - 1)]; len(h) >= chunk; h = h[block(d, h):] {}