goauthentik / authentik

The authentication glue you need.
https://goauthentik.io
Other
13.58k stars 908 forks source link

Renewing detected SSL certificates leads to fatal server crash #9907

Closed TNorthover closed 5 months ago

TNorthover commented 5 months ago

Describe the bug Renewing certificates outside authentik leads to fatal error: concurrent map writes and authentik-server exits with the backtrace below, under the logs heading.

To Reproduce I'm pretty sure it's a race condition so reproduction is going to be a bit hit-or-miss. I think the key is that I have an out-of-band cert-renewer running that changes a certificate's fingerprint, and that makes a race significantly more likely. But loosely, this is what I did:

  1. Issue cert+key into /certs/.
  2. Wait for it to be picked up by worker.
  3. Set it as the certificate for the default brand.
  4. Renew the certificate, changing the fingerprint.
  5. Wait for crash. It's happened twice for me, but about 6 hours after the cert change.

Expected behavior No crashing.

Logs

I think the critical bit is here, shout and I can post more (thousands of idle functions in the backtrace, for example)

May 29 11:21:31 authentik authentik-server[1061776]: fatal error: concurrent map writes
May 29 11:21:31 authentik authentik-server[1061776]: 
May 29 11:21:31 authentik authentik-server[1061776]: goroutine 23997 [running]:
May 29 11:21:31 authentik authentik-server[1061776]: goauthentik.io/internal/outpost/ak.(*CryptoStore).Fetch(0xc0002d6340, {0xc0001e04b0, 0x24})
May 29 11:21:31 authentik authentik-server[1061776]:         /go/src/goauthentik.io/internal/outpost/ak/crypto.go:75 +0x5a5
May 29 11:21:31 authentik authentik-server[1061776]: goauthentik.io/internal/outpost/ak.(*CryptoStore).Get(0xc0002d6340, {0xc0001e04b0, 0x24})
May 29 11:21:31 authentik authentik-server[1061776]:         /go/src/goauthentik.io/internal/outpost/ak/crypto.go:80 +0x3a
May 29 11:21:31 authentik authentik-server[1061776]: goauthentik.io/internal/web/brand_tls.(*Watcher).GetCertificate(0xc000208080, 0xc0008f68f0)
May 29 11:21:31 authentik authentik-server[1061776]:         /go/src/goauthentik.io/internal/web/brand_tls/brand_tls.go:81 +0x1f0
May 29 11:21:31 authentik authentik-server[1061776]: goauthentik.io/internal/web.(*WebServer).GetCertificate.func1(0xc0008f68f0?)
May 29 11:21:31 authentik authentik-server[1061776]:         /go/src/goauthentik.io/internal/web/tls.go:31 +0x79
May 29 11:21:31 authentik authentik-server[1061776]: crypto/tls.(*Config).getCertificate(0xc00016e680, 0xc0008f68f0)
May 29 11:21:31 authentik authentik-server[1061776]:         /usr/local/go/src/crypto/tls/common.go:1126 +0x4a
May 29 11:21:31 authentik authentik-server[1061776]: crypto/tls.(*serverHandshakeState).processClientHello(0xc0008f6820)
May 29 11:21:31 authentik authentik-server[1061776]:         /usr/local/go/src/crypto/tls/handshake_server.go:236 +0x7e5
May 29 11:21:31 authentik authentik-server[1061776]: crypto/tls.(*serverHandshakeState).handshake(0xc0008f6820)
May 29 11:21:31 authentik authentik-server[1061776]:         /usr/local/go/src/crypto/tls/handshake_server.go:67 +0x2c
May 29 11:21:31 authentik authentik-server[1061776]: crypto/tls.(*Conn).serverHandshake(0xc000b7ca88, {0x1649f18, 0xc0007ee050})
May 29 11:21:31 authentik authentik-server[1061776]:         /usr/local/go/src/crypto/tls/handshake_server.go:61 +0x111
May 29 11:21:31 authentik authentik-server[1061776]: crypto/tls.(*Conn).handshakeContext(0xc000b7ca88, {0x1649ee0, 0xc000911d40})
May 29 11:21:31 authentik authentik-server[1061776]:         /usr/local/go/src/crypto/tls/conn.go:1553 +0x3cb
May 29 11:21:31 authentik authentik-server[1061776]: crypto/tls.(*Conn).HandshakeContext(...)
May 29 11:21:31 authentik authentik-server[1061776]:         /usr/local/go/src/crypto/tls/conn.go:1493
May 29 11:21:31 authentik authentik-server[1061776]: net/http.(*conn).serve(0xc000e1c2d0, {0x1649ee0, 0xc0003269f0})
May 29 11:21:31 authentik authentik-server[1061776]:         /usr/local/go/src/net/http/server.go:1921 +0xe85
May 29 11:21:31 authentik authentik-server[1061776]: created by net/http.(*Server).Serve in goroutine 25
May 29 11:21:31 authentik authentik-server[1061776]:         /usr/local/go/src/net/http/server.go:3285 +0x4b4

above, there are ~5 near-simultaneous calls of this Fetch API every minute in the log (filtered below):

[...]
May 29 11:20:01 authentik authentik-server[1061776]: {"event":"Fetching certificate and private key","level":"info","logger":"authentik.outpost.cryptostore","timestamp":"2024-05-29T11:20:01Z","uuid":"610c52ce-a152-4abb-895a-67c52f12c8ad"}
May 29 11:20:01 authentik authentik-server[1061776]: {"event":"Fetching certificate and private key","level":"info","logger":"authentik.outpost.cryptostore","timestamp":"2024-05-29T11:20:01Z","uuid":"610c52ce-a152-4abb-895a-67c52f12c8ad"}
May 29 11:20:01 authentik authentik-server[1061776]: {"event":"Fetching certificate and private key","level":"info","logger":"authentik.outpost.cryptostore","timestamp":"2024-05-29T11:20:01Z","uuid":"610c52ce-a152-4abb-895a-67c52f12c8ad"}
May 29 11:20:01 authentik authentik-server[1061776]: {"event":"Fetching certificate and private key","level":"info","logger":"authentik.outpost.cryptostore","timestamp":"2024-05-29T11:20:01Z","uuid":"610c52ce-a152-4abb-895a-67c52f12c8ad"}
May 29 11:20:01 authentik authentik-server[1061776]: {"event":"Fetching certificate and private key","level":"info","logger":"authentik.outpost.cryptostore","timestamp":"2024-05-29T11:20:01Z","uuid":"610c52ce-a152-4abb-895a-67c52f12c8ad"}
May 29 11:21:31 authentik authentik-server[1061776]: {"event":"Fetching certificate and private key","level":"info","logger":"authentik.outpost.cryptostore","timestamp":"2024-05-29T11:21:31Z","uuid":"610c52ce-a152-4abb-895a-67c52f12c8ad"}
May 29 11:21:31 authentik authentik-server[1061776]: {"event":"Fetching certificate and private key","level":"info","logger":"authentik.outpost.cryptostore","timestamp":"2024-05-29T11:21:31Z","uuid":"610c52ce-a152-
4abb-895a-67c52f12c8ad"}
May 29 11:21:31 authentik authentik-server[1061776]: {"event":"Fetching certificate and private key","level":"info","logger":"authentik.outpost.cryptostore","timestamp":"2024-05-29T11:21:31Z","uuid":"610c52ce-a152-
4abb-895a-67c52f12c8ad"}
May 29 11:21:31 authentik authentik-server[1061776]: {"event":"Fetching certificate and private key","level":"info","logger":"authentik.outpost.cryptostore","timestamp":"2024-05-29T11:21:31Z","uuid":"610c52ce-a152-
4abb-895a-67c52f12c8ad"}
May 29 11:21:31 authentik authentik-server[1061776]: {"event":"Fetching certificate and private key","level":"info","logger":"authentik.outpost.cryptostore","timestamp":"2024-05-29T11:21:31Z","uuid":"610c52ce-a152-
4abb-895a-67c52f12c8ad"}

so with ~5 concurrent calls per minute, it takes about 6 hours to crash on my machine.

Version and Deployment (please complete the following information):

Additional context

I think what's happening is that renewing the certificate invalidates its fingerprint. This means that whenever the Get function in internal/outpost/ak/crypto.go is called, Fetch goes down the slow path, and at the end of that slow path is the concurrent map write that crashes: cs.certificates[uuid] = &x509cert.

I suspect the probability of a race error could be drastically reduced by updating fingerprints when we update the certificates (at the moment only the AddKeypair function ever does that).

But it wouldn't eliminate the problem (I think). First time this API gets called after the renewal there might well still be a small chance of it racing, until the fingerprint is updated.

Moradiii1987 commented 5 months ago

Saeed

Moradiii1987 commented 4 months ago

As