Describe the bug
Renewing certificates outside authentik leads to `fatal error: concurrent map writes`, and authentik-server exits with the backtrace below, under the Logs heading.
To Reproduce
I'm pretty sure it's a race condition, so reproduction is going to be a bit hit-or-miss. I think the key is that I have an out-of-band cert-renewer running that changes a certificate's fingerprint, and that makes a race significantly more likely. But loosely, this is what I did (there's a standalone sketch of the kind of race involved after the list):
Issue cert+key into /certs/.
Wait for it to be picked up by worker.
Set it as the certificate for the default brand.
Renew the certificate, changing the fingerprint.
Wait for crash. It's happened twice for me, both times about 6 hours after the cert change.
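For illustration, here is a minimal standalone Go sketch (not authentik code; the map and UUID are stand-ins) of the class of race involved. Unsynchronized map writes only abort the process when two writes actually collide, which is part of why reproduction is hit-or-miss; running this under `go run -race` will usually report the race immediately, while a plain run only crashes occasionally:

```go
// Standalone illustration only; none of this is authentik code. It mimics the
// suspected pattern: several goroutines (think: simultaneous TLS handshakes
// going through Get/Fetch) all writing to one unguarded map.
package main

import (
	"crypto/x509"
	"sync"
)

func main() {
	certificates := map[string]*x509.Certificate{} // stand-in for CryptoStore.certificates

	var wg sync.WaitGroup
	for i := 0; i < 5; i++ { // ~5 near-simultaneous "Fetch" calls, as in the logs below
		wg.Add(1)
		go func() {
			defer wg.Done()
			// After an out-of-band renewal every caller takes the slow path and
			// writes the refreshed certificate back without holding a lock; when
			// two of these writes collide, the runtime aborts with
			// "fatal error: concurrent map writes".
			certificates["610c52ce-a152-4abb-895a-67c52f12c8ad"] = &x509.Certificate{}
		}()
	}
	wg.Wait()
}
```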
Expected behavior
No crashing.
Logs
I think the critical bit is here; shout and I can post more (there are thousands of idle goroutines in the backtrace, for example)
May 29 11:21:31 authentik authentik-server[1061776]: fatal error: concurrent map writes
May 29 11:21:31 authentik authentik-server[1061776]:
May 29 11:21:31 authentik authentik-server[1061776]: goroutine 23997 [running]:
May 29 11:21:31 authentik authentik-server[1061776]: goauthentik.io/internal/outpost/ak.(*CryptoStore).Fetch(0xc0002d6340, {0xc0001e04b0, 0x24})
May 29 11:21:31 authentik authentik-server[1061776]: /go/src/goauthentik.io/internal/outpost/ak/crypto.go:75 +0x5a5
May 29 11:21:31 authentik authentik-server[1061776]: goauthentik.io/internal/outpost/ak.(*CryptoStore).Get(0xc0002d6340, {0xc0001e04b0, 0x24})
May 29 11:21:31 authentik authentik-server[1061776]: /go/src/goauthentik.io/internal/outpost/ak/crypto.go:80 +0x3a
May 29 11:21:31 authentik authentik-server[1061776]: goauthentik.io/internal/web/brand_tls.(*Watcher).GetCertificate(0xc000208080, 0xc0008f68f0)
May 29 11:21:31 authentik authentik-server[1061776]: /go/src/goauthentik.io/internal/web/brand_tls/brand_tls.go:81 +0x1f0
May 29 11:21:31 authentik authentik-server[1061776]: goauthentik.io/internal/web.(*WebServer).GetCertificate.func1(0xc0008f68f0?)
May 29 11:21:31 authentik authentik-server[1061776]: /go/src/goauthentik.io/internal/web/tls.go:31 +0x79
May 29 11:21:31 authentik authentik-server[1061776]: crypto/tls.(*Config).getCertificate(0xc00016e680, 0xc0008f68f0)
May 29 11:21:31 authentik authentik-server[1061776]: /usr/local/go/src/crypto/tls/common.go:1126 +0x4a
May 29 11:21:31 authentik authentik-server[1061776]: crypto/tls.(*serverHandshakeState).processClientHello(0xc0008f6820)
May 29 11:21:31 authentik authentik-server[1061776]: /usr/local/go/src/crypto/tls/handshake_server.go:236 +0x7e5
May 29 11:21:31 authentik authentik-server[1061776]: crypto/tls.(*serverHandshakeState).handshake(0xc0008f6820)
May 29 11:21:31 authentik authentik-server[1061776]: /usr/local/go/src/crypto/tls/handshake_server.go:67 +0x2c
May 29 11:21:31 authentik authentik-server[1061776]: crypto/tls.(*Conn).serverHandshake(0xc000b7ca88, {0x1649f18, 0xc0007ee050})
May 29 11:21:31 authentik authentik-server[1061776]: /usr/local/go/src/crypto/tls/handshake_server.go:61 +0x111
May 29 11:21:31 authentik authentik-server[1061776]: crypto/tls.(*Conn).handshakeContext(0xc000b7ca88, {0x1649ee0, 0xc000911d40})
May 29 11:21:31 authentik authentik-server[1061776]: /usr/local/go/src/crypto/tls/conn.go:1553 +0x3cb
May 29 11:21:31 authentik authentik-server[1061776]: crypto/tls.(*Conn).HandshakeContext(...)
May 29 11:21:31 authentik authentik-server[1061776]: /usr/local/go/src/crypto/tls/conn.go:1493
May 29 11:21:31 authentik authentik-server[1061776]: net/http.(*conn).serve(0xc000e1c2d0, {0x1649ee0, 0xc0003269f0})
May 29 11:21:31 authentik authentik-server[1061776]: /usr/local/go/src/net/http/server.go:1921 +0xe85
May 29 11:21:31 authentik authentik-server[1061776]: created by net/http.(*Server).Serve in goroutine 25
May 29 11:21:31 authentik authentik-server[1061776]: /usr/local/go/src/net/http/server.go:3285 +0x4b4
Leading up to the crash above, there are ~5 near-simultaneous calls to this Fetch API every minute in the log (filtered excerpt below):
[...]
May 29 11:20:01 authentik authentik-server[1061776]: {"event":"Fetching certificate and private key","level":"info","logger":"authentik.outpost.cryptostore","timestamp":"2024-05-29T11:20:01Z","uuid":"610c52ce-a152-4abb-895a-67c52f12c8ad"}
May 29 11:20:01 authentik authentik-server[1061776]: {"event":"Fetching certificate and private key","level":"info","logger":"authentik.outpost.cryptostore","timestamp":"2024-05-29T11:20:01Z","uuid":"610c52ce-a152-4abb-895a-67c52f12c8ad"}
May 29 11:20:01 authentik authentik-server[1061776]: {"event":"Fetching certificate and private key","level":"info","logger":"authentik.outpost.cryptostore","timestamp":"2024-05-29T11:20:01Z","uuid":"610c52ce-a152-4abb-895a-67c52f12c8ad"}
May 29 11:20:01 authentik authentik-server[1061776]: {"event":"Fetching certificate and private key","level":"info","logger":"authentik.outpost.cryptostore","timestamp":"2024-05-29T11:20:01Z","uuid":"610c52ce-a152-4abb-895a-67c52f12c8ad"}
May 29 11:20:01 authentik authentik-server[1061776]: {"event":"Fetching certificate and private key","level":"info","logger":"authentik.outpost.cryptostore","timestamp":"2024-05-29T11:20:01Z","uuid":"610c52ce-a152-4abb-895a-67c52f12c8ad"}
May 29 11:21:31 authentik authentik-server[1061776]: {"event":"Fetching certificate and private key","level":"info","logger":"authentik.outpost.cryptostore","timestamp":"2024-05-29T11:21:31Z","uuid":"610c52ce-a152-4abb-895a-67c52f12c8ad"}
May 29 11:21:31 authentik authentik-server[1061776]: {"event":"Fetching certificate and private key","level":"info","logger":"authentik.outpost.cryptostore","timestamp":"2024-05-29T11:21:31Z","uuid":"610c52ce-a152-4abb-895a-67c52f12c8ad"}
May 29 11:21:31 authentik authentik-server[1061776]: {"event":"Fetching certificate and private key","level":"info","logger":"authentik.outpost.cryptostore","timestamp":"2024-05-29T11:21:31Z","uuid":"610c52ce-a152-4abb-895a-67c52f12c8ad"}
May 29 11:21:31 authentik authentik-server[1061776]: {"event":"Fetching certificate and private key","level":"info","logger":"authentik.outpost.cryptostore","timestamp":"2024-05-29T11:21:31Z","uuid":"610c52ce-a152-4abb-895a-67c52f12c8ad"}
May 29 11:21:31 authentik authentik-server[1061776]: {"event":"Fetching certificate and private key","level":"info","logger":"authentik.outpost.cryptostore","timestamp":"2024-05-29T11:21:31Z","uuid":"610c52ce-a152-4abb-895a-67c52f12c8ad"}
So with ~5 concurrent calls per minute, it takes about 6 hours to crash on my machine.
Version and Deployment (please complete the following information):
authentik version: 2024.4.2
Deployment: podman quadlets inspired by the docker-compose deployment.
Additional context
I think what's happening is that renewing the certificate invalidates its fingerprint. This means that whenever the Get function in internal/outpost/ak/crypto.go is called, Fetch goes down the slow path, and at the end of that slow path is the concurrent map write that crashes: cs.certificates[uuid] = &x509cert.
I suspect the probability of a race error could be drastically reduced by updating fingerprints when we update the certificates (at the moment only the AddKeypair function ever does that).
But it wouldn't eliminate the problem (I think): the first time this API gets called after a renewal there might well still be a small chance of it racing, until the fingerprint is updated.
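For what it's worth, the usual way to close the gap completely is to guard the shared maps with a lock rather than relying only on the fingerprint fast path. A minimal sketch follows, assuming a CryptoStore shape like the one implied above; the type, fields, and method signatures are illustrative, not copied from internal/outpost/ak/crypto.go:

```go
// Illustrative sketch only; field names mirror this report, not the real code.
package cryptostore

import (
	"crypto/x509"
	"sync"
)

type CryptoStore struct {
	mu           sync.RWMutex
	fingerprints map[string]string            // uuid -> last seen fingerprint
	certificates map[string]*x509.Certificate // uuid -> cached certificate
}

// storeFetched records a freshly retrieved certificate. Taking the write lock
// around the map update removes the "concurrent map writes" crash even when
// several TLS handshakes hit the slow path at once.
func (cs *CryptoStore) storeFetched(uuid string, cert *x509.Certificate, fingerprint string) {
	cs.mu.Lock()
	defer cs.mu.Unlock()
	cs.certificates[uuid] = cert
	// Refreshing the fingerprint here as well (not only in AddKeypair) would
	// put later calls back on the fast path after an out-of-band renewal.
	cs.fingerprints[uuid] = fingerprint
}

// Get returns the cached certificate, taking only a read lock on the fast path.
func (cs *CryptoStore) Get(uuid string) *x509.Certificate {
	cs.mu.RLock()
	defer cs.mu.RUnlock()
	return cs.certificates[uuid]
}
```

A sync.Map or a singleflight-style guard around the slow path would work just as well; the point is only that updating fingerprints alone can't make the map write itself safe.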