caddyserver / dist

Resources for packaging and distributing Caddy
Apache License 2.0

Automatically restart Caddy on failure #102

Closed by karimfromjordan 1 year ago

karimfromjordan commented 1 year ago

I just had Caddy crash on one of my servers for the first time. Here are the logs:

Aug 04 08:12:31 caddy[712]: panic: runtime error: invalid memory address or nil pointer dereference
Aug 04 08:12:31 caddy[712]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x30 pc=0x903750]
Aug 04 08:12:31 caddy[712]: goroutine 15427 [running]:
Aug 04 08:12:31 caddy[712]: github.com/caddyserver/certmagic.(*Config).getCertDuringHandshake(0xc0001d8680, {0x1f09a88, 0xc000138008}, _, _)
Aug 04 08:12:31 caddy[712]:         github.com/caddyserver/certmagic@v0.19.1/handshake.go:378 +0x1390
Aug 04 08:12:31 caddy[712]: github.com/caddyserver/certmagic.(*Config).GetCertificateWithContext(0xc0001d8680, {0x1f09a88, 0xc000138008}, 0xc0001>
Aug 04 08:12:31 caddy[712]:         github.com/caddyserver/certmagic@v0.19.1/handshake.go:84 +0xbff
Aug 04 08:12:31 caddy[712]: github.com/caddyserver/certmagic.(*Config).GetCertificate(0xc00040a000?, 0xc00075c120?)
Aug 04 08:12:31 caddy[712]:         github.com/caddyserver/certmagic@v0.19.1/handshake.go:50 +0x2a
Aug 04 08:12:31 caddy[712]: github.com/caddyserver/caddy/v2/modules/caddytls.(*ConnectionPolicy).buildStandardTLSConfig.func1(0xc0001d85b0)
Aug 04 08:12:31 caddy[712]:         github.com/caddyserver/caddy/v2@v2.7.2/modules/caddytls/connpolicy.go:232 +0x14f
Aug 04 08:12:31 caddy[712]: github.com/quic-go/qtls-go1-20.(*config).getCertificate(0xc00057f380, 0xc0001d85b0)
Aug 04 08:12:31 caddy[712]:         github.com/quic-go/qtls-go1-20@v0.3.0/common.go:1086 +0x42
Aug 04 08:12:31 caddy[712]: github.com/quic-go/qtls-go1-20.(*serverHandshakeStateTLS13).pickCertificate(0xc000399be8)
Aug 04 08:12:31 caddy[712]:         github.com/quic-go/qtls-go1-20@v0.3.0/handshake_server_tls13.go:415 +0x66
Aug 04 08:12:31 caddy[712]: github.com/quic-go/qtls-go1-20.(*serverHandshakeStateTLS13).handshake(0xc000399be8)
Aug 04 08:12:31 caddy[712]:         github.com/quic-go/qtls-go1-20@v0.3.0/handshake_server_tls13.go:60 +0x53
Aug 04 08:12:31 caddy[712]: github.com/quic-go/qtls-go1-20.(*Conn).serverHandshake(0xc00003c000, {0x1f09a50, 0xc0008c2320})
Aug 04 08:12:31 caddy[712]:         github.com/quic-go/qtls-go1-20@v0.3.0/handshake_server.go:53 +0x188
Aug 04 08:12:31 caddy[712]: github.com/quic-go/qtls-go1-20.(*Conn).handshakeContext(0xc00003c000, {0x1f09af8, 0xc000758930})
Aug 04 08:12:31 caddy[712]:         github.com/quic-go/qtls-go1-20@v0.3.0/conn.go:1540 +0x3ce
Aug 04 08:12:31 caddy[712]: github.com/quic-go/qtls-go1-20.(*Conn).HandshakeContext(0xc0000de7d0?, {0x1f09af8?, 0xc000758930?})
Aug 04 08:12:31 caddy[712]:         github.com/quic-go/qtls-go1-20@v0.3.0/conn.go:1480 +0x25
Aug 04 08:12:31 caddy[712]: created by github.com/quic-go/qtls-go1-20.(*QUICConn).Start
Aug 04 08:12:31 caddy[712]:         github.com/quic-go/qtls-go1-20@v0.3.0/quic.go:179 +0xcf
Aug 04 08:12:31 systemd[1]: caddy.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Aug 04 08:12:31 systemd[1]: caddy.service: Failed with result 'exit-code'.

To get my apps back online, I had to run systemctl restart caddy.service manually. I noticed that Caddy's service units don't instruct systemd to restart Caddy if it crashes, so I propose the following changes to the default systemd units:

[Unit]
StartLimitBurst=5
StartLimitIntervalSec=60
...

[Service]
Restart=on-failure
...

I do this for all of my apps in case they ever crash. systemd will restart them automatically up to 5 times (StartLimitBurst) within 60 seconds (StartLimitIntervalSec). If they reach the limit, they transition into the failed state.
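
For anyone who wants this behavior without modifying the packaged unit file, the same settings could be placed in a drop-in override. This is only a sketch: the unit name caddy.service and the drop-in path below are assumptions based on a typical package install, where systemctl edit caddy creates an override.conf in that location.

```ini
# /etc/systemd/system/caddy.service.d/override.conf
# (assumed path; "systemctl edit caddy" normally creates this file)

[Unit]
# Allow at most 5 start attempts within a 60-second window.
# Once the limit is hit, the unit enters the failed state.
StartLimitBurst=5
StartLimitIntervalSec=60

[Service]
# Restart after an unclean exit (non-zero exit code, fatal signal,
# timeout, or watchdog), but not after a clean stop.
Restart=on-failure
```

If the file is created by hand rather than through systemctl edit, systemd needs a systemctl daemon-reload to pick it up. Running systemctl show caddy.service -p Restart afterwards should report the effective setting.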

emilylange commented 1 year ago

See #92 and https://github.com/caddyserver/website/pull/284

francislavoie commented 1 year ago

Yep, that's already in the docs. https://caddyserver.com/docs/running#overrides

Regarding the panic, see https://github.com/caddyserver/caddy/issues/5680; we're working on it.

mholt commented 1 year ago

(These kinds of bugs are extremely rare btw)