francislavoie opened this issue 7 years ago
We could have Caddy return different error codes for pre-start errors.
That would be a good start. :smile:
Anyone know how to change the exit code of `log.Fatal`? :confused:
Eh, nevermind: quitting before the servers are fully started is the most common case, I think (especially for unexpected stops, when the exit code is of interest), so I'll leave that as exit code 1. The only other place I can find where Caddy exits (either with `log.Fatal*` or `os.Exit`) is when processing signals. I can change those to values > 1 easily enough. Will that help?
I'm finally getting around to trying to implement this. This is my current revision of the `docker-entrypoint.sh` script:
```sh
#!/bin/ash

# The file that tracks if this container has been started with errors
errorfile="/root/.caddy/haserror"

# Implement a reset command to remove the errorfile
if [ "$1" = 'reset' ]; then
    rm -f "$errorfile"
    exit 0
fi

# Caddy is being run normally
if [ "$1" = '/usr/bin/caddy' ] || [ "$1" = 'caddy' ]; then
    # Quit early if the error file exists.
    # We do this so that we don't hit Let's Encrypt rate limits
    # if we have startup errors.
    if [ -e "$errorfile" ]; then
        echo "Exiting, we previously ran with an error."
        exit 1
    fi

    # Run caddy (not exec'd, so we can capture its exit code)
    "$@"
    cmdoutput="$?"

    # Exit code 1 is a startup error; other exit codes don't concern us.
    # See https://caddyserver.com/docs/cli#exit-codes
    if [ "$cmdoutput" -eq 1 ]; then
        echo "Exited with error $cmdoutput."
        touch "$errorfile"
    fi

    # Exit with the same status code as caddy did
    exit "$cmdoutput"
fi

# If the command is not caddy, we'll just run it normally
exec "$@"
```
So when this is run, the idea is that Docker will keep restarting the container if a restart policy is specified, with exponential backoff (I think 1.5s -> 3s -> 6s -> 12s and so on), but this script makes all subsequent restarts exit early until the user gets rid of the error file. I set it up with a built-in reset command that removes the file, used like this:
docker-compose run --rm --no-deps caddy reset
I have the `/root/.caddy` folder as a volume, so it is shared between containers; running a one-off instance has access to the same volume and removes that file.
I'm considering adding some logic that checks the file modification time with `stat -c %Y $errorfile` and attempts a normal start again if the file has been there for more than something like 30 minutes: some amount of time after which it won't hit the rate limit anymore, so the check stops being an encumbrance to users if they come back to it later. I'm not sure if it's actually beneficial or not.
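A minimal sketch of what that staleness check could look like, assuming a helper named `is_stale` (my name, not from the entrypoint above) and a 30-minute threshold; it relies on the GNU/busybox `stat -c %Y` format, which is what Alpine's ash environment provides:

```shell
#!/bin/sh

# is_stale FILE MAX_AGE -> succeeds (exit 0) if FILE exists and its
# mtime is more than MAX_AGE seconds in the past
is_stale() {
    file="$1"
    max_age="$2"
    [ -e "$file" ] || return 1
    now=$(date +%s)
    mtime=$(stat -c %Y "$file")
    [ $(( now - mtime )) -gt "$max_age" ]
}

# The entrypoint's early-exit check could then become something like:
#   if [ -e "$errorfile" ]; then
#       if is_stale "$errorfile" 1800; then
#           rm -f "$errorfile"   # stale: clear it and try a normal start
#       else
#           echo "Exiting, we previously ran with an error."
#           exit 1
#       fi
#   fi
```

The upside of clearing the file automatically is that a user who walks away and comes back later gets a normal start without having to know about the `reset` command.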
This entire thing is a stupid hack for something that I think should be supported by Docker anyways.
I want to use `restart: unless-stopped` on my containers, but I realized that doing so with Caddy would be a bad idea on first deploy in the case that DNS/ports/etc are incorrect, because auto-TLS will fail. If Caddy exits because of TLS errors, the restart policy starts it right back up again, and that means a rate limit will quickly be hit.

I'm not totally sure how to fix this (which is why I'm opening an issue, maybe you have a good idea for this @abiosoft), but this is my theory: I think we should have a bash script as the entrypoint, which can write to a file when Caddy exits with an error, before doing the exit itself. If it boots back up, the entrypoint script should check for the existence of that file and do an early cancel or something, to prevent it from trying TLS again and likely failing again.
Basically the idea is just doing an extra layer to the exponential back-off that the docker restart policies have to be safer from hitting rate limits.
I don't think there are different error codes for categories of errors in Caddy; having it return codes other than just 1 might be good, to help differentiate for something like this.
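If Caddy did grow distinct exit codes per error category, the entrypoint's single `-eq 1` check could become a case statement. A sketch of that, where the codes other than 0 and 1 are hypothetical (Caddy does not currently define them):

```shell
#!/bin/sh

errorfile="/root/.caddy/haserror"

# handle_exit_code CODE -> record startup errors, propagate CODE unchanged
handle_exit_code() {
    case "$1" in
        0)  ;;                          # clean shutdown: nothing to do
        1)  touch "$errorfile" ;;       # startup error: block further restarts
        *)  echo "caddy exited with code $1" >&2 ;;  # other: let Docker retry
    esac
    return "$1"
}

# In the entrypoint, after running "$@":
#   handle_exit_code "$cmdoutput"
#   exit "$cmdoutput"
```

The useful property is that only the "startup error" category writes the error file, so a crash from, say, a signal would still be retried by the restart policy as normal.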
Does this make sense?