caddyserver / caddy

Fast and extensible multi-platform HTTP/1-2-3 web server with automatic HTTPS
https://caddyserver.com
Apache License 2.0
57.34k stars 4k forks source link

changes healthcheck to block on provision until first success (and also block properly in general) #6407

Closed elee1766 closed 3 months ago

elee1766 commented 3 months ago

https://github.com/caddyserver/caddy/blob/master/caddytest/integration/reverseproxy_test.go#L359 there is a todo here: TODO: for some reason this test seems particularly flaky, getting 503 when it should be 200, unless we wait

the reason is because health check hasn't succeeded, so there needs to be a wait to wait until the test starts.

the health checkers are started in goroutine, https://github.com/caddyserver/caddy/blob/master/modules/caddyhttp/reverseproxy/healthchecks.go#L254-L264

and the health checker start includes the initial request. the issues are twofold.

  1. the goroutine returns before the first round of h.doActiveHealthCheckForAllHosts
  2. doActiveHealthCheckForAllHosts will return before any health checkers finish.

so the fixes are to

  1. make sure provision doesn't return until activeHealthChecker has started, and make the goroutine start within activeHealthChecker
  2. make sure that doActiveHealthCheckForAllHosts doesn't return until all checkers have finished.

of course, this "fix" also makes it so that if a health check hangs - caddy will stay in provisioning state until it times out. i think this is the correct behavior (caddy should fail health check until it has health checked all reverse proxies at least once), but maybe others disagree?

this change has the additional advantage of making it more difficult for configuration errors to stampede a remote

maybe there though needs to be better timeout guarantees on the underlying healthchecks though before this can be pushed? perhaps some people are relying on this weird behavior as well, i am not sure.

elee1766 commented 3 months ago

closing because we will likely go down a different route. #6410