canonical / pebble

Pebble is a lightweight Linux service manager with layered configuration and an HTTP API.
https://canonical-pebble.readthedocs-hosted.com/
GNU General Public License v3.0
146 stars 54 forks source link

Avoid acquiring state lock in Command.ServeHTTP #366

Closed benhoyt closed 1 month ago

benhoyt commented 7 months ago

For every request that comes in, when sending the response in Command.ServeHTTP, we acquire and release the state lock (twice!). We should avoid this if possible, at the very least for the GET /v1/health endpoint, which needs to return quickly and avoid global locks even when the system is under heavy load.

This came up in https://bugs.launchpad.net/juju/+bug/2052517, where a charm is using Pebble exec (which uses the state lock) and then a MySQL backup is occurring which puts the system under heavy CPU load, so occasionally the health endpoint (even with no health checks defined) is taking longer than 1s to respond, which causes K8s to terminate the pod due to how we have the probes set up (with a 1s timeout and only 1 attempt): https://github.com/juju/juju/blob/a90328ccf574d4be960b0236a5c21cd030111fd2/caas/kubernetes/provider/application/application.go#L59-L68

So if we can avoid needing the state lock in ServeHTTP in general, that would be best. If not, we should at least figure out a way to avoid it in GET /v1/health.

benhoyt commented 6 months ago

We're now avoiding hitting acquiring the state lock in the health check endpoint (#369), but not the other endpoints. We should consider this as part of the lock improvements work in 24.10. It's a bit annoying that we acquire the lock (twice no less) at the end of every request for two features Pebble doesn't actually use (maintenance and warnings), so we should at least consider if we can drop those.