Closed flotter closed 6 months ago
@rebornplusplus I am not sure how to add you, but could you kindly give this PR a careful look from the Rocks entrypoint perspective? I want to make sure your use of --hold and --args is not affected.
@flotter I haven't reviewed the code change yet (going to start on that today), but I just talked to the APAC Juju team, and they're happy with the change in behaviour this would mean: that health checks will start when Pebble starts, instead of at an arbitrary interval afterwards, depending on what the charm or whoever else was doing with Pebble. This was always their expectation (and I think charmers' expectation) in any case.
Separately from this, we'd like to consider a proper "startup" check level for K8s startup probes, so that charms like PostgreSQL can properly say "I'm up" to Pebble/Juju, rather than relying on a longish health check timeout / # of failures. But that's a separate discussion -- I've put it in our list to discuss in Madrid.
The Pebble configuration YAML (called the plan) is currently managed by the service manager.
Traditionally, this design made sense as the plan primarily contained service-related configuration. However, the relationship between the plan content and the service manager has started to drift apart.
We are seeing more and more overlord managers making their way into Pebble directly, or through Pebble derivative projects.
As a consequence, the service manager (currently hosting the plan management) is getting really big and complex to test.
This PR addresses two concerns (some additional work remains):
Break out the plan management API currently in the service manager into its own overlord manager.
Consistently load the plan at Pebble startup, notifying all plan change subscribers (which now includes the service manager).
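To illustrate the two points above, here is a minimal sketch of a stand-alone plan manager that notifies subscribers when a plan is loaded. All names here (Plan, PlanManager, AddChangeListener, Load, and the two toy managers) are assumptions made for illustration and are not taken from the PR's code; the plan is reduced to two fields, and locking is omitted for brevity.

```go
package main

import "fmt"

// Plan is a simplified stand-in for the parsed configuration plan.
// (Hypothetical: the real plan carries far more than these two fields.)
type Plan struct {
	Services map[string]string // service name -> command
	Checks   []string          // health check names
}

// PlanChangedFunc is called with the new plan whenever the plan changes.
type PlanChangedFunc func(p *Plan)

// PlanManager owns the plan and notifies subscribers about changes.
// It is not safe for concurrent use; a real manager would need locking.
type PlanManager struct {
	plan      *Plan
	listeners []PlanChangedFunc
}

func NewPlanManager() *PlanManager {
	return &PlanManager{plan: &Plan{}}
}

// AddChangeListener registers a subscriber; call it before Load.
func (m *PlanManager) AddChangeListener(f PlanChangedFunc) {
	m.listeners = append(m.listeners, f)
}

// Load installs the plan and notifies every subscriber, even when the
// plan is empty, so that all managers start from a known state.
func (m *PlanManager) Load(p *Plan) {
	if p == nil {
		p = &Plan{}
	}
	m.plan = p
	for _, f := range m.listeners {
		f(p)
	}
}

// Plan returns the currently loaded plan.
func (m *PlanManager) Plan() *Plan { return m.plan }

// ServiceManager is now just another subscriber of plan changes.
type ServiceManager struct{ services map[string]string }

func (s *ServiceManager) PlanChanged(p *Plan) { s.services = p.Services }

// CheckManager records which checks it would start for the given plan.
type CheckManager struct{ running []string }

func (c *CheckManager) PlanChanged(p *Plan) {
	c.running = append([]string(nil), p.Checks...)
}

func main() {
	pm := NewPlanManager()
	sm, cm := &ServiceManager{}, &CheckManager{}
	pm.AddChangeListener(sm.PlanChanged)
	pm.AddChangeListener(cm.PlanChanged)

	pm.Load(&Plan{
		Services: map[string]string{"svc1": "sleep 1000"},
		Checks:   []string{"chk1"},
	})
	fmt.Println("services:", len(sm.services), "checks:", len(cm.running))
}
```

In this sketch the service manager no longer owns the plan; it only reacts to it, which is the separation the first concern asks for.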
This change has a behaviour impact on Pebble: health checks will be started immediately on startup, rather than at some later, somewhat arbitrary point, when something like a GET /v1/services or POST /v1/services or other action fetched the plan. The service manager implemented lazy loading of the configuration plan, so the plan was only loaded from disk once a service manager command that uses the plan was called. This behaviour is difficult to predict from a user perspective; see the two examples below:
Example 1:
// Plan
// Pebble server started: pebble run --hold
Example 2:
// Plan
// Pebble server started: pebble run --args hello-world -n "hello world" \; --hold
Summary:
Following the change, the plan will always be loaded at Pebble startup with all the registered managers receiving a plan change notification with the loaded plan (which could be empty). This will consistently happen before the daemon is started, so the API experience will be consistent.
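The difference in ordering can be sketched with a toy model. This is not Pebble code: the pebble type, startLazy, startEager, and getServices are hypothetical names that only record the order of events under the old (lazy) and new (eager) behaviour described above.

```go
package main

import "fmt"

// pebble is a toy model that records the order of startup events.
type pebble struct {
	planLoaded    bool
	checksRunning bool
	log           []string
}

// loadPlan loads the plan once; loading it is what starts the checks.
func (p *pebble) loadPlan() {
	if p.planLoaded {
		return
	}
	p.planLoaded = true
	p.checksRunning = true
	p.log = append(p.log, "plan loaded, checks started")
}

// startLazy models the old behaviour: the daemon starts without a plan.
func startLazy() *pebble {
	p := &pebble{}
	p.log = append(p.log, "daemon started")
	return p
}

// startEager models the new behaviour: the plan is loaded first.
func startEager() *pebble {
	p := &pebble{}
	p.loadPlan()
	p.log = append(p.log, "daemon started")
	return p
}

// getServices models an API call such as GET /v1/services, which in the
// lazy model is what finally triggers the plan load.
func (p *pebble) getServices() {
	p.loadPlan()
	p.log = append(p.log, "GET /v1/services")
}

func main() {
	lazy := startLazy()
	fmt.Println("lazy, before any request:", lazy.checksRunning)
	lazy.getServices()
	fmt.Println("lazy, after first request:", lazy.checksRunning)

	eager := startEager()
	fmt.Println("eager, before any request:", eager.checksRunning)
}
```

In the lazy model, whether checks are running depends on whether some client has already touched the plan; in the eager model the answer is the same for every client from the first request onwards.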