canonical / pebble

Pebble is a lightweight Linux service manager with layered configuration and an HTTP API.
https://canonical-pebble.readthedocs-hosted.com/
GNU General Public License v3.0

feat(planstate): create a dedicated plan manager #387

Closed flotter closed 6 months ago

flotter commented 6 months ago

The Pebble configuration YAML (called the plan) is currently managed by the service manager.

Traditionally, this design made sense, as the plan primarily contained service-related configuration. However, the plan content and the service manager have started to drift apart.

We are seeing more and more overlord managers making their way into Pebble directly, or through Pebble derivative projects.

As a consequence, the service manager (currently hosting the plan management) is getting really big and complex to test.

This PR addresses two concerns (some additional work remains):

  1. Break out the plan management API currently in the service manager into its own overlord manager.

  2. Consistently load the plan at Pebble startup, notifying all plan change subscribers (which now includes the service manager).
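The two points above can be sketched roughly as follows. This is only an illustrative sketch, not the code in this PR: the names `Plan`, `PlanManager`, `AddChangeListener` and `Load` are assumptions made for the example, and the real plan type carries far more than two string slices.

```go
package main

import (
	"fmt"
	"sync"
)

// Plan is a hypothetical, heavily simplified stand-in for the combined plan.
type Plan struct {
	Services []string
	Checks   []string
}

// PlanChangedFunc is the callback a manager registers to hear about plan changes.
type PlanChangedFunc func(p *Plan)

// PlanManager (hypothetical) owns the plan and the list of change subscribers.
type PlanManager struct {
	mu        sync.Mutex
	plan      *Plan
	listeners []PlanChangedFunc
}

// NewPlanManager returns a manager holding an empty plan.
func NewPlanManager() *PlanManager {
	return &PlanManager{plan: &Plan{}}
}

// AddChangeListener registers a subscriber; other managers call this at setup.
func (m *PlanManager) AddChangeListener(f PlanChangedFunc) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.listeners = append(m.listeners, f)
}

// Load installs the plan and notifies every subscriber in registration order.
// Callbacks run outside the lock so a subscriber may call Plan() safely.
func (m *PlanManager) Load(p *Plan) {
	m.mu.Lock()
	m.plan = p
	listeners := append([]PlanChangedFunc(nil), m.listeners...)
	m.mu.Unlock()
	for _, f := range listeners {
		f(p)
	}
}

// Plan returns the currently installed plan.
func (m *PlanManager) Plan() *Plan {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.plan
}

func main() {
	pm := NewPlanManager()
	// The service manager is now just one subscriber among others.
	pm.AddChangeListener(func(p *Plan) { fmt.Println("service manager sees services:", p.Services) })
	pm.AddChangeListener(func(p *Plan) { fmt.Println("check manager sees checks:", p.Checks) })
	// Loading once at startup notifies both managers.
	pm.Load(&Plan{Services: []string{"hello-world"}, Checks: []string{"hello-check"}})
}
```

The point of the sketch is the direction of the dependency: the plan manager knows nothing about services or checks, and each manager only needs a callback.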

This change has a behavioural impact on Pebble: health checks will start immediately on startup, rather than at some arbitrary later point when a request such as GET /v1/services or POST /v1/services happens to fetch the plan. The service manager implemented lazy loading of the configuration plan, so the plan was only loaded from disk once a service manager command that uses the plan was called. That behaviour is difficult to predict from a user's perspective; see the two examples below:

Example 1:

// Plan

summary: Hello World
description: Hello world.
services:
  hello-world:
    override: replace
    startup: enabled
    command: echo "hello world"
checks:
  hello-check:
    override: merge
    exec:
      command: ping www.google.com

// Pebble server started: pebble run --hold

~/> ./pebble checks                     # Pebble client first checks request
Plan has no health checks.

~/> ./pebble services                   # Service Manager triggers plan load only now
Service      Startup  Current   Since
hello-world  enabled  inactive  -

~/> ./pebble checks                     # Pebble client second checks request
Check        Level  Status  Failures
hello-check  -      up      0/3

Example 2:

// Plan

summary: Hello World
description: Hello world.
services:
  hello-world:
    override: replace
    startup: enabled
    command: echo "hello world"
checks:
  hello-check:
    override: merge
    exec:
      command: ping www.google.com

// Pebble server started: pebble run --args hello-world -n "hello world" \; --hold

~/> ./pebble checks                     # Pebble client checks OK due to SetServiceArgs triggering load
Check        Level  Status  Failures
hello-check  -      up      0/3

Summary:

Following the change, the plan will always be loaded at Pebble startup with all the registered managers receiving a plan change notification with the loaded plan (which could be empty). This will consistently happen before the daemon is started, so the API experience will be consistent.
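To make the difference concrete, here is a toy model of the two behaviours. It is a sketch under stated assumptions, not Pebble code: `Server`, `NewLazyServer`, `NewEagerServer`, `loadPlan` and the method names are all hypothetical, and the plan is reduced to a list of check names.

```go
package main

import "fmt"

// loadPlan is a hypothetical stand-in for reading the plan layers from disk.
func loadPlan() []string {
	return []string{"hello-check"}
}

// Server models only which checks the check manager currently knows about.
type Server struct {
	planLoaded bool
	checks     []string
}

// ensureLoaded loads the plan once and passes the checks on.
func (s *Server) ensureLoaded() {
	if s.planLoaded {
		return
	}
	s.checks = loadPlan()
	s.planLoaded = true
}

// NewLazyServer models the old behaviour: nothing is loaded at startup.
func NewLazyServer() *Server {
	return &Server{}
}

// NewEagerServer models the new behaviour: the plan is loaded before
// the server handles any request.
func NewEagerServer() *Server {
	s := &Server{}
	s.ensureLoaded()
	return s
}

// Services models a service manager request, which triggers the plan load.
func (s *Server) Services() {
	s.ensureLoaded()
}

// Checks models a checks request, which never triggers a load by itself.
func (s *Server) Checks() []string {
	return s.checks
}

func main() {
	lazy := NewLazyServer()
	fmt.Println("lazy, first checks request:", lazy.Checks())
	lazy.Services()
	fmt.Println("lazy, after services request:", lazy.Checks())

	eager := NewEagerServer()
	fmt.Println("eager, first checks request:", eager.Checks())
}
```

In the lazy model the answer to a checks request depends on whether some unrelated service request came first, which mirrors Example 1; in the eager model the first checks request already sees the plan.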

flotter commented 6 months ago

@rebornplusplus I am not sure how to add you as a reviewer, but could you kindly give this PR a careful look from the Rocks entrypoint perspective? I want to make sure your use of --hold and --args is not affected.

benhoyt commented 6 months ago

@flotter I haven't reviewed the code change yet (going to start on that today), but I just talked to the APAC Juju team, and they're happy with the change in behaviour this would mean: that health checks will start when Pebble starts, instead of an arbitrary interval afterwards depending on what the charm or whoever else was doing with Pebble. This was always their expectation (and I think charmers' expectation) in any case.

Separately from this, we'd like to consider a proper "startup" check level for K8s startup probes, so that charms like PostgreSQL can properly say "I'm up" to Pebble/Juju, rather than relying on a longish health check timeout / # of failures. But that's a separate discussion -- I've put it in our list to discuss in Madrid.