canonical / pebble

Pebble is a lightweight Linux service manager with layered configuration and an HTTP API.
https://canonical-pebble.readthedocs-hosted.com/
GNU General Public License v3.0
145 stars 54 forks source link

feat(plan): use lanes to start services with dependencies #437

Closed IronCore864 closed 2 months ago

IronCore864 commented 3 months ago

When starting services, put services that depend on each other (before/after/requires) in the same lane. If there is no dependency, put it in a stand-alone lane. Tasks only wait for tasks defined in the same lane.

Fixes https://github.com/canonical/pebble/issues/231.

Logic

The logic is as follows:

Tests

I mainly ran three types of test. The first is the one mentioned in https://github.com/canonical/pebble/issues/231:

services:
  a:
    override: replace
    command: sh -c "sleep 999"
    startup: enabled
  b:
    override: replace
    command: sh -c "exit 123"
    startup: enabled
  c:
    override: replace
    command: sh -c "sleep 999"
    startup: enabled

With this feature, service C won't be in "hold" status, but rather, started. Logs:

2024-07-23T17:24:56.172Z [pebble] Started daemon.
2024-07-23T17:24:56.182Z [pebble] POST /v1/services 2.781792ms 202
2024-07-23T17:24:56.185Z [pebble] Service "a" starting: sh -c "sleep 999"
2024-07-23T17:24:56.185Z [pebble] Service "b" starting: sh -c "exit 123"
2024-07-23T17:24:56.186Z [pebble] Service "c" starting: sh -c "sleep 999"
2024-07-23T17:24:56.189Z [pebble] Change 1 task (Start service "b") failed: cannot start service: exited quickly with code 123
2024-07-23T17:24:57.190Z [pebble] GET /v1/changes/1/wait 1.005492792s 200
2024-07-23T17:24:57.190Z [pebble] Started default services with change 1.
2024-07-23T17:25:01.404Z [pebble] GET /v1/services?names= 120.625µs 200
^C2024-07-23T17:25:21.617Z [pebble] Exiting on interrupt signal.
2024-07-23T17:25:21.621Z [pebble] Stopping all running services.
2024-07-23T17:25:21.628Z [pebble] Service "c" stopped
2024-07-23T17:25:21.628Z [pebble] Service "a" stopped

As a second test, if C depends on B, C is still in "hold" because they are in the same lane:

services:
  a:
    override: replace
    command: sh -c "sleep 999"
    startup: enabled
  b:
    override: replace
    command: sh -c "exit 123"
    startup: enabled
  c:
    override: replace
    command: sh -c "sleep 999"
    startup: enabled
    requires:
      - b

Logs:

2024-07-23T17:25:37.799Z [pebble] Started daemon.
2024-07-23T17:25:37.810Z [pebble] POST /v1/services 2.859792ms 202
2024-07-23T17:25:37.813Z [pebble] Service "a" starting: sh -c "sleep 999"
2024-07-23T17:25:37.813Z [pebble] Service "b" starting: sh -c "exit 123"
2024-07-23T17:25:37.816Z [pebble] Change 1 task (Start service "b") failed: cannot start service: exited quickly with code 123
2024-07-23T17:25:38.819Z [pebble] GET /v1/changes/1/wait 1.005558292s 200
2024-07-23T17:25:38.820Z [pebble] Started default services with change 1.
2024-07-23T17:25:47.218Z [pebble] GET /v1/services?names= 103.208µs 200
^C2024-07-23T17:26:19.741Z [pebble] Exiting on interrupt signal.
2024-07-23T17:26:19.745Z [pebble] Stopping all running services.
2024-07-23T17:26:19.752Z [pebble] Service "a" stopped

As a third test, I tested the "requires + before" combination:

services:
  a:
    override: replace
    command: sh -c "exit 123"
    startup: enabled
    requires:
      - b
    before:
      - b
  b:
    override: replace
    command: sh -c "sleep 999"
    startup: enabled

Logs:

2024-07-23T17:26:23.145Z [pebble] Started daemon.
2024-07-23T17:26:23.154Z [pebble] POST /v1/services 2.873792ms 202
2024-07-23T17:26:23.157Z [pebble] Service "a" starting: sh -c "exit 123"
2024-07-23T17:26:23.161Z [pebble] Change 1 task (Start service "a") failed: cannot start service: exited quickly with code 123
2024-07-23T17:26:23.167Z [pebble] GET /v1/changes/1/wait 12.322667ms 200
2024-07-23T17:26:23.167Z [pebble] Started default services with change 1.
2024-07-23T17:26:26.642Z [pebble] GET /v1/services?names= 583.833µs 200
^C2024-07-23T17:26:52.384Z [pebble] Exiting on interrupt signal.
IronCore864 commented 3 months ago

Not in the scope of this PR (or the issue https://github.com/canonical/pebble/issues/231 we are trying to fix), but I'd like to point out the painful experience of using requires/before/after when implementing and testing this feature.

This PR is my second one on this feature because, in the first draft, I made the mistake of assuming that "when handling task B, if B depends on task A, A should have already been processed, since the input is sorted already, so, simply put B in the lane where A already is." I ignored the possibility that you can configure A to require B but A before B, for example, the test case 3 mentioned above:

services:
  a:
    override: replace
    command: sh -c "sleep 999"
    startup: enabled
    requires:
      - b
    before:
      - b
  b:
    override: replace
    command: sh -c "sleep 999"
    startup: enabled

The logic in my first draft won't work, because the sorted order is (A, B), so when handling A, B's lane isn't known yet; and when handling B, since B doesn't require anything, you don't know which lane to put it in. To solve this problem, I created the second solution in this PR, where if A's dependency isn't known yet, create a lane, and put all dependencies in the same lane even before the task is created.

However, after giving this test case more thought, I concluded that the test case makes no sense, because "A requires B, but A must be started before B" is illogical:

So, saying that "A requires B" and "A must start before B" is illogical, the two parts counter each other.

I tried but failed to think of any use case where "A requires B", but "A must be started before B". Some analogies:

Using three keywords requires/before/after to handle dependency is an overkill, which brings more problems than solves.

I think we can drop "before/after", and rename "requires" to something like "depend-on" (depend-on means requires + after, in my book).

I prefer renaming "requires" to "depends-on" because "requires" in many cases isn't right:

If we drop "before/after" and leave only "depends-on", the layer in test case 3 can be fixed to:

services:
    a:
        override: replace
        command: /bin/sh -c "echo test1; sleep 10"
        startup: enabled

    b:
        override: replace
        command: /bin/sh -c "echo test2; sleep 10"
        depends-on:
          - a

And this is more than enough to solve any dependency requirements. Proof: Terraform does the same thing, even the name "depends-on" is borrowed from Terraform.

If we can do this, after sorting, the order is (A, B) and when handling B, A's lane is already decided. In this case, the feature in this PR can be much simpler:

# pseudo
for task in tasks:
  if task.depends_on:
    task.lane = task.depends_on.lane
    task.wait_for(task.depends_on)
benhoyt commented 3 months ago

Using three keywords requires/before/after to handle dependency is an overkill, which brings more problems than solves.

I think you're right -- it's overkill. And I and others have been confused by requires/before/after several times. There's more in the README here.

"Requires" basically means depends-on, but doesn't specify any order of starting. "Before" and "after" deal with the ordering.

I think we can drop "before/after", and rename "requires" to something like "depend-on" (depend-on means requires + after, in my book).

That said, I think that ship has sailed -- in other words, it's too late to change it now. We can probably refine before/after/requires, but not remove or change them significantly.

Let's chat about all this further when you're back, and then you can do a second iteration.

IronCore864 commented 2 months ago

Using three keywords requires/before/after to handle dependency is an overkill, which brings more problems than solves.

I think you're right -- it's overkill. And I and others have been confused by requires/before/after several times. There's more in the README here.

"Requires" basically means depends-on, but doesn't specify any order of starting. "Before" and "after" deal with the ordering.

I think we can drop "before/after", and rename "requires" to something like "depend-on" (depend-on means requires + after, in my book).

That said, I think that ship has sailed -- in other words, it's too late to change it now. We can probably refine before/after/requires, but not remove or change them significantly.

Let's chat about all this further when you're back, and then you can do a second iteration.

Refactored according to the above suggestion. Key changes:

Some explanation: