Kuadrant / architecture

DAG for policy reconciliation #29

Open guicassolato opened 9 months ago

guicassolato commented 9 months ago

Problem statement

Kuadrant's current policy reconciliation process is centered too heavily on the policy objects and is barely, if at all, aware of the underlying topology, which it learns only by repeatedly querying the cluster API.

This has been resulting in:

Example-driven explanation

                                   ┌───────────┐
                 ┌──EnvoyFilter-1  │ Limitador │    ┌──EnvoyFilter-2
     rlp-1────┐  │                 └───────────┘    │
              │  ├──WasmPlugin-1                    ├──WasmPlugin-2
              ▼  │                                  │
           ┌─────┴┐                           ┌─────┴┐
     ┌────►│ gw-1 │◄────┬────────────┐  ┌────►│ gw-2 │◄────┐
     │     └──────┘     │            │  │     └──────┘     │
     │                  │            │  │                  │
     │                  │            │  │                  │
┌────┴────┐       ┌─────┴───┐      ┌─┴──┴────┐       ┌─────┴───┐
│ route-1 │       │ route-2 │      │ route-3 │       │ route-4 │
└─────────┘       └─────────┘      └─────────┘       └─────────┘
     ▲                                  ▲                  ▲
     │                                  │                  │
     │                                  │                  │
   rlp-2                              rlp-3              rlp-4
  1. Reconciliation of rlp-2 (created after rlp-1) requires triggering the reconciliation of rlp-1 again, to recalculate the scope of rlp-1, i.e. to update WasmPlugin-1 and Limitador, which in turn have just been updated because of rlp-2 itself
  2. Similarly, rlp-3 requires recalculating WasmPlugin-1 and Limitador, apart from creating EnvoyFilter-2 and WasmPlugin-2
  3. Getting to the affected gateways involves: a. inspecting the specs of the targeted routes for parentRefs; b. listing all RLPs for gateway-targeting ones; c. trusting the state of the back-ref annotations.
  4. Reconciliation of any policy event involves trying to detect what kind of event triggered it – i.e. policy created/updated/deleted, route created/updated/deleted, gateway created/updated/deleted
  5. Other events need to be watched for reconciliation back from the source of truth (policies + network topology) – e.g. wasmplugin/envoyfilter/limitador modified/deleted

Possible solution

Reasons to do it

  1. Significantly reduce the number of requests to the Kubernetes API, and thereby also improve the speed of reconciliation
  2. Move away from annotations as the way to track back-refs to the policies, by relying on the DAG to navigate the topology instead
  3. Simplify reconciliation loop regarding detection of the kind of resource event
  4. Improve clarity regarding the different kinds of events that trigger reconciliation (by having to define each kind of event and corresponding callback function) → improve coverage of scenarios (kinds of resource events)
  5. Possibility to react more quickly and efficiently, by sometimes skipping “full” reconciliation and acting more directly according to each kind of event
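
Items 3 to 5 could look roughly like the following sketch: each (resource kind, event kind) pair is registered explicitly with its own callback, so the reconciler does not have to guess what triggered it, and unhandled combinations become visible. Every name here (`Dispatcher`, `On`, `Dispatch`, `newDispatcher`, the constants) is hypothetical, and the handlers only return a description of what they would do; this is an illustration of the pattern, not an existing API.

```go
package main

import "fmt"

// ResourceKind and EventKind together identify what happened (hypothetical names).
type ResourceKind string
type EventKind string

const (
	PolicyRes  ResourceKind = "policy"
	RouteRes   ResourceKind = "route"
	GatewayRes ResourceKind = "gateway"

	Created EventKind = "created"
	Updated EventKind = "updated"
	Deleted EventKind = "deleted"
)

// Event describes one change observed in the cluster.
type Event struct {
	Resource ResourceKind
	Kind     EventKind
	Name     string
}

// Handler reacts to one kind of event and reports the action it took.
type Handler func(Event) string

type eventKey struct {
	resource ResourceKind
	kind     EventKind
}

// Dispatcher maps each registered (resource, event) pair to its callback.
type Dispatcher struct {
	handlers map[eventKey]Handler
}

// On registers the callback for one (resource, event) pair.
func (d *Dispatcher) On(r ResourceKind, k EventKind, h Handler) {
	if d.handlers == nil {
		d.handlers = map[eventKey]Handler{}
	}
	d.handlers[eventKey{resource: r, kind: k}] = h
}

// Dispatch runs the matching callback. The boolean is false when no callback
// is registered, which makes gaps in event coverage explicit.
func (d *Dispatcher) Dispatch(e Event) (string, bool) {
	h, ok := d.handlers[eventKey{resource: e.Resource, kind: e.Kind}]
	if !ok {
		return "", false
	}
	return h(e), true
}

// newDispatcher wires up two illustrative callbacks.
func newDispatcher() *Dispatcher {
	d := &Dispatcher{}
	d.On(PolicyRes, Created, func(e Event) string {
		return "attach " + e.Name + " and recompute gateways in scope"
	})
	d.On(RouteRes, Deleted, func(e Event) string {
		return "detach policies targeting " + e.Name
	})
	return d
}

func main() {
	d := newDispatcher()
	events := []Event{
		{Resource: PolicyRes, Kind: Created, Name: "rlp-2"},
		{Resource: RouteRes, Kind: Deleted, Name: "route-3"},
		{Resource: GatewayRes, Kind: Updated, Name: "gw-1"}, // no callback registered
	}
	for _, e := range events {
		if action, ok := d.Dispatch(e); ok {
			fmt.Println(action)
		} else {
			fmt.Println("unhandled:", e.Resource, e.Kind, e.Name)
		}
	}
}
```

Keying on the pair rather than on the policy alone is a design choice of this sketch: it is what would let a targeted callback skip a "full" reconciliation where a narrower action suffices.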

Reasons NOT to do it

  1. Involves rewriting the operators
  2. Possibly more resources (CPU, Mem) required by the policy controller

Challenges

  1. Bootstrapping the tree of pre-existing resources in-memory may take some non-negligible time – i.e. consider the impact for the readiness state of the controller
  2. Achieve a sufficient level of abstraction so that it works for all policy implementers (i.e. not only for Kuadrant)
  3. Avoid re-inventing the wheel – watch out for awkward combinations of the informer pattern and straightforward reconcilers
  4. Reeducate devs on the new pattern – no longer “textbook” controller-runtime

guicassolato commented 3 weeks ago

kuadrant/policy-machinery can be employed for this.