Problem statement
Kuadrant's current policy reconciliation process is too centered on the policy objects and has little, if any, awareness of the underlying topology, which it only discovers by repeatedly querying the cluster API.
This has resulted in:
Occasional cyclic triggering of the reconciliation loop
For example, reconciling rlp-2 (created after rlp-1) requires reconciling rlp-1 again to recalculate its scope, i.e. to update WasmPlugin-1 and Limitador, which in turn have just been updated because of rlp-2 itself
Similarly, rlp-3 requires recalculating WasmPlugin-1 and Limitador, in addition to creating EnvoyFilter-2 and WasmPlugin-2
Getting to the affected gateways involves:
a. inspecting the specs of the targeted routes for parentRefs;
b. listing all RLPs for gateway-targeting ones;
c. trusting the state of the back-ref annotations.
Reconciling any event involves first trying to detect what kind of event triggered it, i.e. policy created/updated/deleted, route created/updated/deleted, gateway created/updated/deleted
Other events also need to be watched so that derived resources can be reconciled back from the source of truth (policies + network topology), e.g. wasmplugin/envoyfilter/limitador modified/deleted
Possible solution
Keep a version of the topology in-memory as a DAG (Directed Acyclic Graph)
Rely more on the informers pattern, to replace/complement controller-runtime, possibly replacing the “traditional” reconciliation loops as we know them today
Recompute the effective policies top-down, from affected gateways and downwards to the leaves
Distinguish between events that affect the topology, events that just require recomputing and reapplying effective policies, and events that just require reapplying previously computed states.
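A minimal sketch of the in-memory DAG and the top-down recomputation, written in Python for brevity. The `Topology` class, its method names, and the merge rule (settings attached closer to the leaf override inherited ones) are assumptions made for illustration, not Kuadrant's actual data model or policy semantics.

```python
from collections import defaultdict, deque


class Topology:
    """Hypothetical in-memory DAG of network objects (e.g. gateways -> routes)."""

    def __init__(self):
        self.children = defaultdict(set)   # parent name -> child names
        self.parents = defaultdict(set)    # child name -> parent names
        self.policies = defaultdict(list)  # target name -> attached policy settings

    def add_edge(self, parent, child):
        self.children[parent].add(child)
        self.parents[child].add(parent)

    def attach_policy(self, target, settings):
        self.policies[target].append(settings)

    def affected_roots(self, node):
        """Walk upwards from a node to every root (a node without parents)."""
        roots, seen, stack = set(), set(), [node]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            if not self.parents[current]:
                roots.add(current)
            stack.extend(self.parents[current])
        return roots

    def effective_policies(self, root):
        """Recompute effective settings top-down, from one root to its leaves.

        Assumed merge rule: settings closer to the leaf win over inherited ones.
        """
        effective = {}
        queue = deque([(root, {})])
        while queue:
            node, inherited = queue.popleft()
            merged = dict(inherited)
            for settings in self.policies[node]:
                merged.update(settings)
            effective[node] = merged
            for child in sorted(self.children[node]):
                queue.append((child, merged))
        return effective
```

With such a structure, an event on a route resolves to its affected gateways via `affected_roots` without any call to the cluster API, and each affected gateway is then recomputed once with `effective_policies`.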
Reasons to do it
Significantly reduce the number of requests to the Kubernetes API, and thereby improve the speed of reconciliation
Move away from annotations as the way to track back-refs to the policies, by relying on the DAG to navigate the topology instead
Simplify reconciliation loop regarding detection of the kind of resource event
Improve clarity regarding the different kinds of events that trigger reconciliation (by having to define each kind of event and corresponding callback function) → improve coverage of scenarios (kinds of resource events)
Possibility to react quicker and more efficiently, by sometimes not having to trigger “full” reconciliation but acting more directly according to each kind of event
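The idea of defining each kind of event together with its callback could look roughly like the following Python sketch. The `Impact` categories mirror the three kinds of events distinguished in the proposed solution; the kind-to-impact mapping, the `on` decorator, and the step names are hypothetical and would need to be checked against the real resources.

```python
from enum import Enum


class Impact(Enum):
    TOPOLOGY = "topology"    # the DAG itself changes
    RECOMPUTE = "recompute"  # effective policies must be recomputed and reapplied
    REAPPLY = "reapply"      # previously computed state is reapplied as-is


# Assumed classification of resource kinds; not taken from the Kuadrant code base.
POLICY_KINDS = {"RateLimitPolicy"}
DERIVED_KINDS = {"WasmPlugin", "EnvoyFilter", "Limitador"}

CALLBACKS = {}


def on(impact):
    """Register the decorated function as the callback for one kind of impact."""
    def register(func):
        CALLBACKS[impact] = func
        return func
    return register


def classify(kind):
    if kind in DERIVED_KINDS:
        return Impact.REAPPLY
    if kind in POLICY_KINDS:
        return Impact.RECOMPUTE
    # Gateways, routes, and unknown kinds take the most conservative path.
    return Impact.TOPOLOGY


@on(Impact.TOPOLOGY)
def topology_changed(name):
    return [f"rebuild:{name}", f"recompute:{name}", f"reapply:{name}"]


@on(Impact.RECOMPUTE)
def policy_changed(name):
    return [f"recompute:{name}", f"reapply:{name}"]


@on(Impact.REAPPLY)
def derived_resource_changed(name):
    return [f"reapply:{name}"]


def dispatch(kind, name):
    """Route an event to the callback registered for its impact."""
    return CALLBACKS[classify(kind)](name)
```

For instance, `dispatch("Limitador", "limitador")` only reapplies the stored state, while `dispatch("HTTPRoute", "route-a")` goes through all three steps.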
Reasons NOT to do it
Involves rewriting the operators
Possibly more resources (CPU, memory) required by the policy controller
Challenges
Bootstrapping the tree of pre-existing resources in-memory may take a non-negligible amount of time, so the impact on the readiness state of the controller needs to be considered
Achieve a sufficient level of abstraction so it works for all policy implementers (i.e. not only for Kuadrant)
Avoid re-inventing the wheel, and watch out for awkward combinations of the informers pattern and straightforward reconcilers
Re-educate developers on the new pattern, which is no longer “textbook” controller-runtime
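One way to reason about the bootstrapping challenge is to gate readiness on the initial load of the topology. The Python sketch below assumes a hypothetical `list_existing` callable standing in for the initial listing of pre-existing resources; the `TopologyCache` name and its methods are illustrative only.

```python
import threading


class TopologyCache:
    """Hypothetical cache that reports ready only after the initial load."""

    def __init__(self):
        self._edges = {}
        self._synced = threading.Event()

    def bootstrap(self, list_existing):
        # list_existing is an assumed callable that yields (parent, child)
        # pairs for resources that already exist in the cluster.
        for parent, child in list_existing():
            self._edges.setdefault(parent, set()).add(child)
        self._synced.set()

    def ready(self):
        """Suitable as the basis of a readiness probe."""
        return self._synced.is_set()

    def children(self, parent):
        if not self.ready():
            raise RuntimeError("topology not bootstrapped yet")
        return set(self._edges.get(parent, ()))
```

Event callbacks would refuse to run (or be queued) until `ready()` returns true, which makes the cost of a slow bootstrap visible in the controller's readiness state instead of in incorrect reconciliations.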