Closed maleck13 closed 6 months ago
@guicassolato @rahulanand16nov
based on our conversation earlier, I think this is a solid use case for providing default at the gateway level. Perhaps during the next sprint we could investigate and come up with a proof of concept that would support this behaviour?
I think we want the default behaviour to start as not a merge. So a simple replace. IE I have defined a default but that can be replaced entirely for a given host WDYT?
Based on initial thoughts on how to go about solving not only defaults but also overrides, I think it makes sense for the kuadrant controller to create an internal index of the hierarchy of resources present on the cluster. This would be similar to the tree representation that authorino uses for its cache but customized for controller use cases.
We need this because of the relational nature of AuthPolicy
, and defaults/overrides require taking all the relations in one place to decide what behavior needs to be reconciled and configured within authorino.
let's go over a non-canonical example:
target: gateway
default:
hosts: ["*.pets.com"]
authscheme:
ac-gw-deny-all
override:
hosts: ["cdn.pets.com"]
authScheme:
ac-gw-allow-all
---
target: route
default:
hosts: ["dogs.pets.com"]
authScheme:
ac-rt-custom-default
override:
{NOT_ALLOWED_YET} // Not sure why override is required at the route-level
This would look like the following in the kuadrant controller.
.
│
▼
com
│
│
▼
┌────────────────────────pets─────────────────────────┐
│ │ │
│ │ │
▼ ▼ ▼
* cdn dogs
┌──────────────────┐ ┌───────────────────┐ ┌─────────────────────────┐
│Gateway: │ │Gateway: │ │Route: │
│ Default: │ │ Override: │ │ Default: │
│ ac-gw-deny-all│ │ ac-gw-allow-all│ │ ac-route-cust-default│
└──────────────────┘ └───────────────────┘ └─────────────────────────┘
Each KAP would add information to the index tree and after each change to the tree, reconciliation would happen to a single resource from each node based on predefined rules.
Q: What happens if there are two route-level KAPs with the same host? A: In case of conflict, the timestamp of creation will be used and the status of rejected resource will be updated. During the rebuilding of the index, the same information will be used.
Q: What are the rules of creation for AuthConfigs? A: The following algorithm is still in progress as we come across more cases.
If children nodes consist of a `star node` (`*`):
reconcile(star node); // rest should be ignored at that level;
delete(other nodes);
else: // there is no overlap
reconcile(each node);
reconcile(node):
if the node contains gateway level override then reconcile only that;
else if it contains route and gateway level defaults; reconcile only route-level defaults;
else reconcile default which is present;
Q: What would happen to the example shown above? A: All of them would be created.
Q: What is left? A: Still merge part of the equation is not defined and of course refine the algorithm to include more cases where expected behavior is currently wrong.
This is not perfect but I hope gives an idea of what I am thinking :)
cc @maleck13 @alexsnaps @guicassolato @eguzki
Just on
override: {NOT_ALLOWED_YET} // Not sure why override is required at the route-level
We could certainly add a webhook that would reject an override at the route level with a useful message such as: overrides when targeting HTTPRoutes are not allowed, please set a default
It may not be required. One thought that comes to mind is if we were to allow setting a policy at the service level for example rather than just the route, but I don't see a good usecase for that right now.
I think it makes sense to build up a representation of the relationships between authpolicies. This will avoid an expensive list all auth policies and end up building a temporary structure for each change. I think the actual rules need some refinment
@rahulanand16nov, a couple extra observations from my side as well.
In the diagram, I believe you meant for the root to be a dot ('.'), not a star ('*').
Q: What happens if there are two route-level KAPs with the same host? A: In case of conflict, the timestamp of creation will be used and the status of rejected resource will be updated. During the rebuilding of the index, the same information will be used.
Which one prevails, the newest or the oldest resource?
Do default
or override
have any implication on the above?
Can wildcards be used in route-level policies?
Q: What are the rules of creation for AuthConfigs? A: The following algorithm is still in progress as we come across more cases.
If children nodes consist of a `start node` (`*`): [...]
Do you mean star node
?
Q: What would happen to the example shown above? A: All of them would be created.
Do you mean as 3 separate AuthConfigs? (I imagine so; just double checking.)
Moreover, we're considering disabling Authorino's host name collision prevention for Kuadrant (https://github.com/Kuadrant/authorino/issues/329). If that is the case, then all fine. Otherwise, in the case of 3 separate AuthConfigs for the example above, let's have in mind that *.pets.com
would have to be the last AuthConfig applied, but, again, only if host name collision prevention is off and the AuthConfigs are from different namespaces – likely this will never be the case for Kuadrant.
It also would be interesting to see an example of the AuthConfig operations induced by each step into building the example above (IWO, what 'reconcile' means exactly in the algorithm), and them permutations of the steps (i.e. same steps in different order).
In the diagram, I believe you meant for the root to be a dot ('.'), not a star ('*').
Yep, it's supposed to be dot('.');
Q: What happens if there are two route-level KAPs with the same host? A: In case of conflict, the timestamp of creation will be used and the status of rejected resource will be updated. During the rebuilding of the index, the same information will be used.
Which one prevails, the newest or the oldest resource? Do
default
oroverride
have any implication on the above? Can wildcards be used in route-level policies?
the oldest resource would prevail; route-level policies can have wildcards; default or override will have no implications; Note: These are the ideal expectations but as we move forward and find more problems, might need to pivot on those expectations.
Q: What are the rules of creation for AuthConfigs? A: The following algorithm is still in progress as we come across more cases.
If children nodes consist of a `start node` (`*`): [...]
Do you mean
star node
?
Yep
Q: What would happen to the example shown above? A: All of them would be created.
Do you mean as 3 separate AuthConfigs? (I imagine so; just double checking.)
Ideally, yes.
Moreover, we're considering disabling Authorino's host name collision prevention for Kuadrant (Kuadrant/authorino#329). If that is the case, then all fine. Otherwise, in the case of 3 separate AuthConfigs for the example above, let's have in mind that
*.pets.com
would have to be the last AuthConfig applied, but, again, only if host name collision prevention is off and the AuthConfigs are from different namespaces – likely this will never be the case for Kuadrant.
We can control the order of creation for each authconfig depending on how we traverse the tree so I hope it shouldn't be an issue.
It also would be interesting to see an example of the AuthConfig operations induced by each step into building the example above (IWO, what 'reconcile' means exactly in the algorithm), and them permutations of the steps (i.e. same steps in different order).
:+1:
I just realized the algorithm doesn't even work for the example I wrote above :facepalm: but second iteration does.
if children nodes have a star node (`*`):
if it has a `gateway-override`:
just reconcile `gateway-override`;
delete other nodes;
elif it has `route-default`:
reconcile `route-default`;
reconcile(other nodes and delete `gateway-default` in them);
else
reconcile `gateway-default`;
reconcile(other nodes);
else
for each node: reconcile (node);
reconcile(node):
reconcile in order of `gateway-override` > `route-default` > `gateway-default`;
Even Craig mentioned we need to isolate this logic and test the hell out of it.
Edit: I noticed that we should also move spec.rules
inside default and override because if an authscheme
is overridden by another authscheme
then when
rules are kind of redundant.
Probably make sense to start talking about merge case now.
cc @maleck13 @guicassolato
@rahulanand16nov, here's my take on the algorithm to reconcile an AuthPolicy for a particular host.
I took into account 8 types of AuthPolicies (i.e. 8 different cases), based on the 3 aspects (variables) of the AuthPolicy.
Variables:
Gateway
| HTTPRoute
default
| override
Cases (2 targets ⅹ 2 hosts x 2 schemes = 8):
Gateway
\default
Gateway
\override
Gateway
\default
Gateway
\override
HTTPRoute
\default
HTTPRoute
\override
HTTPRoute
\default
HTTPRoute
\override
Note: I find useful to have RLD and RLO both supported, e.g. for the use case where admins can define RLO policies, while devs are restricted to RLD.
Precedence: GWD < RWD < GLD < RLD < GWO < RWO < GLO < RLO
Description of the algorithm:
reconcile(kap, host)
checks if another authpolicy exists, of any type that takes precedence over the resource being reconciled, targeting the same host (wildcard overlaps included); if thats’s the case, it rejects the resource for this particular host. Otherwise, it updates the status of any existing resource that would be superseded, and deletes/creates/updates the authconfigs accordingly
Pseudocode:
PolicyType = enum{GWD, RWD, GLD, RLD, GWO, RWO, GLO, RLO}
func reconcile(kap AuthPolicy, host string) error
var supersededTypes []PolicyType
switch kap.PolicyType()
case GWD:
if existOthers(kap, host, []PolicyType{RWD, GWO, RWO})
return updateAuthPolicyStatus("%host superseded", kap)
supersededTypes = []PolicyType{GWD}
case RWD:
if existOthers(kap, host, []PolicyType{GWO, RWO})
return updateAuthPolicyStatus("%host superseded", kap)
supersededTypes = []PolicyType{GWD, RWD}
case GWO:
if existOthers(kap, host, []PolicyType{RWO})
return updateAuthPolicyStatus("%host superseded", kap)
supersededTypes = []PolicyType{GLD, RLD, GWO}
case RWO:
supersededTypes = []PolicyType{GWD, RWD, GLD, RLD, GWO, RWO}
case GLD:
if existOthers(kap, host, []PolicyType{RLD, GWO, RWO, GLO, RLO})
return updateAuthPolicyStatus("%host superseded", kap)
supersededTypes = []PolicyType{GLD}
case RLD:
if existOthers(kap, host, []PolicyType{GWO, RWO, GLO, RLO})
return updateAuthPolicyStatus("%host superseded", kap)
supersededTypes = []PolicyType{GLD, RLD}
case GLO:
if existOthers(kap, host, []PolicyType{RLO})
return updateAuthPolicyStatus("%host superseded", kap)
supersededTypes = []PolicyType{GLD, RLD, GLO}
case RLO:
supersededTypes = []PolicyType{GLD, RLD, GLO, RLO}
# somehow as an atomic transaction
supersededAuthPolicies := findOthers(kap, host, supersededTypes)
updateAuthPolicyStatus("%host superseded", supersededAuthPolicies...)
deleteAuthConfigs(hosts(supersededAuthPolicies, host) - host)
createOrUpdateAuthConfig(host, kap.AuthScheme)
func findOthers(kap AuthPolicy, host string, policyTypes []PolicyType) []AuthPolicy
return []AuthPolicy{ other authpolicies targeting the same host (wildcard overlaps included) and whose type is in 'policyTypes' }
func existOthers(kap AuthPolicy, host string, policyTypes []PolicyType) bool
return len(findOthers(kap, host, policyTypes)) > 0
Updating my own suggestion here, so it's consistent with the Gateway API (GW-API) spec.
I said before:
Precedence: GWD < RWD < GLD < RLD < GWO < RWO < GLO < RLO
GW-API establishes that the hierarchy (which I called "precedence") for the different types of resources should be exactly the reverse between defaults and overrides.
Given our 4 types of resources:
Gateway
\Gateway
\HTTPRoute
\HTTPRoute
\The hierarchy in our case should actually be:
GW[D] < RW[D] < GL[D] < RL[D] < RL[O] < GL[O] < RW[O] < GW[O]
(Now noting default (D) and override (B) within square brackets.)
Rewriting the above with inverted inequality symbol (to match how it's represented in the GW-API docs):
GW[O] > RW[O] > GL[O] > RL[O] > RL[D] > GL[D] > RW[D] > GW[D]
Finally, the updated pseudocode for the hierarchy above:
Warning! I haven't tested this.
PolicyType = enum{GWO, RWO, GLO, RLO, RLD, GLD, RWD, GWD}
// reconcile(kap, host) reconciles a policy for a particular host.
// It looks for other policies targeting the same host (wildcard overlaps included), that are higher in the hierarchy and
// that according to the rules would be favoured to protect the host. If that is the case, it rejects the resource; otherwise,
// it updates the status of any existing policy that would be superseded by the resource regarding the the host, and
// deletes/creates/updates the authconfigs accordingly.
//
// Hierarchy:
// GWO > RWO > GLO > RLO > RLD > GLD > RWD > GWD
//
// General rules:
// 1. **Older policy rule:** Between policies of the same type, targeting the same host, the older policy (created first) wins, i.e. older (existing) supersedes newer (challenging).
// 2. **Strict subset rule:** Literal hosts (L) never supersede wildcards (W), because L ⊂ W; they can leave side by side (different AuthConfigs).
// 3. **Default rule:** Defaults (D) only supersede other defaults (lower in the hierarchy).
// 4. **Override rule:** Overrides (O) supersede everything that is lower in the hierarchy, as long as it doesn't violate any of the other rules.
//
// Combined rules:
// - xWO rule:
// - supersedes all types that are lower in the hierarchy (rule 4)
// - is superseded by older self and all xWO that are higher in the hierarchy (rules 1, 2, 3 and 4)
// - xLO rule:
// - supersedes all xLD and xLO that are lower in the hierarchy (rules 2 and 4)
// - is superseded by older self and all xWO and xLO that are higher in the hierarchy (rules 1, 3 and 4)
// - xLD rule:
// - supersedes all xLD that are lower in the hierarchy (rules 2 and 3)
// - is superseded by older self and all types that are higher in the hierarchy (rules 1, 3 and 4)
// - xWD rule:
// - supersedes all xWD that are lower in the hierarchy (rule 3)
// - is superseded by older self and all xWD and xWO that are higher in the hierarchy (rules 1, 2, 3 and 4)
func reconcile(kap AuthPolicy, host string) error {
var supersededTypes []PolicyType
switch kap.PolicyType() {
case GWO:
if existOthers(kap, host, []PolicyType{GWO}) {
return updateAuthPolicyStatus("cannot apply policy for %host", kap)
}
supersededTypes = []PolicyType{RWO, GLO, RLO, RLD, GLD, RWD, GWD}
case RWO:
if existOthers(kap, host, []PolicyType{GWO, RWO}) {
return updateAuthPolicyStatus("cannot apply policy for %host", kap)
}
supersededTypes = []PolicyType{GLO, RLO, RLD, GLD, RWD, GWD}
case GLO:
if existOthers(kap, host, []PolicyType{GWO, RWO, GLO}) {
return updateAuthPolicyStatus("cannot apply policy for %host", kap)
}
supersededTypes = []PolicyType{RLO, RLD, GLD}
case RLO:
if existOthers(kap, host, []PolicyType{GWO, RWO, GLO, RLO}) {
return updateAuthPolicyStatus("cannot apply policy for %host", kap)
}
supersededTypes = []PolicyType{RLD, GLD}
case RLD:
if existOthers(kap, host, []PolicyType{GWO, RWO, GLO, RLO, RLD}) {
return updateAuthPolicyStatus("cannot apply policy for %host", kap)
}
supersededTypes = []PolicyType{GLD}
case GLD:
if existOthers(kap, host, []PolicyType{GWO, RWO, GLO, RLO, RLD, GLD}) {
return updateAuthPolicyStatus("cannot apply policy for %host", kap)
}
case RWD:
if existOthers(kap, host, []PolicyType{GWO, RWO, RWD}) {
return updateAuthPolicyStatus("cannot apply policy for %host", kap)
}
supersededTypes = []PolicyType{GWD}
case GWD:
if existOthers(kap, host, []PolicyType{GWO, RWO, RWD, GWD}) {
return updateAuthPolicyStatus("cannot apply policy for %host", kap)
}
}
// somehow as an atomic transaction
{
supersededAuthPolicies := findOthers(kap, host, supersededTypes)
updateAuthPolicyStatus("can no longer apply policy for %host - now taken by %kap", supersededAuthPolicies...)
deleteAuthConfigs(hosts(supersededAuthPolicies, host) - host)
createOrUpdateAuthConfig(host, kap.AuthScheme)
}
}
func findOthers(kap AuthPolicy, host string, policyTypes []PolicyType) []AuthPolicy {
return []AuthPolicy{ other authpolicies targeting the same host (wildcard overlaps included) and whose type is in 'policyTypes' }
}
func existOthers(kap AuthPolicy, host string, policyTypes []PolicyType) bool {
return len(findOthers(kap, host, policyTypes)) > 0
}
Ok, here goes my 2c regarding this topic…
It’s an effort that aims to make the algorithm as simple as possible, redefining the precedence (hierarchy) order and the actual “compilation” of the policy rules. It aspires to also unify and be applied to the Rate Limit Policies
It’s based on the GW API and Gui’s take, with some changes:
GL[O] > GW[O] > RL[O] > RW[O] > RL[D] > GL[D] > RW[D] > GW[D]
My understanding is that if you have a GW Literal Override, means that the level of specificity that the Cluster Admin had in mind needs to overtake any wildcard that most likely are general rules, therefore would always have more weight than any less specific one. Which would be the opposite when it comes to Defaults.
I know it’s a bit simplistic, and might be even naive… but hopefully this will help to craft a good algorithm that could be applied for both types of policies (RL/Auth) The most difficult, evidently, is the merge approach… but there are good libraries in Golang, such as Mergo, that could help us out.
My contribution:
hostnames
for a given authpolicy are read from spec.authScheme.hosts
. This spec.authScheme.hosts
is optional, if not set, the hostnames are read from the network resource the KAP is targeting. Policy rules should not play any role defining the hostnames
. *.com
and the other petstore.com
, they both apply. Authorino implements longest path match, so both policies would be enforced depending on the workload. They can live side by side in two different authconfigs. Thus, the hierarchy would be straightforward from the Gateway API def: GO > RO > RD > GDHigher precedence | Lower precedence | Applied policies |
---|---|---|
*.example.com |
*.com |
*.example.com , *.com |
*.com |
*.example.com |
*.example.com (modified), *.com |
*.com
has higher preference than *.example.com
, the policy *.example.com
would still apply, however previously it would had been modified merging with higher level policies. If there is a conflict, the one with higher preference will prevail. A request to a.example.com
would hit the modified version of the policy and a request to other.com
would hit the unmodified version of the *.com
policy. Surely I am missing tons of things, but food for thought is always welcome (I guess).
@eguzki, as just discussed offline.
Policy rules should not play any role defining the hostnames.
I believe the policy rules should play a role to the host names in the AuthConfig, otherwise we enable a situation where the AuthConfig covers more that what it suppose to, e.g.:
kind: HTTPRoute:
spec:
hostnames: [`*.petstore.com`]
---
kind: AuthPolicy
spec:
rules:
- hosts: ["api.petstore.com"]
We don't want *.petstore.com
in the AuthConfig for the above, just like IMHO we don't want it in the Istio AuthorizationPolicy.
This
spec.authScheme.hosts
is optional
Maybe authpolicy.spec.authScheme.hosts
should not even be an option. authconfig.spec.hosts
makes sense and needs to remain required, but the abstraction on top (i.e. the KAP) can read the hosts from the rules and from the network object, as described before.
I can't see any use case where the user would want the ext-authz service triggered but have corresponding AuthConfig (effectively resulting in a 404 Not Found
in this case). Perhaps to "lock" the backend service via authorization layer? E.g.
kind: HTTPRoute:
spec:
hostnames: [`*.company.com`]
---
kind: AuthPolicy
spec:
rules:
- hosts: ["*.petstore.company.com"]
authScheme:
hosts: ["api.petstore.company.com"]
Meaning:
*.petstore.company.com
trigger the authz serviceapi.petstore.company.com
passes depending on the evaluation of the AuthConfig*.petstore.company.com
are denied by the authz service (with a 404)I would not define any order of preference between "literal" and "wilcard" per se.
That was the same reasoning that previously made me think on RW[O] > GL[O]
, i.e. a route winning over a gateway in the hierarchy. However, I too was thinking that, in Authorino, both can co-exist, because one is specific (literal), while the other is generic (wildcard).
I will try later to rethink this so it reflects the correct hierarchy between the different types of resources as specified in GW-API and pointed out by @didierofrivia (i.e. Gateway
override wins over HTTPRoute
override, and the other way around for defaults), and Authorino's specificity that both policies can co-exist and most specific wins.
I think I'll come somewhat out of left field on this but: Given the Gateway API's mandate that:
policy resources may omit
Override
orDefault
fields, but at least one of them MUST be present.
I'd like to propose a more "aggressive" approach and go for one or the other and build from there… If we really want to use the policy attachment concept, we don't really have a choice today anyways… Given the spec's:
GW[O] > RW[O] > GL[O] > RL[O] > RL[D] > GL[D] > RW[D] > GW[D]
I'd be tempted to start out with Default
, so that RL
wins. And iterate from there… Today that Default
at the RL could completely replace whatever a GW
policy would declare. Which raises interesting in itself:
I don't think that Default
s only will suffice to solve all of the use cases we're coming up with, but it'd be interesting to see how far we can get with "only" Default
s and what problem we face along the way…
So, not to invalidate any of what's been said so far, nor to pun on Override
s, but rather take an applied hands-on approach to one of them and "figure things out as we go"… wdyt?
Superseded by https://github.com/Kuadrant/kuadrant-operator/issues/431
Use Case
As a gateway administrator I want to lock down access to any host attaching to this gateway. By default I want to apply a default DENY ALL policy at the gateway level. When a policy is created targeting a HTTPRoute, I want that policy to then take precedence over my default policy.