Investigate supporting defaults and overrides in AuthPolicy

Kuadrant / kuadrant-operator

The Operator to install and manage the lifecycle of the Kuadrant components deployments.

Apache License 2.0

37 stars 33 forks source link

Investigate supporting defaults and overrides in AuthPolicy #91

Closed maleck13 closed 6 months ago

maleck13 commented 2 years ago

Use Case

As a gateway administrator I want to lock down access to any host attaching to this gateway. By default I want to apply a default DENY ALL policy at the gateway level. When a policy is created targeting a HTTPRoute, I want that policy to then take precedence over my default policy.

[ ] Validate the existing behaviour supports this use case
[ ] Move authschema to default
[ ] Create a Demo showing two services and default policy. Show how access is only allowed when a policy has been defined

maleck13 commented 2 years ago

@guicassolato @rahulanand16nov

based on our conversation earlier, I think this is a solid use case for providing default at the gateway level. Perhaps during the next sprint we could investigate and come up with a proof of concept that would support this behaviour?

I think we want the default behaviour to start as not a merge. So a simple replace. IE I have defined a default but that can be replaced entirely for a given host WDYT?

rahulanand16nov commented 2 years ago

Based on initial thoughts on how to go about solving not only defaults but also overrides, I think it makes sense for the kuadrant controller to create an internal index of the hierarchy of resources present on the cluster. This would be similar to the tree representation that authorino uses for its cache but customized for controller use cases.

We need this because of the relational nature of AuthPolicy, and defaults/overrides require taking all the relations in one place to decide what behavior needs to be reconciled and configured within authorino.

let's go over a non-canonical example:

target: gateway
default:
  hosts: ["*.pets.com"]
  authscheme:
    ac-gw-deny-all
override:
  hosts: ["cdn.pets.com"]
  authScheme:
    ac-gw-allow-all
---
target: route
default:
  hosts: ["dogs.pets.com"]
  authScheme:
    ac-rt-custom-default
override:
  {NOT_ALLOWED_YET} // Not sure why override is required at the route-level

This would look like the following in the kuadrant controller.

                                   .
                                   │
                                   ▼
                                  com
                                   │
                                   │
                                   ▼
         ┌────────────────────────pets─────────────────────────┐
         │                         │                           │
         │                         │                           │
         ▼                         ▼                           ▼
         *                        cdn                         dogs
┌──────────────────┐     ┌───────────────────┐     ┌─────────────────────────┐
│Gateway:          │     │Gateway:           │     │Route:                   │
│  Default:        │     │  Override:        │     │  Default:               │
│    ac-gw-deny-all│     │    ac-gw-allow-all│     │    ac-route-cust-default│
└──────────────────┘     └───────────────────┘     └─────────────────────────┘

Each KAP would add information to the index tree and after each change to the tree, reconciliation would happen to a single resource from each node based on predefined rules.

Q: What happens if there are two route-level KAPs with the same host? A: In case of conflict, the timestamp of creation will be used and the status of rejected resource will be updated. During the rebuilding of the index, the same information will be used.

Q: What are the rules of creation for AuthConfigs? A: The following algorithm is still in progress as we come across more cases.

If children nodes consist of a `star node` (`*`):
  reconcile(star node); // rest should be ignored at that level;
  delete(other nodes);
else: // there is no overlap
  reconcile(each node);

reconcile(node):
  if the node contains gateway level override then reconcile only that;
  else if it contains route and gateway level defaults; reconcile only route-level defaults;
  else reconcile default which is present;

Q: What would happen to the example shown above? A: All of them would be created.

Q: What is left? A: Still merge part of the equation is not defined and of course refine the algorithm to include more cases where expected behavior is currently wrong.

This is not perfect but I hope gives an idea of what I am thinking :)

cc @maleck13 @alexsnaps @guicassolato @eguzki

maleck13 commented 2 years ago

Just on

override: {NOT_ALLOWED_YET} // Not sure why override is required at the route-level

We could certainly add a webhook that would reject an override at the route level with a useful message such as: overrides when targeting HTTPRoutes are not allowed, please set a default

It may not be required. One thought that comes to mind is if we were to allow setting a policy at the service level for example rather than just the route, but I don't see a good usecase for that right now.

maleck13 commented 2 years ago

I think it makes sense to build up a representation of the relationships between authpolicies. This will avoid an expensive list all auth policies and end up building a temporary structure for each change. I think the actual rules need some refinment

guicassolato commented 2 years ago

@rahulanand16nov, a couple extra observations from my side as well.

In the diagram, I believe you meant for the root to be a dot ('.'), not a star ('*').

Q: What happens if there are two route-level KAPs with the same host? A: In case of conflict, the timestamp of creation will be used and the status of rejected resource will be updated. During the rebuilding of the index, the same information will be used.

Which one prevails, the newest or the oldest resource? Do default or override have any implication on the above? Can wildcards be used in route-level policies?

Q: What are the rules of creation for AuthConfigs? A: The following algorithm is still in progress as we come across more cases.
If children nodes consist of a `start node` (`*`):
[...]

Do you mean star node?

Q: What would happen to the example shown above? A: All of them would be created.

Do you mean as 3 separate AuthConfigs? (I imagine so; just double checking.)

Moreover, we're considering disabling Authorino's host name collision prevention for Kuadrant (https://github.com/Kuadrant/authorino/issues/329). If that is the case, then all fine. Otherwise, in the case of 3 separate AuthConfigs for the example above, let's have in mind that *.pets.com would have to be the last AuthConfig applied, but, again, only if host name collision prevention is off and the AuthConfigs are from different namespaces – likely this will never be the case for Kuadrant.

It also would be interesting to see an example of the AuthConfig operations induced by each step into building the example above (IWO, what 'reconcile' means exactly in the algorithm), and them permutations of the steps (i.e. same steps in different order).

rahulanand16nov commented 2 years ago

In the diagram, I believe you meant for the root to be a dot ('.'), not a star ('*').

Yep, it's supposed to be dot('.');

Q: What happens if there are two route-level KAPs with the same host? A: In case of conflict, the timestamp of creation will be used and the status of rejected resource will be updated. During the rebuilding of the index, the same information will be used.

Which one prevails, the newest or the oldest resource? Do default or override have any implication on the above? Can wildcards be used in route-level policies?

the oldest resource would prevail; route-level policies can have wildcards; default or override will have no implications; Note: These are the ideal expectations but as we move forward and find more problems, might need to pivot on those expectations.

Q: What are the rules of creation for AuthConfigs? A: The following algorithm is still in progress as we come across more cases.
If children nodes consist of a `start node` (`*`):
[...]
Do you mean star node?

Yep

Q: What would happen to the example shown above? A: All of them would be created.

Do you mean as 3 separate AuthConfigs? (I imagine so; just double checking.)

Ideally, yes.

Moreover, we're considering disabling Authorino's host name collision prevention for Kuadrant (Kuadrant/authorino#329). If that is the case, then all fine. Otherwise, in the case of 3 separate AuthConfigs for the example above, let's have in mind that *.pets.com would have to be the last AuthConfig applied, but, again, only if host name collision prevention is off and the AuthConfigs are from different namespaces – likely this will never be the case for Kuadrant.

We can control the order of creation for each authconfig depending on how we traverse the tree so I hope it shouldn't be an issue.

It also would be interesting to see an example of the AuthConfig operations induced by each step into building the example above (IWO, what 'reconcile' means exactly in the algorithm), and them permutations of the steps (i.e. same steps in different order).

:+1:

rahulanand16nov commented 2 years ago

I just realized the algorithm doesn't even work for the example I wrote above :facepalm: but second iteration does.

if children nodes have a star node (`*`):
  if it has a `gateway-override`:
    just reconcile `gateway-override`;
    delete other nodes;
  elif it has `route-default`:
    reconcile `route-default`;
    reconcile(other nodes and delete `gateway-default` in them);
  else
    reconcile `gateway-default`;
    reconcile(other nodes);
else
  for each node: reconcile (node);

reconcile(node):
  reconcile in order of `gateway-override` > `route-default` > `gateway-default`;

Even Craig mentioned we need to isolate this logic and test the hell out of it.

Edit: I noticed that we should also move spec.rules inside default and override because if an authscheme is overridden by another authscheme then when rules are kind of redundant.

Probably make sense to start talking about merge case now.

cc @maleck13 @guicassolato

guicassolato commented 2 years ago

@rahulanand16nov, here's my take on the algorithm to reconcile an AuthPolicy for a particular host.

I took into account 8 types of AuthPolicies (i.e. 8 different cases), based on the 3 aspects (variables) of the AuthPolicy.

Variables:

target → Gateway | HTTPRoute
host → \ | \
scheme → default | override

Cases (2 targets ⅹ 2 hosts x 2 schemes = 8):

GWD: Gateway \ default
GWO: Gateway \ override
GLD: Gateway \ default
GLO: Gateway \ override
RWD: HTTPRoute \ default
RWO: HTTPRoute \ override
RLD: HTTPRoute \ default
RLO: HTTPRoute \ override

Note: I find useful to have RLD and RLO both supported, e.g. for the use case where admins can define RLO policies, while devs are restricted to RLD.

Precedence: GWD < RWD < GLD < RLD < GWO < RWO < GLO < RLO

Description of the algorithm: reconcile(kap, host) checks if another authpolicy exists, of any type that takes precedence over the resource being reconciled, targeting the same host (wildcard overlaps included); if thats’s the case, it rejects the resource for this particular host. Otherwise, it updates the status of any existing resource that would be superseded, and deletes/creates/updates the authconfigs accordingly

Pseudocode:

PolicyType = enum{GWD, RWD, GLD, RLD, GWO, RWO, GLO, RLO}

func reconcile(kap AuthPolicy, host string) error
  var supersededTypes []PolicyType

  switch kap.PolicyType()
    case GWD:
      if existOthers(kap, host, []PolicyType{RWD, GWO, RWO})
        return updateAuthPolicyStatus("%host superseded", kap)
      supersededTypes = []PolicyType{GWD}
    case RWD:
      if existOthers(kap, host, []PolicyType{GWO, RWO})
        return updateAuthPolicyStatus("%host superseded", kap)
      supersededTypes = []PolicyType{GWD, RWD}
    case GWO:
      if existOthers(kap, host, []PolicyType{RWO})
        return updateAuthPolicyStatus("%host superseded", kap)
      supersededTypes = []PolicyType{GLD, RLD, GWO}
    case RWO:
      supersededTypes = []PolicyType{GWD, RWD, GLD, RLD, GWO, RWO}
    case GLD:
      if existOthers(kap, host, []PolicyType{RLD, GWO, RWO, GLO, RLO})
        return updateAuthPolicyStatus("%host superseded", kap)
      supersededTypes = []PolicyType{GLD}
    case RLD:
      if existOthers(kap, host, []PolicyType{GWO, RWO, GLO, RLO})
        return updateAuthPolicyStatus("%host superseded", kap)
      supersededTypes = []PolicyType{GLD, RLD}
    case GLO:
      if existOthers(kap, host, []PolicyType{RLO})
        return updateAuthPolicyStatus("%host superseded", kap)
      supersededTypes = []PolicyType{GLD, RLD, GLO}
    case RLO:
      supersededTypes = []PolicyType{GLD, RLD, GLO, RLO}

    # somehow as an atomic transaction
    supersededAuthPolicies := findOthers(kap, host, supersededTypes)
    updateAuthPolicyStatus("%host superseded", supersededAuthPolicies...)
    deleteAuthConfigs(hosts(supersededAuthPolicies, host) - host)
    createOrUpdateAuthConfig(host, kap.AuthScheme)

func findOthers(kap AuthPolicy, host string, policyTypes []PolicyType) []AuthPolicy
  return []AuthPolicy{ other authpolicies targeting the same host (wildcard overlaps included) and whose type is in 'policyTypes' }

func existOthers(kap AuthPolicy, host string, policyTypes []PolicyType) bool
  return len(findOthers(kap, host, policyTypes)) > 0

guicassolato commented 2 years ago

Updating my own suggestion here, so it's consistent with the Gateway API (GW-API) spec.

I said before:

Precedence: GWD < RWD < GLD < RLD < GWO < RWO < GLO < RLO

GW-API establishes that the hierarchy (which I called "precedence") for the different types of resources should be exactly the reverse between defaults and overrides.

gw-api-defaults-overrides-hierarchy

Given our 4 types of resources:

GW: Gateway \
GL: Gateway \
RW: HTTPRoute \
RL: HTTPRoute \

The hierarchy in our case should actually be:

GW[D] < RW[D] < GL[D] < RL[D] < RL[O] < GL[O] < RW[O] < GW[O]

(Now noting default (D) and override (B) within square brackets.)

Rewriting the above with inverted inequality symbol (to match how it's represented in the GW-API docs):

GW[O] > RW[O] > GL[O] > RL[O] > RL[D] > GL[D] > RW[D] > GW[D]

Finally, the updated pseudocode for the hierarchy above:

Warning! I haven't tested this.

PolicyType = enum{GWO, RWO, GLO, RLO, RLD, GLD, RWD, GWD}

// reconcile(kap, host) reconciles a policy for a particular host.
// It looks for other policies targeting the same host (wildcard overlaps included), that are higher in the hierarchy and
// that according to the rules would be favoured to protect the host. If that is the case, it rejects the resource; otherwise,
// it updates the status of any existing policy that would be superseded by the resource regarding the the host, and
// deletes/creates/updates the authconfigs accordingly.
//
// Hierarchy:
// GWO > RWO > GLO > RLO > RLD > GLD > RWD > GWD
//  
// General rules:
// 1. **Older policy rule:** Between policies of the same type, targeting the same host, the older policy (created first) wins, i.e. older (existing) supersedes newer (challenging).
// 2. **Strict subset rule:** Literal hosts (L) never supersede wildcards (W), because L ⊂ W; they can leave side by side (different AuthConfigs).
// 3. **Default rule:** Defaults (D) only supersede other defaults (lower in the hierarchy).
// 4. **Override rule:** Overrides (O) supersede everything that is lower in the hierarchy, as long as it doesn't violate any of the other rules.
//
// Combined rules:
// - xWO rule:
//   - supersedes all types that are lower in the hierarchy (rule 4)
//   - is superseded by older self and all xWO that are higher in the hierarchy (rules 1, 2, 3 and 4)
// - xLO rule:
//   - supersedes all xLD and xLO that are lower in the hierarchy (rules 2 and 4)
//   - is superseded by older self and all xWO and xLO that are higher in the hierarchy (rules 1, 3 and 4)
// - xLD rule:
//   - supersedes all xLD that are lower in the hierarchy (rules 2 and 3)
//   - is superseded by older self and all types that are higher in the hierarchy (rules 1, 3 and 4)
// - xWD rule:
//   - supersedes all xWD that are lower in the hierarchy (rule 3)
//   - is superseded by older self and all xWD and xWO that are higher in the hierarchy (rules 1, 2, 3 and 4)
func reconcile(kap AuthPolicy, host string) error {
    var supersededTypes []PolicyType

    switch kap.PolicyType() {
    case GWO:
        if existOthers(kap, host, []PolicyType{GWO}) {
            return updateAuthPolicyStatus("cannot apply policy for %host", kap)
        }
        supersededTypes = []PolicyType{RWO, GLO, RLO, RLD, GLD, RWD, GWD}
    case RWO:
        if existOthers(kap, host, []PolicyType{GWO, RWO}) {
            return updateAuthPolicyStatus("cannot apply policy for %host", kap)
        }
        supersededTypes = []PolicyType{GLO, RLO, RLD, GLD, RWD, GWD}
    case GLO:
        if existOthers(kap, host, []PolicyType{GWO, RWO, GLO}) {
            return updateAuthPolicyStatus("cannot apply policy for %host", kap)
        }
        supersededTypes = []PolicyType{RLO, RLD, GLD}
    case RLO:
        if existOthers(kap, host, []PolicyType{GWO, RWO, GLO, RLO}) {
            return updateAuthPolicyStatus("cannot apply policy for %host", kap)
        }
        supersededTypes = []PolicyType{RLD, GLD}
    case RLD:
        if existOthers(kap, host, []PolicyType{GWO, RWO, GLO, RLO, RLD}) {
            return updateAuthPolicyStatus("cannot apply policy for %host", kap)
        }
        supersededTypes = []PolicyType{GLD}
    case GLD:
        if existOthers(kap, host, []PolicyType{GWO, RWO, GLO, RLO, RLD, GLD}) {
            return updateAuthPolicyStatus("cannot apply policy for %host", kap)
        }
    case RWD:
        if existOthers(kap, host, []PolicyType{GWO, RWO, RWD}) {
            return updateAuthPolicyStatus("cannot apply policy for %host", kap)
        }
        supersededTypes = []PolicyType{GWD}
    case GWD:
        if existOthers(kap, host, []PolicyType{GWO, RWO, RWD, GWD}) {
            return updateAuthPolicyStatus("cannot apply policy for %host", kap)
        }
    }

    // somehow as an atomic transaction
    {
        supersededAuthPolicies := findOthers(kap, host, supersededTypes)
        updateAuthPolicyStatus("can no longer apply policy for %host - now taken by %kap", supersededAuthPolicies...)
        deleteAuthConfigs(hosts(supersededAuthPolicies, host) - host)
        createOrUpdateAuthConfig(host, kap.AuthScheme)
    }
}

func findOthers(kap AuthPolicy, host string, policyTypes []PolicyType) []AuthPolicy {
    return []AuthPolicy{ other authpolicies targeting the same host (wildcard overlaps included) and whose type is in 'policyTypes' }
}

func existOthers(kap AuthPolicy, host string, policyTypes []PolicyType) bool {
    return len(findOthers(kap, host, policyTypes)) > 0
}

didierofrivia commented 2 years ago

Ok, here goes my 2c regarding this topic…

It’s an effort that aims to make the algorithm as simple as possible, redefining the precedence (hierarchy) order and the actual “compilation” of the policy rules. It aspires to also unify and be applied to the Rate Limit Policies

Hierarchy

It’s based on the GW API and Gui’s take, with some changes:

GL[O] > GW[O] > RL[O] > RW[O] > RL[D] > GL[D] > RW[D] > GW[D]

My understanding is that if you have a GW Literal Override, means that the level of specificity that the Cluster Admin had in mind needs to overtake any wildcard that most likely are general rules, therefore would always have more weight than any less specific one. Which would be the opposite when it comes to Defaults.

Considerations

One Auth/RL Policy pero Networking Object. First applied, others rejected.
Defining the combination scheme:
- Append
- Replace
- Merge
Ideally it would be an identifier per policy rule applied, which would make it simpler if merge is needed.

Reconciliation Algorithm

Check if the incoming policy target has already a policy attached, if so, ditch it.
Get all the policies
Sort them based on their hierarchy/precedence
Apply the combination scheme to obtain a single policy to apply. Then depending on the scheme: a. Append: the resulting policy will add every other into a single one b. Replace: The highest hierarchy wins. c. Merge: Based on their order, the merging algorithm will replace and combine accordingly.
Profit!

I know it’s a bit simplistic, and might be even naive… but hopefully this will help to craft a good algorithm that could be applied for both types of policies (RL/Auth) The most difficult, evidently, is the merge approach… but there are good libraries in Golang, such as Mergo, that could help us out.

eguzki commented 2 years ago

My contribution:

hostnames for a given authpolicy are read from spec.authScheme.hosts. This spec.authScheme.hosts is optional, if not set, the hostnames are read from the network resource the KAP is targeting. Policy rules should not play any role defining the hostnames.
I would not define any order of preference between "literal" and "wilcard" per se. For example, two policies of type R[O], one for *.com and the other petstore.com, they both apply. Authorino implements longest path match, so both policies would be enforced depending on the workload. They can live side by side in two different authconfigs. Thus, the hierarchy would be straightforward from the Gateway API def: GO > RO > RD > GD
How kuadrant should proceed when there are conflicts (overlapping domains from different policies)

Higher precedence	Lower precedence	Applied policies
`*.example.com`	`*.com`	`.example.com`, `.com`
`*.com`	`*.example.com`	`.example.com`(modified), `.com`

Authconfig schemes are not considered as units. Merge strategy can be applied on each authconfig section (patterns, conditions, identity, metadata, authorization, response, denywith). In the previous table, when *.com has higher preference than *.example.com, the policy *.example.com would still apply, however previously it would had been modified merging with higher level policies. If there is a conflict, the one with higher preference will prevail. A request to a.example.com would hit the modified version of the policy and a request to other.com would hit the unmodified version of the *.com policy.

Surely I am missing tons of things, but food for thought is always welcome (I guess).

guicassolato commented 2 years ago

@eguzki, as just discussed offline.

Policy rules should not play any role defining the hostnames.

I believe the policy rules should play a role to the host names in the AuthConfig, otherwise we enable a situation where the AuthConfig covers more that what it suppose to, e.g.:

kind: HTTPRoute:
spec:
  hostnames: [`*.petstore.com`]
---
kind: AuthPolicy
spec:
  rules:
  - hosts: ["api.petstore.com"]

We don't want *.petstore.com in the AuthConfig for the above, just like IMHO we don't want it in the Istio AuthorizationPolicy.

This spec.authScheme.hosts is optional

Maybe authpolicy.spec.authScheme.hosts should not even be an option. authconfig.spec.hosts makes sense and needs to remain required, but the abstraction on top (i.e. the KAP) can read the hosts from the rules and from the network object, as described before.

I can't see any use case where the user would want the ext-authz service triggered but have corresponding AuthConfig (effectively resulting in a 404 Not Found in this case). Perhaps to "lock" the backend service via authorization layer? E.g.

kind: HTTPRoute:
spec:
  hostnames: [`*.company.com`]
---
kind: AuthPolicy
spec:
  rules:
  - hosts: ["*.petstore.company.com"]
  authScheme:
    hosts: ["api.petstore.company.com"]

Meaning:

All requests to *.petstore.company.com trigger the authz service
Requests to api.petstore.company.com passes depending on the evaluation of the AuthConfig
All requests to other subdomains of *.petstore.company.com are denied by the authz service (with a 404)

I would not define any order of preference between "literal" and "wilcard" per se.

That was the same reasoning that previously made me think on RW[O] > GL[O], i.e. a route winning over a gateway in the hierarchy. However, I too was thinking that, in Authorino, both can co-exist, because one is specific (literal), while the other is generic (wildcard).

I will try later to rethink this so it reflects the correct hierarchy between the different types of resources as specified in GW-API and pointed out by @didierofrivia (i.e. Gateway override wins over HTTPRoute override, and the other way around for defaults), and Authorino's specificity that both policies can co-exist and most specific wins.

alexsnaps commented 2 years ago

I think I'll come somewhat out of left field on this but: Given the Gateway API's mandate that:

policy resources may omit Override or Default fields, but at least one of them MUST be present.

I'd like to propose a more "aggressive" approach and go for one or the other and build from there… If we really want to use the policy attachment concept, we don't really have a choice today anyways… Given the spec's:

GW[O] > RW[O] > GL[O] > RL[O] > RL[D] > GL[D] > RW[D] > GW[D]

I'd be tempted to start out with Default, so that RL wins. And iterate from there… Today that Default at the RL could completely replace whatever a GW policy would declare. Which raises interesting in itself:

how do I disable a RLP at my RL that was inherited from the GW, not make it some arbitrary large number, but completely remove it?.
how would one declare that a RLP has to be present on all RL?
how would one mitigate a DDoS?
...

I don't think that Defaults only will suffice to solve all of the use cases we're coming up with, but it'd be interesting to see how far we can get with "only" Defaults and what problem we face along the way…

So, not to invalidate any of what's been said so far, nor to pun on Overrides, but rather take an applied hands-on approach to one of them and "figure things out as we go"… wdyt?

guicassolato commented 6 months ago

Superseded by https://github.com/Kuadrant/kuadrant-operator/issues/431