FairwindsOps / rbac-manager

A Kubernetes operator that simplifies the management of Role Bindings and Service Accounts.
https://fairwinds.com
Apache License 2.0
1.48k stars 117 forks source link

SA creating-deleting loop on OpenShift/OKD #417

Closed eryalito closed 1 year ago

eryalito commented 1 year ago

What happened?

When a RBACDefinition is created with a SA as a subject and it has to create it, in OpenShift clusters, it is deleted and recreated in each reconcile cycle

What did you expect to happen?

The SA should be created just once

How can we reproduce this?

Creating a RBACDefinition granting any Role to a SA in a namespace on Openshift/OKD cluster

Steps:

  1. Create the namespace
    kubectl create ns test
  2. Create the following RBACDefinition
    apiVersion: rbacmanager.reactiveops.io/v1beta1
    kind: RBACDefinition
    metadata:
    name: rbd-test
    rbacBindings:
    - name: test
    roleBindings:
    - clusterRole: admin
    namespaceSelector:
      matchExpressions:
      - key: kubernetes.io/metadata.name
        operator: In
        values:
        - test
    subjects:
    - kind: ServiceAccount
    name: test-sa
    namespace: test
  3. Create the namespace
    kubectrl create ns test

On the namespace:

$ k get sa -n test
NAME       SECRETS   AGE
builder    1         41s
default    1         41s
deployer   1         41s
test-sa    1         0s

The logs in a loop:

INFO[0135] Creating Service Account: test-sa            
INFO[0135] Deleting Service Account test-sa             
INFO[0135] Creating Service Account: test-sa            
INFO[0135] Deleting Service Account test-sa             
INFO[0135] Creating Service Account: test-sa

Version

1.6.4

Search

Code of Conduct

Additional context

Openshift/OKD injects a default global pull secret to all service accounts. So in each reconcile iteration the desired pullsecrets and the existing ones would never match.

https://github.com/FairwindsOps/rbac-manager/blob/master/pkg/reconciler/matcher.go#L57-L60

The current workaround is to manually create the SA before creating the RBACDefinition. Doing so, as the matcher never detects the SA as already existing it's always trying to create it, failing and raising errors:

https://github.com/FairwindsOps/rbac-manager/blob/master/pkg/reconciler/reconciler.go#L214

The behaviour is similar to https://github.com/FairwindsOps/rbac-manager/issues/110, but probably different root cause.

sudermanjr commented 1 year ago

Interesting. I'm not overly familiar with OKD/Openshift myself, so I wasn't aware it did this.

Do you have thoughts on what the best solution might be? I know we have some work to do around various enhancement proposals for service accounts, but I'm not sure what the "right" way to do this is.

eryalito commented 1 year ago

I'm not really sure what a good approach would be as there's no special annotation or label on OKD/OCP SAs. This is how a fresh SA looks like:

apiVersion: v1
imagePullSecrets:
- name: test-sa-dockercfg-kb4bg
kind: ServiceAccount
metadata:
  name: test-sa
  namespace: test
secrets:
- name: test-sa-dockercfg-kb4bg

A default imagePullSecret is injected on SA creation.

Maybe a flag can be defined to indicate it is a OKD/OCP cluster and change the pullsecret matching behavior. I have two things in mind that could work:

  1. Ignore imagePullSecret if the flag is set to true, so the SA match code just check the object metadata.
  2. Keep the imagePullSecret values but if the flag is set to true and the existing SA secrets are exactly the same plus an extra one and assume it's the injected one.

I see problems in both options, but at least it would work on OKD/OCP

sudermanjr commented 1 year ago

I think 2 is the better option - it allows users to still utilize the imagePullSecret functionality in rbac-manager while also supporting OKD. In the scenario, it doesn't even necessarily need to be a flag, we could just be default ignore any pull secrets that aren't managed by rbac-manager.

The real question at that point is how to implement that, but it should be possible

eryalito commented 1 year ago

I agree with the idea of setting it as the default behavior, it should be also backwards compatible.

The problem I see with that is when deleting a imagePullSecret on the RBD object, so there's no way of knowing if it was an unmanaged secret or just a deleted one -in that case, it makes sense to me it should be deleted from the SA too-.

What do you think about keeping track of the list of ¨managed" imagePullSecrets on an annotation on the SA? Example:

---
apiVersion: rbacmanager.reactiveops.io/v1beta1
kind: RBACDefinition
metadata:
  name: rbd-test
rbacBindings:
- name: test
  roleBindings:
  - clusterRole: admin
    namespaceSelector:
      matchExpressions:
      - key: kubernetes.io/metadata.name
        operator: In
        values:
        - test
  subjects:
  - kind: ServiceAccount
    name: test-sa
    namespace: test
    imagePullSecrets:
      - my-declared-ps

Will create:

---
apiVersion: v1
imagePullSecrets:
- name: my-declared-ps
- name: test-sa-dockercfg-kb4bg
kind: ServiceAccount
metadata:
  name: test-sa
  namespace: test
  annotations:
    rbacmanager.reactiveops.io/managed-pull-secrets: my-declared-ps
secrets:
- name: test-sa-dockercfg-kb4bg

In this case, when the pullsecret my-declared-ps is deleted, the logic can use the annotation info to know it has changed. Sadly, right now the overall logic with something changes is to delete the SA and create it from scratch. For now I think we can keep that logic and just detect the case to recreate the service account and tackle the delete & create topic in another issue.

If that makes sense I can draft a PR implementing that logic

sudermanjr commented 1 year ago

That sounds great to me, and a good plan for incremental improvement.

eryalito commented 1 year ago

Hey @sudermanjr I've created a PR with the changes we've talked about and also added a couple of extra tests. Could you check it out? Some CI workflows seems to be stuck

sudermanjr commented 1 year ago

Saw the PR, haven't had a chance to review in depth yet. Will take me a few more days most likely. Thanks!