BCDevOps / developer-experience

This repository is used to track all work for the BCGov Platform Services Team. This includes work for: 1. Platform Experience, 2. Developer Experience, 3. Platform Operations/OCP 3.
Apache License 2.0

Investigate options to protect nodes from deletion #4346

Closed StevenBarre closed 10 months ago

StevenBarre commented 1 year ago

Describe the issue: Need to prevent nodes from being accidentally deleted

What is the Value/Impact? Cluster safety

What is the plan? How will this get completed? Brainstorm ideas on how to protect nodes and how to safely test it

Identify any dependencies

Definition of done: A plan for how to test and what to test

StevenBarre commented 1 year ago

Matt had the idea to use Kyverno policies to restrict node deletion to only a special service account.

Looking into using the DXC AWS account to spin up a UPI cluster where we can test this safely. https://docs.openshift.com/container-platform/4.12/installing/installing_aws/installing-aws-user-infra.html

Put in a ticket to get a subdomain delegated to AWS Route 53.

StevenBarre commented 11 months ago

DNS was set up by DXC corporate, so I can now make as many test clusters as I want under *.mcs.dxcas.com.

Stood up a test cluster and logged in. Installed Kyverno, then deleted all the nodes. Cleaned up.

Next steps are to automate the build a bit more.

StevenBarre commented 11 months ago

Ansible playbook in progress

StevenBarre commented 11 months ago

https://www.giantswarm.io/blog/restricting-cluster-admin-permissions

StevenBarre commented 11 months ago

Tried that, doesn't work. `oc delete --all` does a get and then deletes each item individually.

StevenBarre commented 11 months ago

This makes deleting nodes a two-step process that needs two different accounts, reducing the chance of a mass deletion.

---
kind: ServiceAccount
apiVersion: v1
metadata:
  name: advsol-node-deleter
  namespace: openshift
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: advsol-node-labeler
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - watch
  - list
  - get
  - update
  - patch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: advsol-node-deleter
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: advsol-node-labeler
subjects:
  - kind: ServiceAccount
    name: advsol-node-deleter
    namespace: openshift
---
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: protect-nodes
  annotations:
    policies.kyverno.io/title: Block deletes of nodes
    policies.kyverno.io/category: Safety
    policies.kyverno.io/subject: Nodes
    policies.kyverno.io/description: >-
      Prevent accidental deletes of nodes
spec:
  background: false
  rules:
  # Rule 1: deny DELETE on any Node unless it carries the label ok-to-delete=true.
  - exclude:
      any:
      - resources:
          selector:
            matchLabels:
              ok-to-delete: "true"
    match:
      any:
      - resources:
          kinds:
          - Node
    name: protect-nodes
    validate:
      deny:
        conditions:
          any:
          - key: '{{request.operation}}'
            operator: Equals
            value: DELETE
      message: Deleting {{request.oldObject.kind}}/{{request.oldObject.metadata.name}}
        is not allowed
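  # Rule 2: only the advsol-node-deleter service account may add or change the
  # ok-to-delete label. Removing the label and unrelated node updates are not blocked.
  - exclude: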
      any:
      - subjects:
        - kind: ServiceAccount
          name: advsol-node-deleter
          namespace: openshift
    match:
      any:
      - resources:
          kinds:
          - Node
    name: prevent-label-value-changes
    validate:
      deny:
        conditions:
          all:
          - key: '{{ request.object.metadata.labels."ok-to-delete" || "" }}'
            operator: NotEquals
            value: ""
          - key: '{{ request.object.metadata.labels."ok-to-delete" || "" }}'
            operator: NotEquals
            value: '{{ request.oldObject.metadata.labels."ok-to-delete" || "" }}'
      message: Modifying the `ok-to-delete` label on a Node is not allowed.
  validationFailureAction: Enforce
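
With the policy above in place, the two-step flow could look roughly like the sketch below. This is an illustration and not a tested runbook. It assumes the operator's own account is allowed to impersonate the `advsol-node-deleter` service account through `--as`. The `OC` variable and the two function names are hypothetical helpers added here so the commands can be dry-run with `OC=echo`.

```shell
#!/bin/sh
# Sketch of the two-step node deletion flow (assumptions noted in the text above).
set -eu

# Hypothetical hook: set OC=echo to print the commands instead of running them.
OC="${OC:-oc}"

# Service account that the Kyverno policy allows to set the ok-to-delete label.
SA_USER="system:serviceaccount:openshift:advsol-node-deleter"

# Step 1: label the node as safe to delete, acting as the service account.
# Assumes the calling user has impersonation rights for --as.
mark_node_deletable() {
  $OC label node "$1" ok-to-delete=true --as="$SA_USER"
}

# Step 2: delete the labelled node with the operator's normal account.
# The service account itself has no delete permission on nodes.
delete_marked_node() {
  $OC delete node "$1"
}
```

For a node with the hypothetical name `worker-1`, run `mark_node_deletable worker-1` and then `delete_marked_node worker-1`. Without the label, the second step should be denied by the `protect-nodes` rule.
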
StevenBarre commented 11 months ago