StevenBarre closed this 10 months ago
Matt had the idea to use Kyverno policies to restrict node deletion to only a special service account.
Looking into using the DXC AWS account to spin up a UPI cluster where we can test this safely. https://docs.openshift.com/container-platform/4.12/installing/installing_aws/installing-aws-user-infra.html
Put in a ticket to get a subdomain delegated to AWS Route 53.
DNS set up by DXC corporate, I can now make as many test clusters as I want under *.mcs.dxcas.com.
Stood up a test cluster and logged in. Installed Kyverno, then deleted all the nodes. Cleaned up.
Next steps are to automate the build a bit more.
Ansible playbook in progress
Tried that, doesn't work. `oc delete --all` does a get and then deletes each item individually.
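One way to observe this is to raise the client log verbosity and look at the HTTP requests `oc` sends. The helper below is a minimal sketch, not something from this ticket: the function name `trace_bulk_delete` is made up for illustration, and it assumes the cluster honors `--dry-run=server` so that nothing is actually removed. Only try it on a disposable test cluster.

```shell
# Hypothetical helper: list the API requests made by a bulk delete.
# Assumption: --dry-run=server is honored, so no object is really deleted.
# -v=6 makes oc log each HTTP request to stderr; grep keeps the GET/DELETE lines.
trace_bulk_delete() {
  kind="$1"
  oc delete "$kind" --all --dry-run=server -v=6 2>&1 | grep -E 'GET|DELETE'
}

# Example call on a throwaway cluster:
#   trace_bulk_delete nodes
```

If the observation above holds, the output should show one list request followed by a separate DELETE request per object, which means an admission policy sees each node deletion on its own.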
This makes deleting a node a two-step process that involves two different accounts, which reduces the chance of a mass deletion.
---
kind: ServiceAccount
apiVersion: v1
metadata:
  name: advsol-node-deleter
  namespace: openshift
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: advsol-node-labeler
rules:
  - apiGroups:
      - ""
    resources:
      - nodes
    verbs:
      - watch
      - list
      - get
      - update
      - patch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: advsol-node-deleter
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: advsol-node-labeler
subjects:
  - kind: ServiceAccount
    name: advsol-node-deleter
    namespace: openshift
---
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: protect-nodes
  annotations:
    policies.kyverno.io/title: Block deletes of nodes
    policies.kyverno.io/category: Safety
    policies.kyverno.io/subject: Nodes
    policies.kyverno.io/description: >-
      Prevent accidental deletes of nodes
spec:
  background: false
  rules:
    - exclude:
        any:
          - resources:
              selector:
                matchLabels:
                  ok-to-delete: "true"
      match:
        any:
          - resources:
              kinds:
                - Node
      name: protect-nodes
      validate:
        deny:
          conditions:
            any:
              - key: '{{request.operation}}'
                operator: Equals
                value: DELETE
        message: Deleting {{request.oldObject.kind}}/{{request.oldObject.metadata.name}}
          is not allowed
    - exclude:
        any:
          - subjects:
              - kind: ServiceAccount
                name: advsol-node-deleter
                namespace: openshift
      match:
        any:
          - resources:
              kinds:
                - Node
      name: prevent-label-value-changes
      validate:
        deny:
          conditions:
            all:
              - key: '{{ request.object.metadata.labels."ok-to-delete" || "" }}'
                operator: NotEquals
                value: ""
              - key: '{{ request.object.metadata.labels."ok-to-delete" || "" }}'
                operator: NotEquals
                value: '{{ request.oldObject.metadata.labels."ok-to-delete" || "" }}'
        message: Modifying the `ok-to-delete` label on a Node is not allowed.
  validationFailureAction: Enforce
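With the manifest above applied, deleting a node could look roughly like the sketch below. It is an illustration under stated assumptions rather than the procedure used in this ticket: the function names and the node name `worker-0` are hypothetical, and step 1 assumes the caller is allowed to impersonate the service account through the `--as` flag (an operator could equally log in with the service account's own token).

```shell
# Identity of the service account that the prevent-label-value-changes rule excludes.
NODE_LABELER="system:serviceaccount:openshift:advsol-node-deleter"

# Step 1: mark the node as deletable, acting as the service account.
# Assumption: the caller has permission to impersonate it via --as.
mark_node_deletable() {
  node="$1"
  oc label node "$node" ok-to-delete=true --as="$NODE_LABELER"
}

# Step 2: delete the node with the caller's own credentials.
# The protect-nodes rule excludes nodes that carry ok-to-delete=true.
delete_marked_node() {
  node="$1"
  oc delete node "$node"
}

# Example usage (worker-0 is a placeholder node name):
#   mark_node_deletable worker-0
#   delete_marked_node worker-0
```

Note that the ClusterRole grants the service account no `delete` verb on nodes, so it can only label a node; the deletion itself has to come from a second account that does have that permission.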
Describe the issue: Need to prevent nodes from being accidentally deleted
What is the Value/Impact? Cluster safety
What is the plan? How will this get completed? Brainstorm ideas on how to protect nodes and how to safely test it
Identify any dependencies
Definition of done: A plan for how to test and what to test