Azure / AKS

Azure Kubernetes Service
https://azure.github.io/AKS/

AKS issue for taint removal #2934

Closed pedroamaralm closed 1 week ago

pedroamaralm commented 2 years ago

What happened:

Hi, until last month I was able to remove the taint from my nodes, but since then I can't anymore. I haven't made any changes to either the nodes or the policies; it just stopped working.

The following problem occurs:

p@p:~$ kubectl taint nodes aks-lxnode1-xxxxxxx-vmss00000y kubernetes.azure.com/scalesetpriority=spot:NoSchedule-
Error from server: admission webhook "aks-node-validating-webhook.azmk8s.io" denied the request: (UID: XXXXXX-XXXXX-XXXX-XXXX) Taint delete request "kubernetes.azure.com/scalesetpriority=spot:NoSchedule" refused. User is attempting to delete a taint configured on aks node pool "lxnode1".

What you expected to happen:

If I use this command on a cluster I have elsewhere, it works normally:

k8s@node1:~$ kubectl taint nodes node2 kubernetes.azure.com/scalesetpriority=spot:NoSchedule
node/node2 tainted
k8s@node1:~$ kubectl taint nodes node2 kubernetes.azure.com/scalesetpriority=spot:NoSchedule-
node/node2 untainted

How to reproduce it (as minimally and precisely as possible):

kubectl taint nodes aks-lxnode1-xxxxxxxx-vmss00000y kubernetes.azure.com/scalesetpriority=spot:NoSchedule-

Anything else we need to know?:

I use spot.io to manage spot instances, and when an instance is created it is born with the taint. In the schedule we have a job that removed this taint, but now it doesn't remove it anymore, and as a user we can't remove it either. I also have Rancher installed on this cluster, where I'm an administrator, but that still won't let me remove the taint. This problem is not related to spot.io or to the machines being spot, because I brought up a new cluster from scratch with on-demand machines, and after inserting the taint I couldn't remove it either.

ghost commented 2 years ago

Hi pedroamaralm, AKS bot here :wave: Thank you for posting on the AKS Repo, I'll do my best to get a kind human from the AKS team to assist you.

I might be just a bot, but I'm told my suggestions are normally quite good, as such:

  1. If this case is urgent, please open a Support Request so that our 24/7 support team may help you faster.
  2. Please abide by the AKS repo Guidelines and Code of Conduct.
  3. If you're having an issue, could it be described on the AKS Troubleshooting guides or AKS Diagnostics?
  4. Make sure you're subscribed to the AKS Release Notes to keep up to date with all that's new on AKS.
  5. Make sure there isn't a duplicate of this issue already reported. If there is, feel free to close this one and '+1' the existing issue.
  6. If you have a question, do take a look at our AKS FAQ. We place the most common ones there!

atykhyy commented 2 years ago

This issue also interferes with deploying and scaling AKS clusters which use Cilium for networking, as Cilium (and likely other similar solutions) needs a taint on fresh nodes to prevent pods from being scheduled there until Cilium configures the fresh node.

ghost commented 2 years ago

Triage required from @Azure/aks-pm

allyford commented 2 years ago

Hi @pedroamaralm, thank you for your feedback. This is a known issue that occurs only when a non-AKS request attempts to change a node taint that was set via the AKS API. Could you provide a bit more information to help us understand your use case?

Instead of using the AKS API and then changing the taints on the node, you can use an admission controller to add taints to new node objects.

do0ominik commented 2 years ago

Hi @allyford, we are facing the same issue with our AKS cluster. We use spot node pools for our testing environment and don't want to add affinity rules (like mentioned here) for this purpose; otherwise we need to maintain different configurations for different stages (production, testing, dev, ...).

Until yesterday, the taint removal worked like a charm. As of today, our cluster was broken (pods couldn't be scheduled any more...). Our AKS cluster is deployed to "West Europe".

How is this supposed to work in the future? Do we need to have affinity rules for that? Can we remove taints again?

Cheers!

lkjangir commented 2 years ago

Until yesterday, it was working fine. I have a daemonset installed in my AKS clusters which takes care of removing spot taints from all nodes. As of today, it's failing on all the clusters. There has been no communication from Microsoft regarding this.

brudnyhenry commented 2 years ago

Hi, the AKS cluster is unusable for us. All our test environments run on spot instances, so now we are blocked. Was this change announced anywhere?

allyford commented 2 years ago

Hi all,

Thank you for all of your questions and feedback.

A couple of solutions here for the above issues:

  1. Removal of taints: taints can be removed via the AKS API.
  2. Need taints on the node at init time to avoid pod scheduling: use a mutating admission webhook instead of relying on AKS API and kubectl API conflict ordering (see the Kubernetes docs and the sketch after this list). Mutating admission webhooks are invoked first and can modify node objects sent to the API server to enforce custom defaults.
  3. Removal of the spot pool taint on a spot pool: spot pool taints are an AKS system taint and cannot be modified or removed.
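For illustration only, a minimal sketch of the webhook registration such an approach could use. The names, namespace, and path below are hypothetical, and a webhook server that returns a JSONPatch adding the desired taint to incoming Node objects still has to be built and deployed separately:

apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: node-startup-taints                     # hypothetical name
webhooks:
  - name: node-startup-taints.example.com       # hypothetical webhook name (must be a FQDN)
    admissionReviewVersions: ["v1"]
    sideEffects: None
    failurePolicy: Ignore
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["nodes"]
    clientConfig:
      service:
        namespace: webhook-system               # hypothetical namespace
        name: node-taint-injector               # hypothetical service backing the webhook
        path: /mutate
      caBundle: <base64-encoded CA>             # placeholder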

This behavior change was listed in the 2022-04-24 AKS Release Notes.

alkarp commented 2 years ago

This is a breaking change for us too. Clusters where we run Cilium are now unusable.

Cilium uses the taint to ensure the Cilium agent pods are deployed before any other workload and that pods use the Cilium CNI:
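(A hedged sketch of that pattern, not quoted from the original comment: Cilium's documentation describes registering fresh nodes with an agent-not-ready taint that the agent removes once it is running, which before this change was commonly passed via the AKS node pool taints, e.g.:)

 # Taint key from Cilium's documentation; the effect varies by Cilium version
 az aks nodepool add \
   --cluster-name <cluster> --resource-group <rg> --name <pool> \
   --node-taints "node.cilium.io/agent-not-ready=true:NoExecute"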

Extremely disappointing.

JRolfe-Gen commented 2 years ago

What is the purpose of this feature? It takes away key functionality of k8s. I don't understand what direction you are trying to take this and how taking this away benefits users of AKS. Are there any plans to make it so that taints can be controlled down to the individual node? Managing taints at the nodepool level is a terrible idea.

akostic-kostile commented 2 years ago

I just found out about this change when our production deployment failed because Cilium could no longer remove the taint. The cluster was provisioned using Terraform and the taint was set the same way.

Cilium was chosen because it supports something that neither of the NetworkPolicy solutions MS offers does: DNS-based egress policies. In this day and age I consider IP-based network policies practically useless for anything residing outside the cluster.
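As an aside, a hedged sketch of the kind of DNS-based egress policy referred to here. The names are illustrative, and Cilium also needs a DNS visibility rule like the first egress entry below for FQDN matching to work:

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-egress-to-api                     # illustrative name
spec:
  endpointSelector:
    matchLabels:
      app: my-app                               # illustrative label
  egress:
    # DNS visibility: allow lookups through kube-dns so Cilium can learn FQDN -> IP mappings
    - toEndpoints:
        - matchLabels:
            k8s:io.kubernetes.pod.namespace: kube-system
            k8s-app: kube-dns
      toPorts:
        - ports:
            - port: "53"
              protocol: ANY
          rules:
            dns:
              - matchPattern: "*"
    # The actual DNS-based egress rule
    - toFQDNs:
        - matchName: "api.example.com"          # illustrative FQDN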

hacst commented 2 years ago

This change is producing issues for us too. We use an initial taint on nodes in certain pools to prevent scheduling of pods on new nodes until the required provisioning, done by a daemonset on the node, is completed and the taint removed. The mutating admission webhook sounds like a very complicated and brittle workaround for what used to be a simple, Kubernetes-native way of dealing with this.
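For context, a hedged sketch of that pattern. The taint key, names, and provisioning step are illustrative, and the service account needs RBAC to get/list/patch nodes:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-provisioner                        # illustrative name
spec:
  selector:
    matchLabels:
      app: node-provisioner
  template:
    metadata:
      labels:
        app: node-provisioner
    spec:
      serviceAccountName: node-provisioner      # needs RBAC to get/list/patch nodes
      tolerations:
        - key: example.com/unprovisioned        # illustrative startup taint key
          operator: Exists
          effect: NoSchedule
      containers:
        - name: provision
          image: kubesphere/kubectl:v1.0.0      # any image with kubectl works
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          command: ["/bin/sh", "-c"]
          args:
            - >-
              echo "provisioning $NODE_NAME (placeholder for the real setup step)" &&
              kubectl taint node "$NODE_NAME" example.com/unprovisioned:NoSchedule- ;
              while true; do sleep 3600; done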

gara-MI commented 2 years ago

Hello @hacst, I found a workaround for this issue using https://github.com/open-policy-agent/gatekeeper from OPA. The installation of Gatekeeper can be found here. The commands I used:

kubectl apply -f https://raw.githubusercontent.com/open-policy-agent/gatekeeper/release-3.8/deploy/gatekeeper.yaml

cat <<EOF | kubectl apply -f -
apiVersion: mutations.gatekeeper.sh/v1beta1
kind: ModifySet
metadata:
  name: remove-node-taints
  namespace: gatekeeper
spec:
  location: "spec.taints"
  applyTo:
  - groups: [""]
    kinds: ["Node"]
    versions: ["v1"]
  parameters:
    operation: prune
    values:
      fromList:
        - effect: NoSchedule
          key: kubernetes.azure.com/scalesetpriority
          value: spot
EOF

I tested this on k8s version 1.21.7.

hacst commented 2 years ago

@gara-MI thanks. Unfortunately it is not really applicable to my use case, though it might be an option for people who specifically want to get rid of the spot instance taint throughout the cluster. For us, as the node pool scales up/down we would have to add/remove these mutation resources dynamically for each node once it becomes ready/goes away to replicate what we had. Also, we are not running OPA in our cluster.

gara-MI commented 2 years ago

Hi @hacst, I see your use case is different. I think it's more useful for others who are using a daemonset to remove taints on spot nodes.

do0ominik commented 2 years ago

@gara-MI I think this will not work if you have Azure Policy enabled, because Azure manages Gatekeeper and no "mutation" resources are deployed.

akostic-kostile commented 2 years ago

I managed to fix our Cilium clusters, here's a brief explanation how: https://github.com/cilium/cilium/issues/19788#issuecomment-1125110915

camposdelima commented 2 years ago

I think a simpler workaround for most cases is to delete the aks-node-validating-webhook:

 kubectl delete ValidatingWebhookConfiguration aks-node-validating-webhook

But if you are more conservative, you can just disable it:

 kubectl get ValidatingWebhookConfiguration aks-node-validating-webhook -o yaml | sed -e 's/\(objectSelector: \){}/\1{"matchLabels": {"disable":"true"}}/g' | kubectl apply -f -

To guarantee that the modification is permanent, I have also created a job that performs the deactivation every minute and removes all spot node taints: https://gist.github.com/camposdelima/c77a4f23a9a831188b88ca67650cf011

akostic-kostile commented 2 years ago

@camposdelima Excellent find! Had I known about this I wouldn't have bothered with an admission controller.

palma21 commented 2 years ago

I think a simpler workaround for most cases is to delete the aks-node-validating-webhook:

 kubectl delete ValidatingWebhookConfiguration aks-node-validating-webhook

But if you are more conservative, you can just disable it:

 kubectl get ValidatingWebhookConfiguration aks-node-validating-webhook -o yaml | sed -e 's/\(objectSelector: \){}/\1{"matchLabels": {"disable":"true"}}/g' | kubectl apply -f -

To guarantee that the modification is permanent, I have also created a job that performs the deactivation every minute and removes all spot node taints: https://gist.github.com/camposdelima/c77a4f23a9a831188b88ca67650cf011

This is going to start failing on future releases, FYI. You cannot disable system validating webhooks.

What is the purpose of this feature? It takes away key functionality of k8s. I don't understand what direction you are trying to take this and how taking this away benefits users of AKS. Are there any plans to make it so that taints can be controlled down to the individual node? Managing taints at the nodepool level is a terrible idea.

What functionality of k8s is this removing? You're not forced to manage taints at the nodepool level, though I'd love to know more about why it's a terrible idea. You can add any taint via the k8s API to any individual node, and also remove it. What you can't do is add nodepool taints via the AKS API and then remove them from individual nodes via the k8s API.

We use an initial taint on nodes in certain pools to prevent scheduling of pods on new nodes until the required provisioning, done by a daemonset on the node, is completed and the taint removed. The mutating admission webhook sounds like a very complicated and brittle workaround for what used to be a simple, Kubernetes-native way of dealing with this.

If it was a native k8s way, you could do it all via the k8s API, no? :) But in this case you were using the AKS API to set them, and the k8s API to remove them, right? This causes conflicts at reconciliation time for the service.

palma21 commented 2 years ago

Listening to the feedback on this thread, it seems all the use cases that relied on the prior behavior are along the lines of node initialization. We can add startup taints, passed to the kubelet, to meet this use case. They would not be enforced by AKS or passed via the AKS API, but via custom kubelet settings, and you could then remove them via a k8s API call, because they won't be reconciled by the API. Does anyone have a use case that this wouldn't meet?

Otherwise we can provide this ASAP.
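For reference, a hedged sketch of what startup taints look like at the kubelet level upstream today; this is the existing kubelet mechanism, not a committed AKS API shape, and the taint key is illustrative:

# KubeletConfiguration fragment (upstream kubelet config file)
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
registerWithTaints:
  - key: example.com/startup                    # illustrative key
    value: "true"
    effect: NoSchedule
# Equivalent command-line form:
#   kubelet --register-with-taints=example.com/startup=true:NoSchedule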

hacst commented 2 years ago

@palma21 Having an AKS API element at agent pool creation that lets us pass taints applied at node startup and modifiable through the Kubernetes API afterwards definitely re-enables our use cases, if that's what is being proposed. Though in our case we'll unfortunately have to wait for it to become available through the Terraform provider, as we do not talk directly to the AKS API.

With regards to the discussion before:

If it was a native k8s way, you could do it all via the k8s API no? :) But in this case you were using the AKS api to set them, and the k8s API to remove them right? This causes conflicts at reconciliation time for the service.

The API documentation on properties.nodeTaints in agent pool creation says "The taints added to new nodes during node pool create and scale. For example, key=value:NoSchedule." No idea why we would expect them to be anything AKS-specific afterwards or reconciled in any way. Documentation on get is also consistent with that assumption: it says it returns "The taints added to new nodes during node pool create and scale. For example, key=value:NoSchedule." As an aside, we did not appreciate this behavior change being rolled out on us everywhere without it being bound to the usual Kubernetes version updates. Of course, with a managed service changes like this can happen sometimes, but I am not sure fair consideration was given to the customer impact and notification here.

do0ominik commented 2 years ago

Listening to the feedback on this thread, it seems all the use cases that relied on the prior behavior are along the lines of node initialization. We can add startup taints, passed to the kubelet, to meet this use case. They would not be enforced by AKS or passed via the AKS API, but via custom kubelet settings, and you could then remove them via a k8s API call, because they won't be reconciled by the API. Does anyone have a use case that this wouldn't meet?

Otherwise we can provide this ASAP.

I don't understand what you are planning to do here.

Our use case is: schedule pods on spot nodes without needing to add tolerations to each pod. For this we want to remove the "kubernetes.azure.com/scalesetpriority" taint from the relevant nodes at some point.

Is this solved in your solution?

EppO commented 2 years ago

Listening to the feedback on this thread, it seems all the use cases that relied on the prior behavior are along the lines of node initialization. We can add startup taints, passed to the kubelet, to meet this use case. They would not be enforced by AKS or passed via the AKS API, but via custom kubelet settings, and you could then remove them via a k8s API call, because they won't be reconciled by the API. Does anyone have a use case that this wouldn't meet?

Otherwise we can provide this ASAP.

I think it would dramatically help the BYOCNI use case, so that we let the custom CNI plugin mark when a node is ready to accept workloads. This was the main challenge for Cilium for a while.

palma21 commented 2 years ago

The API documentation on properties.nodeTaints in agent pool creation says "The taints added to new nodes during node pool create and scale. For example, key=value:NoSchedule." No idea why we would expect them to be anything AKS-specific afterwards or reconciled in any way. Documentation on get is also consistent with that assumption: it says it returns "The taints added to new nodes during node pool create and scale. For example, key=value:NoSchedule." As an aside, we did not appreciate this behavior change being rolled out on us everywhere without it being bound to the usual Kubernetes version updates. Of course, with a managed service changes like this can happen sometimes, but I am not sure fair consideration was given to the customer impact and notification here.

I meant that a nodepool is an AKS-only concept, not really a k8s concept. Applying taints via the k8s route is the same anywhere (as it is in AKS), but applying them to a "nodepool" is AKS functionality, not something from k8s upstream. This was in regards to the comment on k8s upstream alignment.

Acknowledged on the change rollout. We saw this as a bugfix restoring the intended behavior, otherwise we would have definitely tied it to versions and communicated it, which we should have.

Schedule pods on spot nodes without needing to add tolerations to each pod. For this we want to remove the "kubernetes.azure.com/scalesetpriority" taint from the relevant nodes at some point.

Yes and no; this is about having taints that the user can pass that won't be reconciled at runtime. The spot taint is a system taint that prevents, among other things, system components from landing on spot pools; it's not a taint that you can remove.

However, for cases like spot where you don't want to have to add tolerations to every pod, this is typically where default tolerations come in: https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#podtolerationrestriction https://docs.microsoft.com/en-us/azure/aks/faq#what-kubernetes-admission-controllers-does-aks-support-can-admission-controllers-be-added-or-removed

Does this feature not work for your use case?
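For illustration, a hedged sketch of how default tolerations are typically expressed with the PodTolerationRestriction admission controller; the namespace name is illustrative, and the annotation key is the upstream one from the linked docs:

apiVersion: v1
kind: Namespace
metadata:
  name: dev                                     # illustrative namespace
  annotations:
    # Pods created in this namespace automatically get this toleration added
    scheduler.alpha.kubernetes.io/defaultTolerations: '[{"key":"kubernetes.azure.com/scalesetpriority","operator":"Equal","value":"spot","effect":"NoSchedule"}]'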

AaronLinOops commented 2 years ago

Having independent control over the nodes in a pool is obviously a strong requirement. We work with the individual nodes in the pool, not with the node pool as a scheduling unit. It is a very routine operation to prevent nodes from being scheduled onto until they are officially put into use. This incompatible change led to catastrophic consequences.

pdefreitas commented 1 year ago

Any updates on providing "startup taints"? The current implementation broke functionality for our use case. In our use case we need to modify node IP tables on certain Kubernetes nodes, and pods may not initialize until these node IP tables are set up.

hacst commented 1 year ago

@palma21 Definitely also interested in an update on the "startup taints" topic. We keep having to do imperfect/fiddly workarounds, and it keeps popping up in useful third-party things. E.g. while looking into deploying AppArmor profiles today I stumbled over https://github.com/phealy/aks-apparmor-daemonset by your colleague @phealy, which, unless I'm mistaken, also assumes the no-longer-working behavior.

samuel-form3 commented 1 year ago

Any updates on this?

Aaron-ML commented 1 year ago

@palma21 Would love an update on this also; the way this is set up currently seems intentionally rigid.

davem-git commented 1 year ago

Any update on this would be appreciated. Is there a method working for anyone?

iRootkit commented 1 year ago

Use the AKS management REST API; it's working for me:

HTTP operation: PUT
URL: https://management.azure.com/subscriptions/{{subscriptionId}}/resourceGroups/{{resourceGroup}}/providers/Microsoft.ContainerService/managedClusters/{{clusterName}}/agentPools/{{agentPoolName}}?api-version=2022-04-01

body: { "properties": { "nodeTaints": [ "" ], "mode": "System" } }

But through the CLI it should be possible to delete taints, at least.
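The same call can be scripted; a hedged sketch using az rest, with placeholders to fill in. Note that a PUT on an agent pool may need the pool's other existing properties included as well, to avoid resetting them:

 az rest --method put \
   --url "https://management.azure.com/subscriptions/<subscriptionId>/resourceGroups/<resourceGroup>/providers/Microsoft.ContainerService/managedClusters/<clusterName>/agentPools/<agentPoolName>?api-version=2022-04-01" \
   --body '{ "properties": { "nodeTaints": [ "" ], "mode": "System" } }'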

karlschriek commented 1 year ago

I have to agree with most of the comments here, in that taints being set at the nodepool level by the Azure resource manager is in my opinion a very poor design decision. I think the lengths to which people here are going to try to get rid of those taints also illustrate that quite clearly.

One of the main use cases for spot instances is to use them in development environments, where you consciously choose to have everything scheduled there, knowing that there might be disruption, but the overall cost will be a lot lower. For larger projects that have potentially hundreds of deployments it makes no sense to individually set tolerations on all of them (or even at namespace level with default tolerations) just so that they can schedule on the spot instances. Not to mention that such dev-specific configurations are usually things you want to avoid in a proper DevOps scenario. I.e., once you are happy with your manifests on dev, you pull the trigger to deploy them to prod without any more changes (such as removing those tolerations again).

I presume the reason for the taints is to avoid system pods scheduling there. Given that Azure even doubled down on this by adding the validating webhook to prevent it from being removed, I have to assume that spot instance scheduling (or more specifically de-scheduling) runs into issues if there are system pods deployed there. Some system pods will refuse to be moved and cannot be drained, so if that is part of the spot instance scheduling process I could see that causing problems. I would point out, though, that if this is the reason, then taints still aren't the solution, since I can easily add a toleration to a system pod and it will still get scheduled there...

Obviously I am just guessing above. But if the reason is simply "system pods on spot nodes are bad for the client", please Azure, at least give us the ability to consciously decide to toggle the taint to "off" when we deploy the node pool!

Kannibalenleiche commented 1 year ago

Use the AKS management REST API; it's working for me:

HTTP operation: PUT
URL: https://management.azure.com/subscriptions/{{subscriptionId}}/resourceGroups/{{resourceGroup}}/providers/Microsoft.ContainerService/managedClusters/{{clusterName}}/agentPools/{{agentPoolName}}?api-version=2022-04-01

body: { "properties": { "nodeTaints": [ "" ], "mode": "System" } }

But through the CLI it should be possible to delete taints, at least.

This is curious: the spot taint could not be removed for me with this. All the other proposed "workarounds" aren't doing it for me either. The ability to run on spot nodes would really help us.

davem-git commented 1 year ago

Use the AKS management REST API; it's working for me:

HTTP operation: PUT
URL: https://management.azure.com/subscriptions/{{subscriptionId}}/resourceGroups/{{resourceGroup}}/providers/Microsoft.ContainerService/managedClusters/{{clusterName}}/agentPools/{{agentPoolName}}?api-version=2022-04-01

body: { "properties": { "nodeTaints": [ "" ], "mode": "System" } }

But through the CLI it should be possible to delete taints, at least.

@iRootkit What do you use to run this? Do you have a cronjob checking?

iRootkit commented 1 year ago

Use the AKS management REST API; it's working for me: HTTP operation: PUT URL: https://management.azure.com/subscriptions/{{subscriptionId}}/resourceGroups/{{resourceGroup}}/providers/Microsoft.ContainerService/managedClusters/{{clusterName}}/agentPools/{{agentPoolName}}?api-version=2022-04-01 body: { "properties": { "nodeTaints": [ "" ], "mode": "System" } } But through the CLI it should be possible to delete taints, at least.

@iRootkit What do you use to run this? Do you have a cronjob checking?

@davem-git Bro, I used Postman to call the management API, but whatever client you use works for this.

iRootkit commented 1 year ago

Use the AKS management REST API; it's working for me: HTTP operation: PUT URL: https://management.azure.com/subscriptions/{{subscriptionId}}/resourceGroups/{{resourceGroup}}/providers/Microsoft.ContainerService/managedClusters/{{clusterName}}/agentPools/{{agentPoolName}}?api-version=2022-04-01 body: { "properties": { "nodeTaints": [ "" ], "mode": "System" } } But through the CLI it should be possible to delete taints, at least.

This is curious: the spot taint could not be removed for me with this. All the other proposed "workarounds" aren't doing it for me either. The ability to run on spot nodes would really help us.

@Kannibalenleiche In my case I used it to remove the CriticalAddonsOnly=true:NoSchedule taint from a system node pool.

davem-git commented 1 year ago

@davem-git Bro, I used Postman to call the management API, but whatever client you use works for this.

I need this to be automated, as our clusters scale in and out. Manually having to do anything isn't acceptable.

cargious commented 1 year ago

@allyford @palma21 Can we have an implementation timeline for what @palma21 proposed?

ghost commented 1 year ago

This issue has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs within 15 days of this comment.

Kannibalenleiche commented 1 year ago

Use the AKS management REST API; it's working for me: HTTP operation: PUT URL: https://management.azure.com/subscriptions/{{subscriptionId}}/resourceGroups/{{resourceGroup}}/providers/Microsoft.ContainerService/managedClusters/{{clusterName}}/agentPools/{{agentPoolName}}?api-version=2022-04-01 body: { "properties": { "nodeTaints": [ "" ], "mode": "System" } } But through the CLI it should be possible to delete taints, at least.

This is curious: the spot taint could not be removed for me with this. All the other proposed "workarounds" aren't doing it for me either. The ability to run on spot nodes would really help us.

@Kannibalenleiche In my case I used it to remove the CriticalAddonsOnly=true:NoSchedule taint from a system node pool.

@iRootkit This is not applicable to the spot taint. Removing CriticalAddonsOnly=true:NoSchedule is no problem via the CLI either.

Thakurvaibhav commented 1 year ago

Is there any update on this? We are looking for a way to remove taints from a spot node pool.

Vipersoft-01 commented 1 year ago

To anyone following this issue: this is how I got the spot node pool working, WITH a concurrent system node pool running BUT ONLY paying for the spot node pool. I figured it out for the current version (1.23.12).

Steps:

  1. Create a System node pool
  2. Create a Spot node pool
  3. Create a manual scaling rule in the configuration of the VMSS of the System node pool (scale to 0). You will end up paying for those few seconds that the node started up before dying again, but it will be a fraction of the full price.
  4. Remove the taint on the Spot node pool: => either by running the following manual commands in the CLI
    kubectl delete ValidatingWebhookConfiguration aks-node-validating-webhook
    kubectl taint node NAMEOFSPOTNODEPOOL kubernetes.azure.com/scalesetpriority-

    => or by creating a CronJob that does the job for you. (A CronJob uses the default serviceaccount within your cluster, which has almost no rights by default, so this requires some more steps, but it fully automates the process of un-tainting the spot node pool.)

---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: spot-supervisor
spec:
  schedule: "*/1 * * * *"
  concurrencyPolicy: Replace
  jobTemplate:
    spec:
      backoffLimit: 1
      activeDeadlineSeconds: 60
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: spot-supervisor
            image: kubesphere/kubectl:v1.0.0
            command: ["/bin/sh","-c"]
            args: 
              - >-
                kubectl delete ValidatingWebhookConfiguration aks-node-validating-webhook;
                kubectl taint node -l kubernetes.azure.com/scalesetpriority=spot kubernetes.azure.com/scalesetpriority=spot:NoSchedule- || echo "spot done";
          securityContext: {}
          tolerations:
            - effect: NoSchedule
              key: node-role.kubernetes.io/master
            - key: "kubernetes.azure.com/scalesetpriority"
              operator: "Equal"
              value: "spot"
              effect: "NoSchedule"
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: modify-namespace
rules:
  - apiGroups: ["admissionregistration.k8s.io",""]
    resources:
      - nodes
      - mutatingwebhookconfigurations
      - validatingwebhookconfigurations
    verbs:
      - create
      - delete
      - list
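      # note: `kubectl taint` also does a get and a patch on nodes; later comments in this
      # thread report the job only taking effect after these permissions were widened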

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: modify-namespace-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: modify-namespace
subjects:
- kind: ServiceAccount
  name: default
  namespace: NAMESPACE

  5. Run the .yaml file within your cluster: kubectl apply -f NAMEFILE.yaml (optionally -n NAMESPACE)

Whenever your spot node gets evicted due to the eviction policy, the moment it comes back the CronJob will un-taint the new node.

Aprilllllll commented 1 year ago

Hi guys, you may use this command to remove the taint on a node pool: az aks nodepool update --cluster-name <cluster-name> --resource-group <resource-group> --name <nodepool-name> --node-taints "". Hope this helps.

Vipersoft-01 commented 1 year ago

Hi guys, you may use this command to remove the taint on a node pool: az aks nodepool update --cluster-name <cluster-name> --resource-group <resource-group> --name <nodepool-name> --node-taints "". Hope this helps.

This will not work by default on Spot node pools in AKS.

davem-git commented 1 year ago

I've had to abandon setting taints by default and use kubmod to apply taints on startup. It's still in testing; I haven't gone to production with it yet.

NormanJS commented 1 year ago

@Vipersoft-01 Is there anything else I need to do to make that cron work? I created a new cluster and then applied that, and the spot instances are still tainted, but running the command outside of the cron job works fine, despite the cron job showing up as 'completed'.

NormanJS commented 1 year ago

Weird, looks like granting my service account full permissions worked. This is just for a Kubernetes cluster I'm playing around with, so I don't really have a problem with too many perms, but it would be great if someone could investigate which perms it requires. I'm using Kubernetes version 1.24.9.

---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: spot-supervisor
spec:
  schedule: "*/1 * * * *"
  concurrencyPolicy: Replace
  jobTemplate:
    spec:
      backoffLimit: 1
      activeDeadlineSeconds: 60
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: spot-supervisor
            image: kubesphere/kubectl:v1.0.0
            command: ["/bin/sh","-c"]
            args: 
              - >-
                kubectl get ValidatingWebhookConfiguration aks-node-validating-webhook -o yaml | sed -e 's/\(objectSelector: \){}/\1{"matchLabels": {"disable":"true"}}/g' | kubectl apply -f -;
                kubectl taint node -l kubernetes.azure.com/scalesetpriority=spot kubernetes.azure.com/scalesetpriority=spot:NoSchedule- || echo "spot done";
          securityContext: {}
          tolerations:
            - effect: NoSchedule
              key: node-role.kubernetes.io/master
            - key: "kubernetes.azure.com/scalesetpriority"
              operator: "Equal"
              value: "spot"
              effect: "NoSchedule"

---

kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: full-permissions
rules:
- apiGroups: ["*"]
  resources: ["*"]
  verbs: ["*"]

---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: full-permissions-binding
subjects:
- kind: ServiceAccount
  name: default
  namespace: default
roleRef:
  kind: ClusterRole
  name: full-permissions
  apiGroup: rbac.authorization.k8s.io
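
For what it's worth, a hedged, untested sketch of a narrower role that should cover what the job above actually does (get/list/patch on nodes for kubectl taint, and get/patch on validatingwebhookconfigurations for the get | apply step):

kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: spot-supervisor-minimal                 # illustrative name
rules:
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get", "list", "patch"]               # kubectl taint does a get + patch; -l needs list
- apiGroups: ["admissionregistration.k8s.io"]
  resources: ["validatingwebhookconfigurations"]
  verbs: ["get", "patch"]                       # for the get | sed | apply step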

Vipersoft-01 commented 1 year ago

Weird, looks like granting my service account full permissions worked. This is just for a kubernetes cluster I'm playing around with, so I don't really have a problem with too much perms, but it would be great if someone can investigate into which perms it requires. I'm using kubernetes ver 1.28.9

---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: spot-supervisor
spec:
  schedule: "*/1 * * * *"
  concurrencyPolicy: Replace
  jobTemplate:
    spec:
      backoffLimit: 1
      activeDeadlineSeconds: 60
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: spot-supervisor
            image: kubesphere/kubectl:v1.0.0
            command: ["/bin/sh","-c"]
            args: 
              - >-
                kubectl get ValidatingWebhookConfiguration aks-node-validating-webhook -o yaml | sed -e 's/\(objectSelector: \){}/\1{"matchLabels": {"disable":"true"}}/g' | kubectl apply -f -;
                kubectl taint node -l kubernetes.azure.com/scalesetpriority=spot kubernetes.azure.com/scalesetpriority=spot:NoSchedule- || echo "spot done";
          securityContext: {}
          tolerations:
            - effect: NoSchedule
              key: node-role.kubernetes.io/master
            - key: "kubernetes.azure.com/scalesetpriority"
              operator: "Equal"
              value: "spot"
              effect: "NoSchedule"

---

kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: full-permissions
rules:
- apiGroups: ["*"]
  resources: ["*"]
  verbs: ["*"]

---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: full-permissions-binding
subjects:
- kind: ServiceAccount
  name: default
  namespace: default
roleRef:
  kind: ClusterRole
  name: full-permissions
  apiGroup: rbac.authorization.k8s.io

Yeah, as I stated in my reply, you are required to give the service account the proper rights to work with, as by default it has almost no rights. This is just a workaround; I do hope this keeps working though, as it keeps the cost of our test environment extremely low compared to the production one.

carloruiz commented 1 year ago

We can add startup taints, passed to the kubelet, to meet this use case. They would not be enforced by AKS or passed via the AKS API, but via custom kubelet settings, and you could then remove them via a k8s API call, because they won't be reconciled by the API. Does anyone have a use case that this wouldn't meet?

This may not meet all the use cases here, but it definitely meets many (including ours). We also run on EKS and we solve it exactly the same way, by passing additional args to the kubelet at node startup time. The change would be much appreciated!! @palma21
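For comparison, a hedged sketch of the EKS-side equivalent mentioned here; self-managed node groups pass kubelet flags through the EKS bootstrap script, and the cluster name and taint key are illustrative:

 # EKS self-managed node user data (illustrative):
 /etc/eks/bootstrap.sh <cluster-name> \
   --kubelet-extra-args '--register-with-taints=example.com/startup=true:NoSchedule'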