Closed: kovaxur closed this issue 1 year ago
Hey @kovaxur, can you share the helm command you used? Sometimes the installation fails because of unset variables. Also, did you migrate from CAS, or are you doing a fresh install on a new cluster with this guide?
Hi @njtran, it's a fresh install. I'm just familiar with CAS, so that guide was easier to follow.
I'm not using helm directly. I use pulumi to create the whole infra, and this is part of it. These are the values passed, basically the same as with helm:
values: {
  settings: {
    aws: {
      clusterName: options.eksCluster.name,
      clusterEndpoint: options.eksCluster.eksCluster.endpoint,
      tags: {
        [`kubernetes.io/cluster/${options.eksCluster.name}`]: 'owned',
      },
      defaultInstanceProfile: options.defaultInstanceProfile.name,
    },
  },
  serviceAccount: {
    annotations: {
      'eks.amazonaws.com/role-arn': `arn:aws:iam::${getStackConfig().accountId}:role/${prefixWithStackName(options.eksCluster.name)}-karpenter-controller-role`,
    },
    name: 'karpenter',
  },
  nodeSelector: {
    'runSystem': 'yes',
  },
  tolerations: [{
    effect: 'NoSchedule',
    key: 'dedicated',
    operator: 'Equal',
    value: 'system',
  }],
},
I tried to look into the code, but I can't work out where the defaulting.webhook.karpenter.sh webhook comes from.
This is where the webhook is defined for your installation, which includes the webhook that it says you don't have: https://github.com/aws/karpenter/blob/v0.27.0/charts/karpenter/templates/webhooks-core.yaml
It's odd that this isn't in your cluster. I just tried a fresh install of v0.27.0 and I see the webhook in my cluster.
Additionally, I ran the helm template command and see the webhook in the generated file.
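For anyone who wants to script the same check against rendered chart output, here is a minimal sketch. `hasMutatingWebhook` and `sampleRendered` are hypothetical names, not part of Karpenter or pulumi, and the sample manifest is illustrative rather than the real chart output. It uses plain text matching instead of a YAML parser, so treat it as a rough check only.

```typescript
// Hypothetical helper: reports whether rendered chart output contains a
// MutatingWebhookConfiguration document that declares the given webhook name.
// Plain text matching only; this is a sketch, not a YAML-aware validator.
function hasMutatingWebhook(rendered: string, webhookName: string): boolean {
  // Multi-document YAML separates documents with a line containing only '---'.
  const documents = rendered.split(/^---\s*$/m);
  return documents.some((doc) => {
    const isMutatingConfig = /^kind:\s*MutatingWebhookConfiguration\s*$/m.test(doc);
    // Accept both "name: x" and list items written as "- name: x".
    const declaresWebhook = doc
      .split("\n")
      .some((line) => line.trim().replace(/^-\s*/, "") === `name: ${webhookName}`);
    return isMutatingConfig && declaresWebhook;
  });
}

// Illustrative sample only, not the real output of the Karpenter chart.
const sampleRendered = `
apiVersion: v1
kind: ServiceAccount
metadata:
  name: karpenter
---
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: defaulting.webhook.karpenter.sh
webhooks:
  - name: defaulting.webhook.karpenter.sh
`;

console.log(hasMutatingWebhook(sampleRendered, "defaulting.webhook.karpenter.sh"));
```

Feeding it the text rendered from the chart source your tooling actually deploys would show whether the webhook is missing before anything reaches the cluster.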
I'm not familiar with pulumi, so is it possible you removed this webhook at some point? We've also removed this webhook at HEAD. Is your pulumi somehow pointed at HEAD? Maybe it's deleting the webhook because it doesn't see it at the latest commit?
Ah yes, the pulumi helm plugin does not support OCI registries, so I had to pull the project locally, and it seems that I have to manually check out the proper branch.
It's working now, thanks! Closing the issue.
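Since the root cause was a local chart checkout sitting on the wrong revision, one possible guard is to compare the `version` field of the local Chart.yaml with the version you intend to deploy before handing the chart to pulumi. The sketch below assumes the checked-out revision's Chart.yaml carries a matching `version`, which is worth verifying for your checkout. `chartVersion`, `assertChartVersion`, and `sampleChartYaml` are hypothetical names, and the sample text is illustrative.

```typescript
// Hypothetical helper: extracts the top-level `version:` field from the text
// of a Chart.yaml. The line-start anchor keeps `appVersion:` from matching.
function chartVersion(chartYaml: string): string | undefined {
  const match = chartYaml.match(/^version:\s*["']?([^"'\s#]+)["']?/m);
  return match ? match[1] : undefined;
}

// Throws when the local chart version differs from the expected one.
// A leading "v" is ignored, so "v0.27.0" and "0.27.0" compare as equal.
function assertChartVersion(chartYaml: string, expected: string): void {
  const normalize = (v: string): string => v.replace(/^v/, "");
  const actual = chartVersion(chartYaml);
  if (actual === undefined || normalize(actual) !== normalize(expected)) {
    const shown = actual === undefined ? "unknown" : actual;
    throw new Error(`local chart version is ${shown}, expected ${expected}`);
  }
}

// Illustrative sample only; read the real file from your local checkout.
const sampleChartYaml = `apiVersion: v2
name: karpenter
appVersion: 0.27.0
version: 0.27.0
`;

assertChartVersion(sampleChartYaml, "v0.27.0"); // passes silently
```

Failing fast here would surface a stale or too-new checkout at deploy time instead of as a missing webhook at runtime.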
Version
Karpenter Version: v0.27.0
Kubernetes Version: v1.23.0
Expected Behavior
Karpenter should be able to remove nodes when they are no longer needed.
Actual Behavior
Karpenter is unable to remove nodes, and a reconcile error is shown periodically.
Steps to Reproduce the Problem
Install via the 'CAS migration guide'.
Resource Specs and Logs
It seems that the mutating webhook does not exist: