Closed: hitsub2 closed this issue 10 months ago
After changing amiFamily from AL2 to Custom, there no longer appear to be any NotReady nodes. So my question: what is the behavior when providing kubelet config via user data? Is the user data executed twice, and could that have caused this bug?
This seems like a duplicate of Node repair. Since most of the nodes (398/400) became ready, it seems like a transient error was the problem in this case.
It is Karpenter's responsibility to do the node repair, but I am just wondering why this happens. Is it due to the user data running twice?
I suspect it's not due to userData, as most of the nodes are ready.
Closing as a duplicate of https://github.com/aws/karpenter-core/issues/750
@hitsub2 Just wondering, were you following a guide or something else for working with these flags?
--enforce-node-allocatable=pods,kube-reserved,system-reserved --system-reserved-cgroup=/system.slice --kube-reserved-cgroup=/system.slice
Description
Observed Behavior: When the following kubelet args are provided, some nodes (2 out of 400) become NotReady and Karpenter cannot disrupt them, leaving them stuck forever.
Extra kubelet config:
--cpu-manager-policy=static --enforce-node-allocatable=pods,kube-reserved,system-reserved --system-reserved-cgroup=/system.slice --kube-reserved-cgroup=/system.slice
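For context, these flags map one-to-one onto kubelet config-file fields. Here is a minimal sketch of the equivalent `kubelet.config.k8s.io/v1beta1` KubeletConfiguration (the values are copied from the flags above; nothing here is taken from the reporter's actual setup):

```yaml
# Sketch: config-file equivalent of the kubelet flags above
# (kubelet.config.k8s.io/v1beta1 KubeletConfiguration).
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cpuManagerPolicy: static
enforceNodeAllocatable:
  - pods
  - kube-reserved
  - system-reserved
systemReservedCgroup: /system.slice
kubeReservedCgroup: /system.slice
```

One caveat worth noting: the static CPU manager policy requires a non-zero CPU reservation (via kubeReserved/systemReserved CPU or reservedSystemCPUs), and enforcing kube-reserved/system-reserved requires the named cgroups to already exist, since the kubelet will not create them.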
ec2 nodeclass.yaml
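The original attachment isn't included here. As a stand-in, here is a minimal sketch of an EC2NodeClass that passes these flags through userData with amiFamily: Custom, where Karpenter uses the userData verbatim; the cluster name, role, discovery tags, and AMI id below are all hypothetical:

```yaml
# Sketch only, not the reporter's actual file; all names/ids are hypothetical.
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: Custom              # with Custom, Karpenter does not merge userData
  amiSelectorTerms:
    - id: ami-0123456789abcdef0  # hypothetical AMI id
  role: KarpenterNodeRole-my-cluster
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  userData: |
    #!/bin/bash
    # Bootstrap an EKS-optimized custom AMI, forwarding the extra kubelet flags.
    /etc/eks/bootstrap.sh my-cluster \
      --kubelet-extra-args '--cpu-manager-policy=static --enforce-node-allocatable=pods,kube-reserved,system-reserved --system-reserved-cgroup=/system.slice --kube-reserved-cgroup=/system.slice'
```

With amiFamily: AL2, by contrast, Karpenter merges user-supplied userData with its own generated bootstrap section, which is presumably what the question above about user data running twice refers to.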
kubelet error log
Expected Behavior: All the nodes should be ready; if NotReady nodes come up, Karpenter should recycle or disrupt them. Reproduction Steps (Please include YAML):
Versions:
Karpenter Version: v0.32.1
Kubernetes Version (kubectl version): 1.25