Closed by mschnee 6 months ago
Maybe unrelated, but there's an "unsatisfiable topology constraint for pod anti-affinity" error affecting Karpenter when attempting to scale up cert-manager/cert-manager per the docs:
```json
{
  "level": "ERROR",
  "time": "2024-05-29T20:50:13.266Z",
  "logger": "controller.provisioner",
  "message": "Could not schedule pod, incompatible with nodepool \"burstable\", daemonset overhead={\"cpu\":\"435m\",\"memory\":\"958815207\",\"pods\":\"5\"}, unsatisfiable topology constraint for pod anti-affinity, key=node.kubernetes.io/instance-type (counts = r5d.xlarge: 1 c7a.xlarge: 1 and 3 other(s), podDomains = node.kubernetes.io/instance-type Exists, nodeDomains = node.kubernetes.io/instance-type Exists; incompatible with nodepool \"spot\", daemonset overhead={\"cpu\":\"435m\",\"memory\":\"958815207\",\"pods\":\"5\"}, unsatisfiable topology constraint for pod anti-affinity, key=node.kubernetes.io/instance-type (counts = r6idn.xlarge: 1 c6g.2xlarge: 1 and 6 other(s), podDomains = node.kubernetes.io/instance-type Exists, nodeDomains = node.kubernetes.io/instance-type Exists; incompatible with nodepool \"on-demand\", daemonset overhead={\"cpu\":\"435m\",\"memory\":\"958815207\",\"pods\":\"5\"}, unsatisfiable topology constraint for pod anti-affinity, key=node.kubernetes.io/instance-type (counts = c6gn.medium: 1 c3.2xlarge: 1 and 4 other(s), podDomains = node.kubernetes.io/instance-type Exists, nodeDomains = node.kubernetes.io/instance-type Exists",
  "commit": "8b2d1d7",
  "pod": "core-dns/core-dns-576668f9c9-ps75b"
}
{
  "level": "ERROR",
  "time": "2024-05-29T20:50:13.273Z",
  "logger": "controller.provisioner",
  "message": "creating node claim, NodeClaim.karpenter.sh \"burstable-lqnfn\" is invalid: [spec.requirements: Too many: 33: must have at most 30 items, <nil>: Invalid value: \"null\": some validation rules were not checked because the object was invalid; correct the existing errors to complete validation]",
  "commit": "8b2d1d7"
}
```
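The second error looks like a knock-on effect of the same change: the NodeClaim CRD's validation caps spec.requirements at 30 items, and the claim generated here had 33. My assumption is that the Exists anti-affinity on node.kubernetes.io/instance-type piles instance-type exclusions on top of the NodePool's own requirements until that cap is exceeded. A purely illustrative sketch of the capped field (all keys and values hypothetical):

```yaml
# Illustrative only, not Karpenter's actual output: the shape of the
# spec.requirements field that the CRD validation caps at 30 items.
apiVersion: karpenter.sh/v1beta1
kind: NodeClaim
metadata:
  name: burstable-lqnfn
spec:
  nodeClassRef:
    name: default               # hypothetical node class
  requirements:
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["on-demand"]
    - key: node.kubernetes.io/instance-type
      operator: NotIn           # assumed encoding of the anti-affinity exclusions
      values: ["r5d.xlarge"]
    # ... more requirements until the list reaches 33 entries, tripping the
    # "Too many: 33: must have at most 30 items" validation in the log above
```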
Fixed in 2d2dd57c3445cf234a9519317eefcea011ff74bb
What happened?
In the latest edge release edge.24-05-23, antiAffinity rules were added to "ensure that pods in the same deployment are not scheduled on the same instance type (not just the same instance) in order to prevent disruption caused by spot instance scale-in."
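For reference, this is the standard Kubernetes podAntiAffinity mechanism keyed on instance type; a minimal sketch of the rule's shape (the labels here are illustrative, the stack's actual definition is linked at the bottom of this issue):

```yaml
# Sketch of a hard anti-affinity rule keyed on instance type: no two pods
# matching the selector may land on nodes of the same instance type.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: cert-manager   # illustrative: pods of the same deployment
        topologyKey: node.kubernetes.io/instance-type
```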
This change unfortunately breaks the bootstrapping guide: many services fail to apply because the desired number of pods cannot be scheduled. The list so far:
I would like to recommend making this configurable instead, potentially at the region level, so that the ideal topology and affinity rules can be set once the cluster is bootstrapped.
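If hard anti-affinity stays the default, even just downgrading it to a preference would let the scheduler fall back when no new instance type is available, rather than leaving pods unschedulable. A sketch of that soft variant (labels again illustrative):

```yaml
# Sketch of a preference-based (soft) variant: the scheduler still tries to
# spread pods across instance types, but can co-locate them when it must.
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: cert-manager
          topologyKey: node.kubernetes.io/instance-type
```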
Steps to Reproduce
Follow the bootstrapping guide on a net-new VPC and cluster.
Version
main (development branch)
Relevant log output
The affinity in question: https://github.com/Panfactum/stack/compare/edge.24-05-15...edge.24-05-23#diff-399e98d14072c9446c6e0ad873ab113356418a363f13830321a05be500ebdbbcR95
An example of its usage: https://github.com/Panfactum/stack/compare/edge.24-05-15...edge.24-05-23#diff-1c228ec7df95544f30340762e30922fdc8b0ac227e3f13c1ff1e01b29905ac61R343