mschnee closed this issue 3 months ago.
I am also running into this right now. I've tried setting `bootstrapping_mode_enabled` to `false`, and `terragrunt apply` didn't detect a change. As a new user working through the setup guide, I'm blocked here.
Also, my cluster only has 2 nodes instead of 3.
I don't know how taints work, but maybe the tolerations set by the connectivity test don't match the node's taints?
Taints: arm64=true:NoSchedule
        burstable=true:NoSchedule
        spot=true:NoSchedule
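To illustrate why pods without matching tolerations stay pending on a node with these taints, here is a minimal Python sketch of how a pod's tolerations are matched against a node's taints. It is a simplification of the documented Kubernetes rules (it ignores `tolerationSeconds`, node affinity, and every other scheduling check), and the class and function names are hypothetical. The two pod specs in the usage section are also hypothetical: one with no tolerations for these keys, one that tolerates only `arm64`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str  # "NoSchedule", "PreferNoSchedule", or "NoExecute"


@dataclass(frozen=True)
class Toleration:
    key: str = ""                 # empty key + operator "Exists" matches every taint
    operator: str = "Equal"       # "Equal" compares values, "Exists" ignores the value
    value: str = ""
    effect: Optional[str] = None  # None matches every effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Simplified matching rule for one toleration against one taint."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.key and tol.key != taint.key:
        return False
    if tol.operator == "Exists":
        return True
    return tol.operator == "Equal" and tol.value == taint.value


def untolerated(taints: List[Taint], tolerations: List[Toleration]) -> List[Taint]:
    """Return the NoSchedule/NoExecute taints that no toleration matches."""
    return [
        taint
        for taint in taints
        if taint.effect in ("NoSchedule", "NoExecute")
        and not any(tolerates(tol, taint) for tol in tolerations)
    ]


# The taints reported above for one node.
node_taints = [
    Taint("arm64", "true", "NoSchedule"),
    Taint("burstable", "true", "NoSchedule"),
    Taint("spot", "true", "NoSchedule"),
]

# Hypothetical pod specs.
no_tolerations: List[Toleration] = []
arm64_only = [Toleration(key="arm64", operator="Equal", value="true", effect="NoSchedule")]

print([t.key for t in untolerated(node_taints, no_tolerations)])  # ['arm64', 'burstable', 'spot']
print([t.key for t in untolerated(node_taints, arm64_only)])      # ['burstable', 'spot']
print(untolerated([Taint("arm64", "true", "NoSchedule")], arm64_only))  # []
```

Under these rules a pod can be placed on a node only when the returned list is empty. On a node carrying all three taints, an `arm64` toleration alone would not be enough, while on a node tainted only with `arm64` it would.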
In my case, I think the issue is that I was using `bootstrapping_mode_enabled` instead of `bootstrap_mode_enabled`. I will make a PR to fix the guide: https://panfactum.com/docs/edge/guides/bootstrapping/kubernetes-cluster#enable-bootstrapping-mode
I decided that instead of forging ahead and assuming the networking would work, I should scrap the cluster and start over. Now the tests run; however, some of them fail (though this may be related to my bespoke VPC).
It may also be worth updating the documentation with something along the lines of "if the pods don't schedule, something has gone terribly wrong and you should start again." So this is no longer a bug.
@mschnee that is surprising, because I believe your suspicion about the taint is correct. When I modified the cilium client to include a toleration for `arm64`, the pod did start up.
Given my test results here, I wonder how the cilium clients are being scheduled for you. Can you share the taints from the node and the tolerations from your cilium client?
I also tried manually removing the taints from the node and running the test. The test pods are now running and I'm awaiting results.
Reporting back that the tests are successful after using `bootstrap_mode_enabled` and removing the `arm64` taint from the nodes.

In addition to the typo in `bootstrap_mode_enabled`, the core issue is that the last release changed the EKS cluster to run `arm64` nodes (as they are cheaper). Because not every utility can be guaranteed to be `arm64`-compatible, we add the `arm64` taint.
All the Panfactum IaC modules have the appropriate `arm64` tolerations. However, the manifests deployed by `cilium connectivity test` do not. Additionally, since these run before karpenter is deployed, no `amd64` nodes are provisioned.
As a result, we will revert the change so that EKS uses `amd64` nodes when `bootstrap_mode_enabled` is `true` and `arm64` nodes otherwise. That should resolve this issue.
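The revert described above boils down to a conditional on `bootstrap_mode_enabled`. The Python sketch below only models that decision; `InitialNodeGroup` and `initial_node_group` are hypothetical names, not part of the Panfactum modules. It tracks only the `arm64` taint, assumes `amd64` nodes do not receive it, and ignores other taints such as `burstable` or `spot`.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class InitialNodeGroup:
    """Hypothetical summary of the initial EKS node group (not a Panfactum API)."""
    arch: str
    taints: List[str] = field(default_factory=list)


def initial_node_group(bootstrap_mode_enabled: bool) -> InitialNodeGroup:
    # During bootstrapping, karpenter is not deployed yet, so it cannot add amd64
    # nodes; the initial nodes must be able to run manifests that lack an arm64
    # toleration (e.g. those deployed by `cilium connectivity test`).
    if bootstrap_mode_enabled:
        # Assumption: amd64 nodes do not receive the arm64 taint.
        return InitialNodeGroup(arch="amd64")
    # Outside bootstrap mode, use the cheaper arm64 nodes and taint them so that
    # only workloads that tolerate arm64 are scheduled there.
    return InitialNodeGroup(arch="arm64", taints=["arm64=true:NoSchedule"])


print(initial_node_group(True))   # InitialNodeGroup(arch='amd64', taints=[])
print(initial_node_group(False))  # InitialNodeGroup(arch='arm64', taints=['arm64=true:NoSchedule'])
```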
Resolved.
What happened?
Several modules fail to deploy because their pods cannot be scheduled on nodes carrying the `arm64` taint. The most noticeable case, and the earliest in the bootstrapping documentation, is the `cilium connectivity test` command, which fails to start because its pods do not schedule.

Other modules that fail include:
We've been able to work around this by manually removing the taint from one of the on-demand nodes, but karpenter-provisioned nodes also start up with this taint.
Steps to Reproduce
Start a net new cluster with `bootstrap_mode_enabled = true`. The three nodes in the cluster are labeled `beta.kubernetes.io/arch=arm64`, and the cilium tests cannot run on them because the pods are unschedulable.

Version
main (development branch)
Relevant log output