ZillaG closed this issue 3 years ago.
My AWS EKS console looks like this:
[screenshot of the EKS console]
@ZillaG When helm install --wait times out, it usually means that something in the Helm chart failed to become ready.
The screenshot of your EKS console is consistent with that. Can you dig further to find out what is keeping those resources from becoming ready?
One candidate reason is that the deployment was not able to pull its images from outside a private network.
The output of kubectl describe -n fluxcd deployment/flux should show you the reason for the failure, or what it is waiting on. By now you have waited long enough to rule out a transient failure that would resolve itself given more time.
I see the following.
```
$ kubectl describe -n fluxcd deployment/flux
Name:                   flux
Namespace:              fluxcd
CreationTimestamp:      Mon, 08 Feb 2021 10:27:21 -0500
Labels:                 app=flux
                        app.kubernetes.io/managed-by=Helm
                        chart=flux-1.6.1
                        heritage=Helm
                        release=flux
Annotations:            deployment.kubernetes.io/revision: 2
                        meta.helm.sh/release-name: flux
                        meta.helm.sh/release-namespace: fluxcd
Selector:               app=flux,release=flux
Replicas:               1 desired | 1 updated | 2 total | 0 available | 2 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:           app=flux
                    release=flux
  Service Account:  flux
  Containers:
   flux:
    Image:      docker.io/fluxcd/flux:1.21.1
    Port:       3030/TCP
    Host Port:  0/TCP
    Args:
      --log-format=fmt
      --ssh-keygen-dir=/var/fluxd/keygen
      --ssh-keygen-format=RFC4716
      --k8s-secret-name=flux-git-deploy
      --memcached-hostname=flux-memcached
      --sync-state=git
      --memcached-service=
      --git-url=git@github.com:zillag/helm-operator-get-started
      --git-branch=master
      --git-path=
      --git-readonly=false
      --git-user=Weave Flux
      --git-email=support@weave.works
      --git-verify-signatures=false
      --git-set-author=false
      --git-poll-interval=5m
      --git-timeout=20s
      --sync-interval=5m
      --git-ci-skip=false
      --automation-interval=5m
      --registry-rps=200
      --registry-burst=125
      --registry-trace=false
    Requests:
      cpu:      50m
      memory:   64Mi
    Liveness:   http-get http://:3030/api/flux/v6/identity.pub delay=5s timeout=5s period=10s #success=1 #failure=3
    Readiness:  http-get http://:3030/api/flux/v6/identity.pub delay=5s timeout=5s period=10s #success=1 #failure=3
    Environment:
      KUBECONFIG:  /root/.kubectl/config
    Mounts:
      /etc/fluxd/ssh from git-key (ro)
      /root/.kubectl from kubedir (rw)
      /var/fluxd/keygen from git-keygen (rw)
  Volumes:
   kubedir:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      flux-kube-config
    Optional:  false
   git-key:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  flux-git-deploy
    Optional:    false
   git-keygen:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  <unset>
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      False   MinimumReplicasUnavailable
  Progressing    False   ProgressDeadlineExceeded
OldReplicaSets:  flux-7f6c8f87f5 (1/1 replicas created), flux-6dc79b6876 (1/1 replicas created)
NewReplicaSet:   <none>
Events:          <none>
```
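helm install --wait is ultimately waiting for the Deployment's Replicas line to look healthy. The following is a simplified sketch that parses that line. The helper names are hypothetical and are not part of kubectl or Helm. It treats the Deployment as ready once available replicas reach desired replicas minus maxUnavailable, with the 25% from RollingUpdateStrategy rounded down as Kubernetes does for that percentage. It only approximates Helm's own readiness logic.

```python
import math
import re


def parse_replicas(line: str) -> dict[str, int]:
    """Turn 'Replicas: 1 desired | 1 updated | ...' into {'desired': 1, ...}."""
    return {key: int(value) for value, key in re.findall(r"(\d+)\s+(\w+)", line)}


def min_available(desired: int, max_unavailable_pct: int) -> int:
    """Replicas that must be available. maxUnavailable percentages round down,
    so 25% of 1 replica allows 0 unavailable replicas."""
    return desired - math.floor(desired * max_unavailable_pct / 100)


def deployment_ready(line: str, max_unavailable_pct: int = 25) -> bool:
    """Simplified readiness check: available replicas >= the required minimum."""
    counts = parse_replicas(line)
    return counts.get("available", 0) >= min_available(counts["desired"], max_unavailable_pct)


replicas_line = "Replicas: 1 desired | 1 updated | 2 total | 0 available | 2 unavailable"
print(parse_replicas(replicas_line))
print(deployment_ready(replicas_line))
```

With 1 desired replica the minimum is 1. Because 0 replicas are available, the check fails. This matches the MinimumReplicasUnavailable condition in the output.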
Note that I created a Fargate cluster with the following Fargate profiles:
```yaml
fargateProfiles:
  - name: fp-default
    selectors:
      # All workloads in the "default" Kubernetes namespace will be
      # scheduled onto Fargate:
      - namespace: default
      # All workloads in the "kube-system" Kubernetes namespace will be
      # scheduled onto Fargate:
      - namespace: kube-system
  - name: fp-sandbox
    selectors:
      # All workloads in the "sandbox" Kubernetes namespace matching the
      # following label selectors will be scheduled onto Fargate:
      - namespace: sandbox
        labels:
          env: sandbox
          checks: passed
```
I also created the following Fargate profile to see whether it helps:
```yaml
- name: fp-fluxcd
  selectors:
    - namespace: fluxcd
      labels:
        app: flux
        release: flux
```
Sorry, the deployment description doesn't provide much useful information. Could you check whether there are pods in any state with kubectl get pods -n fluxcd, and then run kubectl describe pod [the pod] on them?
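The first step can be scripted: find the pods that are not healthy, then describe each one. The following is a minimal sketch. It assumes the default column layout of kubectl get pods (NAME, READY, STATUS, ...). The helper name is illustrative. The sample listing is hypothetical and is modeled on the pending flux pod in this thread.

```python
def pods_needing_attention(get_pods_output: str) -> list[str]:
    """Return names of pods that are not fully ready, skipping Completed pods.

    Assumes the default `kubectl get pods` table: a header row containing
    NAME, READY and STATUS, followed by one whitespace-separated row per pod.
    """
    lines = get_pods_output.strip().splitlines()
    header = lines[0].split()
    name_col = header.index("NAME")
    ready_col = header.index("READY")
    status_col = header.index("STATUS")

    flagged = []
    for row in lines[1:]:
        fields = row.split()
        status = fields[status_col]
        if status == "Completed":
            continue  # finished Job pods are expected to show 0/1 ready
        ready, total = (int(n) for n in fields[ready_col].split("/"))
        if status != "Running" or ready < total:
            flagged.append(fields[name_col])
    return flagged


# Hypothetical listing modeled on the pending flux pod in this thread.
sample = """\
NAME                     READY   STATUS    RESTARTS   AGE
flux-6dc79b6876-rkwl7    0/1     Pending   0          3h24m
example-healthy-pod      1/1     Running   0          3h24m
"""

for name in pods_needing_attention(sample):
    print(f"kubectl describe pod {name} -n fluxcd")
```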
Here you go. Thanks!
```
$ k describe po flux-6dc79b6876-rkwl7 -n fluxcd
Name:           flux-6dc79b6876-rkwl7
Namespace:      fluxcd
Priority:       0
Node:           <none>
Labels:         app=flux
                pod-template-hash=6dc79b6876
                release=flux
Annotations:    kubernetes.io/psp: eks.privileged
Status:         Pending
IP:
IPs:            <none>
Controlled By:  ReplicaSet/flux-6dc79b6876
Containers:
  flux:
    Image:      docker.io/fluxcd/flux:1.21.1
    Port:       3030/TCP
    Host Port:  0/TCP
    Args:
      --log-format=fmt
      --ssh-keygen-dir=/var/fluxd/keygen
      --ssh-keygen-format=RFC4716
      --k8s-secret-name=flux-git-deploy
      --memcached-hostname=flux-memcached
      --sync-state=git
      --memcached-service=
      --git-url=git@github.com:zillag/helm-operator-get-started
      --git-branch=master
      --git-path=
      --git-readonly=false
      --git-user=Weave Flux
      --git-email=support@weave.works
      --git-verify-signatures=false
      --git-set-author=false
      --git-poll-interval=5m
      --git-timeout=20s
      --sync-interval=5m
      --git-ci-skip=false
      --automation-interval=5m
      --registry-rps=200
      --registry-burst=125
      --registry-trace=false
    Requests:
      cpu:      50m
      memory:   64Mi
    Liveness:   http-get http://:3030/api/flux/v6/identity.pub delay=5s timeout=5s period=10s #success=1 #failure=3
    Readiness:  http-get http://:3030/api/flux/v6/identity.pub delay=5s timeout=5s period=10s #success=1 #failure=3
    Environment:
      KUBECONFIG:  /root/.kubectl/config
    Mounts:
      /etc/fluxd/ssh from git-key (ro)
      /root/.kubectl from kubedir (rw)
      /var/fluxd/keygen from git-keygen (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from flux-token-22jrc (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  kubedir:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      flux-kube-config
    Optional:  false
  git-key:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  flux-git-deploy
    Optional:    false
  git-keygen:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  <unset>
  flux-token-22jrc:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  flux-token-22jrc
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  beta.kubernetes.io/os=linux
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                      From               Message
  ----     ------            ----                     ----               -------
  Warning  FailedScheduling  3m22s (x137 over 3h24m)  default-scheduler  0/2 nodes are available: 2 Too many pods.
```
The important information is at the bottom: the pod failed to schedule.
```
Warning  FailedScheduling  3m22s (x137 over 3h24m)  default-scheduler  0/2 nodes are available: 2 Too many pods.
```
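The scheduler message packs a node count and the reasons each node rejected the pod into a single line. The small parser below is hypothetical. It assumes the "A/B nodes are available: N reason, M reason." format shown here and makes the reading explicit: none of the 2 nodes could take the pod, and both rejected it because they had reached their pod limit.

The From column reads default-scheduler. On EKS, pods selected by a Fargate profile are normally scheduled by a separate Fargate scheduler. This hints that no profile matched the pod when it was created.

```python
import re


def parse_failed_scheduling(message: str) -> tuple[int, int, dict[str, int]]:
    """Split '0/2 nodes are available: 2 Too many pods.' into
    (available nodes, total nodes, {rejection reason: node count})."""
    head, _, tail = message.partition(":")
    match = re.match(r"\s*(\d+)/(\d+) nodes", head)
    if match is None:
        raise ValueError(f"unrecognized scheduler message: {message!r}")
    available, total = int(match.group(1)), int(match.group(2))

    reasons: dict[str, int] = {}
    for part in tail.strip().rstrip(".").split(","):
        count, reason = part.strip().split(" ", 1)
        reasons[reason] = int(count)
    return available, total, reasons


available, total, reasons = parse_failed_scheduling(
    "0/2 nodes are available: 2 Too many pods."
)
print(f"{available}/{total} nodes can take the pod")
for reason, count in reasons.items():
    print(f"  {count} node(s) rejected it: {reason}")
```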
Unfortunately, I don't know how Fargate clusters work, so I'm not sure how I can help troubleshoot this further.
I think you are on the right track: this looks like a Fargate issue of some kind, since you are somehow getting a "Too many pods" error on a Fargate cluster. Check whether your Fargate profile is scheduling onto t2.micro instances, and whether that can be increased to a larger instance type. Searching for this error suggests that the nodes' network interfaces (ENIs) have been exhausted and that the problem goes away with a larger instance type.
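The ENI theory rests on the default pod limit that the AWS VPC CNI imposes on EC2 worker nodes: ENIs × (IPv4 addresses per ENI - 1) + 2. The sketch below evaluates that formula. The function name is illustrative. The per-instance interface limits are taken from the EC2 instance-type documentation for the three types listed. The formula applies to EC2 nodes only, not to Fargate.

```python
# Interface limits per EC2 instance type: (max ENIs, IPv4 addresses per ENI).
INSTANCE_LIMITS = {
    "t2.micro": (2, 2),
    "t3.medium": (3, 6),
    "m5.large": (3, 10),
}


def eni_max_pods(max_enis: int, ipv4_per_eni: int) -> int:
    """Default max pods for an EC2 node running the AWS VPC CNI.

    Each ENI keeps its primary IPv4 address, so only secondary addresses
    go to pods; the +2 leaves room for host-network pods such as aws-node
    and kube-proxy, which do not consume an ENI address.
    """
    return max_enis * (ipv4_per_eni - 1) + 2


for instance_type, (enis, ips) in INSTANCE_LIMITS.items():
    total = eni_max_pods(enis, ips)
    # Assume the two host-network system pods occupy two of the slots.
    print(f"{instance_type}: {total} pods total, {total - 2} for workloads")
```

For t2.micro this gives 4 pods. If two of those slots go to system DaemonSets, only two remain for workloads, which fits the symptom in the linked question.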
https://stackoverflow.com/questions/64965832/aws-eks-only-2-pod-can-be-launched-too-many-pods-error
I doubt that the Stack Overflow issue is similar to mine. With Fargate I shouldn't have to care about the node type; that's the whole point of Fargate.
Changing the Fargate profile to the following made the difference:
```yaml
- name: fp-fluxcd
  selectors:
    - namespace: fluxcd
```
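The comments in the eksctl config earlier in the thread describe how a Fargate profile selects pods. The namespace must match, and every listed label must be present on the pod with the same value. The sketch below models that rule in plain Python to compare the two fp-fluxcd profiles. The class and function names are hypothetical and are not an AWS API. The flux pod's labels come from the kubectl describe output in this thread. The memcached pod's labels are an assumption.

Under this model, the labeled profile still matches the flux pod itself. Possible reasons the namespace-only profile behaved differently include:

- pods lacking those labels, as assumed here for memcached;
- pods created before the profile existed, which a profile likely does not move onto Fargate.

```python
from dataclasses import dataclass, field


@dataclass
class FargateSelector:
    """One entry under 'selectors:' in a Fargate profile (illustrative model)."""
    namespace: str
    labels: dict[str, str] = field(default_factory=dict)

    def matches(self, namespace: str, pod_labels: dict[str, str]) -> bool:
        # The namespace must be equal, and every selector label must appear on
        # the pod with the same value; extra pod labels are ignored.
        return namespace == self.namespace and all(
            pod_labels.get(key) == value for key, value in self.labels.items()
        )


def profile_matches(selectors: list[FargateSelector], namespace: str,
                    pod_labels: dict[str, str]) -> bool:
    # A profile selects the pod if any one of its selectors matches.
    return any(s.matches(namespace, pod_labels) for s in selectors)


labeled_profile = [FargateSelector("fluxcd", {"app": "flux", "release": "flux"})]
namespace_only_profile = [FargateSelector("fluxcd")]

# Labels of the flux pod, from the kubectl describe output in this thread.
flux_pod = {"app": "flux", "pod-template-hash": "6dc79b6876", "release": "flux"}
# Assumed labels for the chart's memcached pod (hypothetical).
memcached_pod = {"app": "flux-memcached", "release": "flux"}

for name, labels in [("flux", flux_pod), ("memcached", memcached_pod)]:
    print(name,
          profile_matches(labeled_profile, "fluxcd", labels),
          profile_matches(namespace_only_profile, "fluxcd", labels))
```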
I'm blindly following the walkthrough, and when I get to this step I get a timeout error. I did replace the repo URL with my forked repo, and I'm using my company's AWS EKS cluster, which I set up with Kubernetes v1.18.
What do I need to do to proceed?