Timeout error when performing step to install flux

ZillaG commented 3 years ago

I'm blindly following the walkthrough, and when I get to this step I get a timeout error. I did replace the repo URL with my forked repo's URL, and I am using my company's AWS EKS cluster, which I set up with Kubernetes v1.18.

$ helm upgrade -i flux fluxcd/flux --wait \
--namespace fluxcd \
--set git.url=git@github.com:zillag/helm-operator-get-started
Release "flux" does not exist. Installing it now.
Error: timed out waiting for the condition

What do I need to do to proceed?

ZillaG commented 3 years ago

My AWS EKS console looks like this:

[Screenshot: AWS EKS console, 2021-02-08 11:25:26 AM]

kingdonb commented 3 years ago

@ZillaG When helm install --wait times out, usually it means that something in the helm chart failed to become ready.

The screenshot you posted of your EKS console is consistent with that. Can you probe further to see what has kept them from becoming ready?

One candidate reason: the deployment was not able to pull images from outside of a private network.

Running kubectl describe -n fluxcd deployment/flux should show you the reason for the failure, or what it is waiting on. By now you have waited long enough to rule out a transient failure that would resolve itself given more time.
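The same information can also be read from the Deployment's status conditions and filtered for anything that is not True. This is a minimal Python sketch, assuming its input is the JSON printed by kubectl get deployment flux -n fluxcd -o json, which carries the standard Deployment status.conditions fields (type, status, reason, message); the function name failing_conditions is hypothetical.

```python
from __future__ import annotations

import json


def failing_conditions(deployment_json: str) -> list[tuple[str, str, str]]:
    """Return (type, reason, message) for every Deployment condition that is not True."""
    deployment = json.loads(deployment_json)
    conditions = deployment.get("status", {}).get("conditions", [])
    return [
        (c.get("type", ""), c.get("reason", ""), c.get("message", ""))
        for c in conditions
        # "False" and "Unknown" both mean the condition is not satisfied yet
        if c.get("status") != "True"
    ]
```

For the deployment described further down this thread, this would report Available (MinimumReplicasUnavailable) and Progressing (ProgressDeadlineExceeded), which says that pods are not ready but not why; the pod-level details are still needed.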

ZillaG commented 3 years ago

I see the following.

$ kubectl describe -n fluxcd deployment/flux
Name:                   flux
Namespace:              fluxcd
CreationTimestamp:      Mon, 08 Feb 2021 10:27:21 -0500
Labels:                 app=flux
                        app.kubernetes.io/managed-by=Helm
                        chart=flux-1.6.1
                        heritage=Helm
                        release=flux
Annotations:            deployment.kubernetes.io/revision: 2
                        meta.helm.sh/release-name: flux
                        meta.helm.sh/release-namespace: fluxcd
Selector:               app=flux,release=flux
Replicas:               1 desired | 1 updated | 2 total | 0 available | 2 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:           app=flux
                    release=flux
  Service Account:  flux
  Containers:
   flux:
    Image:      docker.io/fluxcd/flux:1.21.1
    Port:       3030/TCP
    Host Port:  0/TCP
    Args:
      --log-format=fmt
      --ssh-keygen-dir=/var/fluxd/keygen
      --ssh-keygen-format=RFC4716
      --k8s-secret-name=flux-git-deploy
      --memcached-hostname=flux-memcached
      --sync-state=git
      --memcached-service=
      --git-url=git@github.com:zillag/helm-operator-get-started
      --git-branch=master
      --git-path=
      --git-readonly=false
      --git-user=Weave Flux
      --git-email=support@weave.works
      --git-verify-signatures=false
      --git-set-author=false
      --git-poll-interval=5m
      --git-timeout=20s
      --sync-interval=5m
      --git-ci-skip=false
      --automation-interval=5m
      --registry-rps=200
      --registry-burst=125
      --registry-trace=false
    Requests:
      cpu:      50m
      memory:   64Mi
    Liveness:   http-get http://:3030/api/flux/v6/identity.pub delay=5s timeout=5s period=10s #success=1 #failure=3
    Readiness:  http-get http://:3030/api/flux/v6/identity.pub delay=5s timeout=5s period=10s #success=1 #failure=3
    Environment:
      KUBECONFIG:  /root/.kubectl/config
    Mounts:
      /etc/fluxd/ssh from git-key (ro)
      /root/.kubectl from kubedir (rw)
      /var/fluxd/keygen from git-keygen (rw)
  Volumes:
   kubedir:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      flux-kube-config
    Optional:  false
   git-key:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  flux-git-deploy
    Optional:    false
   git-keygen:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  <unset>
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      False   MinimumReplicasUnavailable
  Progressing    False   ProgressDeadlineExceeded
OldReplicaSets:  flux-7f6c8f87f5 (1/1 replicas created), flux-6dc79b6876 (1/1 replicas created)
NewReplicaSet:   <none>
Events:          <none>

Note that I created a Fargate cluster with the following Fargate profiles:

fargateProfiles:
  - name: fp-default
    selectors:
      # All workloads in the "default" Kubernetes namespace will be
      # scheduled onto Fargate:
      - namespace: default
      # All workloads in the "kube-system" Kubernetes namespace will be
      # scheduled onto Fargate:
      - namespace: kube-system
  - name: fp-sandbox
    selectors:
      # All workloads in the "sandbox" Kubernetes namespace matching the
      # following label selectors will be scheduled onto Fargate:
      - namespace: sandbox
        labels:
          env: sandbox
          checks: passed

ZillaG commented 3 years ago

I created the following Fargate profile to see if it helps:

  - name: fp-fluxcd
    selectors:
      - namespace: fluxcd
        labels:
          app: flux
          release: flux

yebyen commented 3 years ago

Sorry, the deployment description doesn't provide much useful information. Could you check whether there are pods in any state with kubectl get pods -n fluxcd, and then run kubectl describe pod [the pod] on them?
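A pod that never leaves Pending usually has a PodScheduled condition set to False, with the scheduler's explanation in its message. A minimal Python sketch for pulling those out of kubectl get pods -n fluxcd -o json, assuming the standard Pod status.conditions fields; the function name unscheduled_pods is hypothetical.

```python
from __future__ import annotations

import json


def unscheduled_pods(pod_list_json: str) -> dict[str, str]:
    """Map pod name -> 'reason: message' for pods whose PodScheduled condition is False."""
    pod_list = json.loads(pod_list_json)
    result: dict[str, str] = {}
    for pod in pod_list.get("items", []):
        name = pod.get("metadata", {}).get("name", "<unnamed>")
        for cond in pod.get("status", {}).get("conditions", []):
            if cond.get("type") == "PodScheduled" and cond.get("status") == "False":
                # reason is typically "Unschedulable"; message holds the scheduler's explanation
                result[name] = f'{cond.get("reason", "")}: {cond.get("message", "")}'
    return result
```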

ZillaG commented 3 years ago

Here you go. Thanks!

$ k describe po flux-6dc79b6876-rkwl7 -n fluxcd
Name:           flux-6dc79b6876-rkwl7
Namespace:      fluxcd
Priority:       0
Node:           <none>
Labels:         app=flux
                pod-template-hash=6dc79b6876
                release=flux
Annotations:    kubernetes.io/psp: eks.privileged
Status:         Pending
IP:             
IPs:            <none>
Controlled By:  ReplicaSet/flux-6dc79b6876
Containers:
  flux:
    Image:      docker.io/fluxcd/flux:1.21.1
    Port:       3030/TCP
    Host Port:  0/TCP
    Args:
      --log-format=fmt
      --ssh-keygen-dir=/var/fluxd/keygen
      --ssh-keygen-format=RFC4716
      --k8s-secret-name=flux-git-deploy
      --memcached-hostname=flux-memcached
      --sync-state=git
      --memcached-service=
      --git-url=git@github.com:zillag/helm-operator-get-started
      --git-branch=master
      --git-path=
      --git-readonly=false
      --git-user=Weave Flux
      --git-email=support@weave.works
      --git-verify-signatures=false
      --git-set-author=false
      --git-poll-interval=5m
      --git-timeout=20s
      --sync-interval=5m
      --git-ci-skip=false
      --automation-interval=5m
      --registry-rps=200
      --registry-burst=125
      --registry-trace=false
    Requests:
      cpu:      50m
      memory:   64Mi
    Liveness:   http-get http://:3030/api/flux/v6/identity.pub delay=5s timeout=5s period=10s #success=1 #failure=3
    Readiness:  http-get http://:3030/api/flux/v6/identity.pub delay=5s timeout=5s period=10s #success=1 #failure=3
    Environment:
      KUBECONFIG:  /root/.kubectl/config
    Mounts:
      /etc/fluxd/ssh from git-key (ro)
      /root/.kubectl from kubedir (rw)
      /var/fluxd/keygen from git-keygen (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from flux-token-22jrc (ro)
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  kubedir:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      flux-kube-config
    Optional:  false
  git-key:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  flux-git-deploy
    Optional:    false
  git-keygen:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  <unset>
  flux-token-22jrc:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  flux-token-22jrc
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  beta.kubernetes.io/os=linux
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                      From               Message
  ----     ------            ----                     ----               -------
  Warning  FailedScheduling  3m22s (x137 over 3h24m)  default-scheduler  0/2 nodes are available: 2 Too many pods.

yebyen commented 3 years ago

The important info is there at the bottom, failed scheduling:

Warning FailedScheduling 3m22s (x137 over 3h24m) default-scheduler 0/2 nodes are available: 2 Too many pods.
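Read piece by piece, the message gives the number of nodes available out of the total, then how many nodes were rejected for each reason, so "0/2 nodes are available: 2 Too many pods." means both nodes were ruled out because they were already at their pod limit. The Python sketch below parses that shape; the format is an assumption inferred from this message, not a documented contract, and the sketch would mis-handle reasons that themselves contain ", " or messages with extra trailing clauses. The function name parse_failed_scheduling is hypothetical.

```python
from __future__ import annotations

import re


def parse_failed_scheduling(message: str) -> tuple[int, int, dict[str, int]]:
    """Split '<available>/<total> nodes are available: <n> <reason>, ...' into parts."""
    m = re.match(r"\s*(\d+)/(\d+) nodes are available:\s*(.*?)\.?\s*$", message)
    if not m:
        raise ValueError(f"unrecognised scheduler message: {message!r}")
    available, total, rest = int(m.group(1)), int(m.group(2)), m.group(3)
    reasons: dict[str, int] = {}
    for part in rest.split(", "):
        count, _, reason = part.partition(" ")
        if count.isdigit():
            # count = number of nodes rejected for this reason
            reasons[reason] = reasons.get(reason, 0) + int(count)
    return available, total, reasons
```

For the message above this yields 0 available nodes out of 2, with both rejected for "Too many pods".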

Unfortunately I don't know how Fargate clusters work, so I'm not sure how I can help troubleshoot this further.

I think you are on the right track: this appears to be a Fargate concern of some kind, although it is odd that you are getting a "Too many pods" error on a Fargate cluster at all. Check whether your Fargate profile is scheduling onto t2.micro instances, and if so whether that can be increased to a larger instance type. Googling this error leads me to believe the network ENIs have been exhausted, and that the problem will go away if a larger instance type is used.

https://stackoverflow.com/questions/64965832/aws-eks-only-2-pod-can-be-launched-too-many-pods-error
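For context on the ENI hypothesis: on EC2 worker nodes using the AWS VPC CNI, the pod limit per node is commonly computed as ENIs x (IPv4 addresses per ENI - 1) + 2, because each ENI keeps one address as its own primary IP, and the +2 allows for host-network pods such as aws-node and kube-proxy. A minimal sketch of that arithmetic; the function name max_pods is hypothetical, and the t2.micro figures used below (2 ENIs, 2 IPv4 addresses each) are taken from AWS's published instance limits and worth re-checking. It describes EC2 nodes, not Fargate.

```python
def max_pods(enis: int, ipv4_per_eni: int) -> int:
    """Pod limit for an EC2 node under the AWS VPC CNI formula (assumed, see lead-in)."""
    # One IP per ENI is the ENI's primary address and is not handed to pods;
    # +2 accounts for host-network pods that do not consume a VPC IP.
    return enis * (ipv4_per_eni - 1) + 2
```

With those figures, max_pods(2, 2) gives 4 for a t2.micro, which leaves very little room once system pods are running.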

ZillaG commented 3 years ago

I doubt the Stack Overflow link describes my issue. With Fargate, I shouldn't have to care about the node type; that is the whole purpose of Fargate.

ZillaG commented 3 years ago

Changing the Fargate profile to this made the difference:

  - name: fp-fluxcd
    selectors:
      - namespace: fluxcd
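A rough model of why the selector matters: a Fargate profile selector matches a pod when the pod is in the selector's namespace and carries every label the selector lists, so a namespace-only selector catches every pod the chart creates in fluxcd, whatever its labels. A minimal Python sketch under that assumption; Selector and matching_profile are hypothetical names, returning the first matching profile is a simplification, and the memcached pod labels in the test (app=flux-memcached) are illustrative rather than taken from the chart.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Selector:
    namespace: str
    labels: dict[str, str] = field(default_factory=dict)

    def matches(self, namespace: str, pod_labels: dict[str, str]) -> bool:
        # Namespace must match exactly, and every selector label must be on the pod
        # with the same value; extra pod labels are ignored.
        return namespace == self.namespace and all(
            pod_labels.get(k) == v for k, v in self.labels.items()
        )


def matching_profile(
    profiles: dict[str, list[Selector]], namespace: str, pod_labels: dict[str, str]
) -> str | None:
    """Name of the first profile with a matching selector, or None (pod not on Fargate)."""
    for name, selectors in profiles.items():
        if any(s.matches(namespace, pod_labels) for s in selectors):
            return name
    return None
```

With the original profiles (fp-default and fp-sandbox), no selector covers the fluxcd namespace, so matching_profile returns None for the flux pod.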

fluxcd/helm-operator-get-started, issue #63