gemagomez closed this issue 5 years ago.
I'd say this should be an issue in the profile repo.
Yep, I agree with @errordeveloper
FWIW, it is a matter of having the right IAM policies in place when creating the cluster, namely:
 nodeGroups:
   - name: ng-1
     instanceType: m5.large
     desiredCapacity: 1
+    minSize: 1
+    maxSize: 2
+    iam:
+      withAddonPolicies:
+        albIngress: true
+        autoScaler: true
+        cloudWatch: true

 cloudWatch:
   clusterLogging:
How do we want to proceed there?
We should add this to the quickstart guide for now. We'll fix it as part of a different issue later on.
I'll document this in the quickstart profile's repository, but I thought I'd also provide an example ClusterConfig manifest which users can use out-of-the-box. See also: #1249.
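A minimal sketch of such a manifest, with placeholder name and region values, could look like this (the nodeGroups section mirrors the snippet above):

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: my-cluster      # placeholder
  region: eu-north-1    # placeholder

nodeGroups:
  - name: ng-1
    instanceType: m5.large
    desiredCapacity: 1
    minSize: 1
    maxSize: 2
    iam:
      withAddonPolicies:
        albIngress: true
        autoScaler: true
        cloudWatch: true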
@marccarre should this issue be closed actually?
Not yet, only once we've merged https://github.com/weaveworks/eks-quickstart-app-dev/pull/22
Happened to me today when applying the app-dev profile while creating a new cluster from gitops... how do I fix it?
@ilanpillemer, did you have the required IAM roles in place in your cluster? Without them, cluster-autoscaler will CrashLoopBackOff. See also these steps which reproduce the issue. (Which I have also run, to double-check things and ensure I can actually reproduce the issue.)
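A quick way to confirm this failure mode, assuming the quickstart deploys the autoscaler as a Deployment named cluster-autoscaler in kube-system (as the pod names in the listings below suggest), is:

# Check whether the cluster-autoscaler pod is crash-looping
$ kubectl -n kube-system get pods | grep cluster-autoscaler

# Inspect its logs; missing IAM policies typically show up as AWS authorization errors
$ kubectl -n kube-system logs deployment/cluster-autoscaler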
Yes. I have resolved the issue. If you follow the instructions word for word it fails. You need to add the roles with the necessary config when creating the cluster.
On Tue, 10 Dec 2019, 10:33 Marc Carré, notifications@github.com wrote:
@ilanpillemer https://github.com/ilanpillemer, did you have the required IAM roles https://github.com/weaveworks/eks-quickstart-app-dev/#pre-requisites in place in your cluster? If you have them, then this should work fine. See also the steps in this collapsible. (I just re-ran this myself to be sure it still does work as expected. It does.)
$ git diff
diff --git a/examples/eks-quickstart-app-dev.yaml b/examples/eks-quickstart-app-dev.yaml
index 487cb46b..5783c605 100644
--- a/examples/eks-quickstart-app-dev.yaml
+++ b/examples/eks-quickstart-app-dev.yaml
@@ -5,8 +5,8 @@ apiVersion: eksctl.io/v1alpha5
 kind: ClusterConfig

 metadata:
-  name: cluster-12
-  region: eu-north-1
+  name: mc-1237-testing-with-iam
+  region: ap-northeast-1

 nodeGroups:
   - name: ng-1
$ eksctl create cluster -f examples/eks-quickstart-app-dev.yaml [ℹ] eksctl version 0.11.1 [ℹ] using region ap-northeast-1 [ℹ] setting availability zones to [ap-northeast-1c ap-northeast-1d ap-northeast-1a] [ℹ] subnets for ap-northeast-1c - public:192.168.0.0/19 private:192.168.96.0/19 [ℹ] subnets for ap-northeast-1d - public:192.168.32.0/19 private:192.168.128.0/19 [ℹ] subnets for ap-northeast-1a - public:192.168.64.0/19 private:192.168.160.0/19 [ℹ] nodegroup "ng-1" will use "ami-02e124a380df41614" [AmazonLinux2/1.14] [ℹ] using Kubernetes version 1.14 [ℹ] creating EKS cluster "mc-1237-testing-with-iam" in "ap-northeast-1" region with un-managed nodes [ℹ] 1 nodegroup (ng-1) was included (based on the include/exclude rules) [ℹ] will create a CloudFormation stack for cluster itself and 1 nodegroup stack(s) [ℹ] will create a CloudFormation stack for cluster itself and 0 managed nodegroup stack(s) [ℹ] if you encounter any issues, check CloudFormation console or try 'eksctl utils describe-stacks --region=ap-northeast-1 --cluster=mc-1237-testing-with-iam' [ℹ] CloudWatch logging will not be enabled for cluster "mc-1237-testing-with-iam" in "ap-northeast-1" [ℹ] you can enable it with 'eksctl utils update-cluster-logging --region=ap-northeast-1 --cluster=mc-1237-testing-with-iam' [ℹ] Kubernetes API endpoint access will use default of {publicAccess=true, privateAccess=false} for cluster "mc-1237-testing-with-iam" in "ap-northeast-1" [ℹ] 2 sequential tasks: { create cluster control plane "mc-1237-testing-with-iam", create nodegroup "ng-1" } [ℹ] building cluster stack "eksctl-mc-1237-testing-with-iam-cluster" [ℹ] deploying stack "eksctl-mc-1237-testing-with-iam-cluster" [ℹ] building nodegroup stack "eksctl-mc-1237-testing-with-iam-nodegroup-ng-1" [ℹ] deploying stack "eksctl-mc-1237-testing-with-iam-nodegroup-ng-1" [✔] all EKS cluster resources for "mc-1237-testing-with-iam" have been created [✔] saved kubeconfig as "${HOME}/.kube/config" [ℹ] adding identity "arn:aws:iam::083751696308:role/eksctl-mc-1237-testing-with-iam-n-NodeInstanceRole-1M7OF6KB2D8RV" to auth ConfigMap [ℹ] nodegroup "ng-1" has 0 node(s) [ℹ] waiting for at least 1 node(s) to become ready in "ng-1" [ℹ] nodegroup "ng-1" has 1 node(s) [ℹ] node "ip-192-168-13-77.ap-northeast-1.compute.internal" is ready [ℹ] kubectl command should work with "${HOME}/.kube/config", try 'kubectl get nodes' [✔] EKS cluster "mc-1237-testing-with-iam" in "ap-northeast-1" region is ready
$ EKSCTL_EXPERIMENTAL=true eksctl enable repo \
    -f examples/eks-quickstart-app-dev.yaml \
    --git-email carre.marc+flux@gmail.com \
    --git-url git@github.com:marccarre/my-gitops-repo.git
[ℹ] Generating public key infrastructure for the Helm Operator and Tiller [ℹ] this may take up to a minute, please be patient [!] Public key infrastructure files were written into directory "/var/folders/24/d3mml6bn20nftpt91cfldq1h0000gn/T/eksctl-helm-pki431635447" [!] please move the files into a safe place or delete them [ℹ] Generating manifests [ℹ] Cloning git@github.com:marccarre/my-gitops-repo.git Cloning into '/var/folders/24/d3mml6bn20nftpt91cfldq1h0000gn/T/eksctl-install-flux-clone-956113642'... remote: Enumerating objects: 59, done. remote: Counting objects: 100% (59/59), done. remote: Compressing objects: 100% (55/55), done. remote: Total 447 (delta 11), reused 50 (delta 3), pack-reused 388 Receiving objects: 100% (447/447), 183.32 KiB | 514.00 KiB/s, done. Resolving deltas: 100% (157/157), done. Already on 'master' Your branch is up to date with 'origin/master'. [ℹ] Writing Flux manifests [ℹ] created "Namespace/flux" [ℹ] Applying Helm TLS Secret(s) [ℹ] created "flux:Secret/flux-helm-tls-cert" [ℹ] created "flux:Secret/tiller-secret" [!] Note: certificate secrets aren't added to the Git repository for security reasons [ℹ] Applying manifests [ℹ] created "flux:Deployment.apps/flux" [ℹ] created "flux:ServiceAccount/flux-helm-operator" [ℹ] created "ClusterRole.rbac.authorization.k8s.io/flux-helm-operator" [ℹ] created "ClusterRoleBinding.rbac.authorization.k8s.io/flux-helm-operator" [ℹ] created "CustomResourceDefinition.apiextensions.k8s.io/helmreleases.helm.fluxcd.io" [ℹ] created "flux:Secret/flux-git-deploy" [ℹ] created "flux:Deployment.apps/memcached" [ℹ] created "flux:Deployment.apps/flux-helm-operator" [ℹ] created "flux:Deployment.extensions/tiller-deploy" [ℹ] created "flux:Service/tiller-deploy" [ℹ] created "flux:Service/memcached" [ℹ] created "flux:ServiceAccount/flux" [ℹ] created "ClusterRole.rbac.authorization.k8s.io/flux" [ℹ] created "ClusterRoleBinding.rbac.authorization.k8s.io/flux" [ℹ] created "flux:ConfigMap/flux-helm-tls-ca-config" [ℹ] created "flux:ServiceAccount/tiller" [ℹ] created "ClusterRoleBinding.rbac.authorization.k8s.io/tiller" [ℹ] created "flux:ServiceAccount/helm" [ℹ] created "flux:Role.rbac.authorization.k8s.io/tiller-user" [ℹ] created "kube-system:RoleBinding.rbac.authorization.k8s.io/tiller-user-binding" [ℹ] Waiting for Helm Operator to start ERROR: logging before flag.Parse: E1210 18:44:24.787197 4822 portforward.go:331] an error occurred forwarding 50846 -> 3030: error forwarding port 3030 to pod 76da29d57382ad29d0d4b67fe633dec4222c084530a43eb7c7f1719ba50b10a0, uid : exit status 1: 2019/12/10 09:44:24 socat[6735] E connect(5, AF=2 127.0.0.1:3030, 16): Connection refused [!] Helm Operator is not ready yet (Get http://127.0.0.1:50846/healthz: EOF), retrying ... ERROR: logging before flag.Parse: E1210 18:44:26.816135 4822 portforward.go:331] an error occurred forwarding 50846 -> 3030: error forwarding port 3030 to pod 76da29d57382ad29d0d4b67fe633dec4222c084530a43eb7c7f1719ba50b10a0, uid : exit status 1: 2019/12/10 09:44:26 socat[6814] E connect(5, AF=2 127.0.0.1:3030, 16): Connection refused [!] Helm Operator is not ready yet (Get http://127.0.0.1:50846/healthz: EOF), retrying ... ERROR: logging before flag.Parse: E1210 18:44:28.844545 4822 portforward.go:331] an error occurred forwarding 50846 -> 3030: error forwarding port 3030 to pod 76da29d57382ad29d0d4b67fe633dec4222c084530a43eb7c7f1719ba50b10a0, uid : exit status 1: 2019/12/10 09:44:28 socat[6870] E connect(5, AF=2 127.0.0.1:3030, 16): Connection refused [!] 
Helm Operator is not ready yet (Get http://127.0.0.1:50846/healthz: EOF), retrying ... ERROR: logging before flag.Parse: E1210 18:44:30.877698 4822 portforward.go:331] an error occurred forwarding 50846 -> 3030: error forwarding port 3030 to pod 76da29d57382ad29d0d4b67fe633dec4222c084530a43eb7c7f1719ba50b10a0, uid : exit status 1: 2019/12/10 09:44:30 socat[6967] E connect(5, AF=2 127.0.0.1:3030, 16): Connection refused [!] Helm Operator is not ready yet (Get http://127.0.0.1:50846/healthz: EOF), retrying ... ERROR: logging before flag.Parse: E1210 18:44:32.914902 4822 portforward.go:331] an error occurred forwarding 50846 -> 3030: error forwarding port 3030 to pod 76da29d57382ad29d0d4b67fe633dec4222c084530a43eb7c7f1719ba50b10a0, uid : exit status 1: 2019/12/10 09:44:32 socat[7082] E connect(5, AF=2 127.0.0.1:3030, 16): Connection refused [!] Helm Operator is not ready yet (Get http://127.0.0.1:50846/healthz: EOF), retrying ... ERROR: logging before flag.Parse: E1210 18:44:34.944906 4822 portforward.go:331] an error occurred forwarding 50846 -> 3030: error forwarding port 3030 to pod 76da29d57382ad29d0d4b67fe633dec4222c084530a43eb7c7f1719ba50b10a0, uid : exit status 1: 2019/12/10 09:44:34 socat[7084] E connect(5, AF=2 127.0.0.1:3030, 16): Connection refused [!] Helm Operator is not ready yet (Get http://127.0.0.1:50846/healthz: EOF), retrying ... ERROR: logging before flag.Parse: E1210 18:44:36.971253 4822 portforward.go:331] an error occurred forwarding 50846 -> 3030: error forwarding port 3030 to pod 76da29d57382ad29d0d4b67fe633dec4222c084530a43eb7c7f1719ba50b10a0, uid : exit status 1: 2019/12/10 09:44:36 socat[7085] E connect(5, AF=2 127.0.0.1:3030, 16): Connection refused [!] Helm Operator is not ready yet (Get http://127.0.0.1:50846/healthz: EOF), retrying ... ERROR: logging before flag.Parse: E1210 18:44:38.998610 4822 portforward.go:331] an error occurred forwarding 50846 -> 3030: error forwarding port 3030 to pod 76da29d57382ad29d0d4b67fe633dec4222c084530a43eb7c7f1719ba50b10a0, uid : exit status 1: 2019/12/10 09:44:39 socat[7090] E connect(5, AF=2 127.0.0.1:3030, 16): Connection refused [!] Helm Operator is not ready yet (Get http://127.0.0.1:50846/healthz: EOF), retrying ... ERROR: logging before flag.Parse: E1210 18:44:41.023201 4822 portforward.go:331] an error occurred forwarding 50846 -> 3030: error forwarding port 3030 to pod 76da29d57382ad29d0d4b67fe633dec4222c084530a43eb7c7f1719ba50b10a0, uid : exit status 1: 2019/12/10 09:44:41 socat[7093] E connect(5, AF=2 127.0.0.1:3030, 16): Connection refused [!] Helm Operator is not ready yet (Get http://127.0.0.1:50846/healthz: EOF), retrying ... ERROR: logging before flag.Parse: E1210 18:44:43.053384 4822 portforward.go:331] an error occurred forwarding 50846 -> 3030: error forwarding port 3030 to pod 76da29d57382ad29d0d4b67fe633dec4222c084530a43eb7c7f1719ba50b10a0, uid : exit status 1: 2019/12/10 09:44:43 socat[7113] E connect(5, AF=2 127.0.0.1:3030, 16): Connection refused [!] Helm Operator is not ready yet (Get http://127.0.0.1:50846/healthz: EOF), retrying ... ERROR: logging before flag.Parse: E1210 18:44:45.084005 4822 portforward.go:331] an error occurred forwarding 50846 -> 3030: error forwarding port 3030 to pod 76da29d57382ad29d0d4b67fe633dec4222c084530a43eb7c7f1719ba50b10a0, uid : exit status 1: 2019/12/10 09:44:45 socat[7115] E connect(5, AF=2 127.0.0.1:3030, 16): Connection refused [!] Helm Operator is not ready yet (Get http://127.0.0.1:50846/healthz: EOF), retrying ... 
ERROR: logging before flag.Parse: E1210 18:44:47.115951 4822 portforward.go:331] an error occurred forwarding 50846 -> 3030: error forwarding port 3030 to pod 76da29d57382ad29d0d4b67fe633dec4222c084530a43eb7c7f1719ba50b10a0, uid : exit status 1: 2019/12/10 09:44:47 socat[7116] E connect(5, AF=2 127.0.0.1:3030, 16): Connection refused [!] Helm Operator is not ready yet (Get http://127.0.0.1:50846/healthz: EOF), retrying ... [ℹ] Helm Operator started successfully [ℹ] see https://docs.fluxcd.io/projects/helm-operator for details on how to use the Helm Operator [ℹ] Waiting for Flux to start [ℹ] Flux started successfully [ℹ] see https://docs.fluxcd.io/projects/flux for details on how to use Flux [ℹ] Committing and pushing manifests to git@github.com:marccarre/my-gitops-repo.git [master 15b0aad] Add Initial Flux configuration 13 files changed, 803 insertions(+) create mode 100644 flux/flux-account.yaml create mode 100644 flux/flux-deployment.yaml create mode 100644 flux/flux-helm-operator-account.yaml create mode 100644 flux/flux-helm-release-crd.yaml create mode 100644 flux/flux-namespace.yaml create mode 100644 flux/flux-secret.yaml create mode 100644 flux/helm-operator-deployment.yaml create mode 100644 flux/memcache-dep.yaml create mode 100644 flux/memcache-svc.yaml create mode 100644 flux/tiller-ca-cert-configmap.yaml create mode 100644 flux/tiller-dep.yaml create mode 100644 flux/tiller-rbac.yaml create mode 100644 flux/tiller-svc.yaml Enumerating objects: 17, done. Counting objects: 100% (17/17), done. Delta compression using up to 8 threads Compressing objects: 100% (15/15), done. Writing objects: 100% (16/16), 9.33 KiB | 9.33 MiB/s, done. Total 16 (delta 1), reused 12 (delta 1) remote: Resolving deltas: 100% (1/1), done. To github.com:marccarre/my-gitops-repo.git e54ab6f..15b0aad master -> master [ℹ] Flux will only operate properly once it has write-access to the Git repository [ℹ] please configure git@github.com:marccarre/my-gitops-repo.git so that the following Flux SSH public key has write access to it ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDFgi4LH0m5lCSUf/qmBTTZIz3MASZOQMepyDUYxtmAycwC0158op7ykTvHgmAqfXMxS90LzDQ4qPUxWKgExfjnWv3u7gWJBhDJhhDyLEodJLO6/IljgC1rUPTj5QJ1AwcPM7cvoB5sIBVq1iU6Jmf0Hp/BL2QEiLdiBdpA4HkPGKOMvzB+nNiLg4iJbCdAKAefHJWqWvf2k+PPTkVgpQ9ujcyQ+KHczY8Aj4HPu9he8C8S9Sqj2Vxq/qKZVbAuxllINy/WXlCB9SdbPx1b66g9Hiw6meoXiYJPaLft78SVXLQBx7l1anDabmcRnNHSChwMY8AAVFBssm537DyAHuG5
Then added the above SSH key to https://github.com/marccarre/my-gitops-repo/deploy_keys
$ kubectl get po --all-namespaces NAMESPACE NAME READY STATUS RESTARTS AGE flux flux-7696dbc4cd-sjbv7 1/1 Running 0 17m flux flux-helm-operator-8687676b89-qw7kq 1/1 Running 0 17m flux memcached-5dcd7579-7bn6l 1/1 Running 0 17m flux tiller-deploy-69547b56b4-p6zxd 1/1 Running 0 17m kube-system aws-node-f8g7z 1/1 Running 0 20m kube-system coredns-699bb99bf8-gptx4 1/1 Running 0 27m kube-system coredns-699bb99bf8-smzch 1/1 Running 0 27m kube-system kube-proxy-28xqt 1/1 Running 0 20m
$ EKSCTL_EXPERIMENTAL=true eksctl enable profile app-dev \
    -f examples/eks-quickstart-app-dev.yaml \
    --git-email carre.marc+flux@gmail.com \
    --git-url git@github.com:marccarre/my-gitops-repo.git
Cloning into '/var/folders/24/d3mml6bn20nftpt91cfldq1h0000gn/T/my-gitops-repo-547778386'... remote: Enumerating objects: 63, done. remote: Counting objects: 100% (63/63), done. remote: Compressing objects: 100% (59/59), done. remote: Total 451 (delta 12), reused 53 (delta 3), pack-reused 388 Receiving objects: 100% (451/451), 185.04 KiB | 104.00 KiB/s, done. Resolving deltas: 100% (158/158), done. Already on 'master' Your branch is up to date with 'origin/master'. [ℹ] cloning repository "https://github.com/weaveworks/eks-quickstart-app-dev":master Cloning into '/var/folders/24/d3mml6bn20nftpt91cfldq1h0000gn/T/quickstart-008692361'... remote: Enumerating objects: 5, done. remote: Counting objects: 100% (5/5), done. remote: Compressing objects: 100% (4/4), done. remote: Total 214 (delta 0), reused 0 (delta 0), pack-reused 209 Receiving objects: 100% (214/214), 57.27 KiB | 335.00 KiB/s, done. Resolving deltas: 100% (92/92), done. Already on 'master' Your branch is up to date with 'origin/master'. [ℹ] processing template files in repository [ℹ] writing new manifests to "/var/folders/24/d3mml6bn20nftpt91cfldq1h0000gn/T/my-gitops-repo-547778386/base" [master b7070d5] Add app-dev quickstart components 27 files changed, 1380 insertions(+) create mode 100644 base/LICENSE create mode 100644 base/README.md create mode 100644 base/amazon-cloudwatch/cloudwatch-agent-configmap.yaml create mode 100644 base/amazon-cloudwatch/cloudwatch-agent-daemonset.yaml create mode 100644 base/amazon-cloudwatch/cloudwatch-agent-rbac.yaml create mode 100644 base/amazon-cloudwatch/fluentd-configmap-cluster-info.yaml create mode 100644 base/amazon-cloudwatch/fluentd-configmap-fluentd-config.yaml create mode 100644 base/amazon-cloudwatch/fluentd-daemonset.yaml create mode 100644 base/amazon-cloudwatch/fluentd-rbac.yaml create mode 100644 base/demo/helm-release.yaml create mode 100644 base/kube-system/alb-ingress-controller-deployment.yaml create mode 100644 base/kube-system/alb-ingress-controller-rbac.yaml create mode 100644 base/kube-system/cluster-autoscaler-deployment.yaml create mode 100644 base/kube-system/cluster-autoscaler-rbac.yaml create mode 100644 base/kubernetes-dashboard/dashboard-metrics-scraper-deployment.yaml create mode 100644 base/kubernetes-dashboard/dashboard-metrics-scraper-service.yaml create mode 100644 base/kubernetes-dashboard/kubernetes-dashboard-configmap.yaml create mode 100644 base/kubernetes-dashboard/kubernetes-dashboard-deployment.yaml create mode 100644 base/kubernetes-dashboard/kubernetes-dashboard-rbac.yaml create mode 100644 base/kubernetes-dashboard/kubernetes-dashboard-secrets.yaml create mode 100644 base/kubernetes-dashboard/kubernetes-dashboard-service.yaml create mode 100644 base/monitoring/metrics-server.yaml create mode 100644 base/monitoring/prometheus-operator.yaml create mode 100644 base/namespaces/amazon-cloudwatch.yaml create mode 100644 base/namespaces/demo.yaml create mode 100644 base/namespaces/kubernetes-dashboard.yaml create mode 100644 base/namespaces/monitoring.yaml Enumerating objects: 37, done. Counting objects: 100% (37/37), done. Delta compression using up to 8 threads Compressing objects: 100% (28/28), done. Writing objects: 100% (36/36), 13.54 KiB | 13.54 MiB/s, done. Total 36 (delta 7), reused 27 (delta 7) remote: Resolving deltas: 100% (7/7), done. To github.com:marccarre/my-gitops-repo.git 15b0aad..b7070d5 master -> master
$ kubectl get po --all-namespaces NAMESPACE NAME READY STATUS RESTARTS AGE amazon-cloudwatch cloudwatch-agent-h9wr7 1/1 Running 0 15m amazon-cloudwatch fluentd-cloudwatch-8r5f6 1/1 Running 0 15m demo podinfo-67b7886b6c-bvdtm 1/1 Running 0 15m flux flux-7696dbc4cd-sjbv7 1/1 Running 0 36m flux flux-helm-operator-8687676b89-qw7kq 1/1 Running 0 36m flux memcached-5dcd7579-7bn6l 1/1 Running 0 36m flux tiller-deploy-69547b56b4-p6zxd 1/1 Running 0 36m kube-system alb-ingress-controller-8df75bc98-gssb9 1/1 Running 0 15m kube-system aws-node-f8g7z 1/1 Running 0 39m kube-system cluster-autoscaler-86d68b66cb-b9xqv 1/1 Running 0 15m kube-system coredns-699bb99bf8-gptx4 1/1 Running 0 46m kube-system coredns-699bb99bf8-smzch 1/1 Running 0 46m kube-system kube-proxy-28xqt 1/1 Running 0 39m kubernetes-dashboard dashboard-metrics-scraper-65785bfbc-s8tq6 1/1 Running 0 15m kubernetes-dashboard kubernetes-dashboard-76b969b44b-rwgk5 1/1 Running 0 15m monitoring alertmanager-prometheus-operator-alertmanager-0 2/2 Running 0 14m monitoring metrics-server-5df4599bd7-cgh79 1/1 Running 0 15m monitoring prometheus-operator-grafana-dd95fb7d4-n9ddh 2/2 Running 0 15m monitoring prometheus-operator-kube-state-metrics-5d7558d7cc-h8xgg 1/1 Running 0 15m monitoring prometheus-operator-operator-67895dd7c5-nqj7w 1/1 Running 0 15m monitoring prometheus-operator-prometheus-node-exporter-qp8gp 1/1 Running 0 15m monitoring prometheus-prometheus-operator-prometheus-0 3/3 Running 1 14m
If, however, you do NOT have the IAM roles in place, then the cluster-autoscaler will CrashLoopBackOff. See also these steps which reproduce the issue. (Which I have also run, to double check things & ensure I can actually reproduce the issue.)
$ eksctl create cluster --name mc-1237-testing [ℹ] eksctl version 0.11.1 [ℹ] using region ap-northeast-1 [ℹ] setting availability zones to [ap-northeast-1d ap-northeast-1a ap-northeast-1c] [ℹ] subnets for ap-northeast-1d - public:192.168.0.0/19 private:192.168.96.0/19 [ℹ] subnets for ap-northeast-1a - public:192.168.32.0/19 private:192.168.128.0/19 [ℹ] subnets for ap-northeast-1c - public:192.168.64.0/19 private:192.168.160.0/19 [ℹ] nodegroup "ng-7bfc0f1f" will use "ami-02e124a380df41614" [AmazonLinux2/1.14] [ℹ] using Kubernetes version 1.14 [ℹ] creating EKS cluster "mc-1237-testing" in "ap-northeast-1" region with un-managed nodes [ℹ] will create 2 separate CloudFormation stacks for cluster itself and the initial nodegroup [ℹ] if you encounter any issues, check CloudFormation console or try 'eksctl utils describe-stacks --region=ap-northeast-1 --cluster=mc-1237-testing' [ℹ] CloudWatch logging will not be enabled for cluster "mc-1237-testing" in "ap-northeast-1" [ℹ] you can enable it with 'eksctl utils update-cluster-logging --region=ap-northeast-1 --cluster=mc-1237-testing' [ℹ] Kubernetes API endpoint access will use default of {publicAccess=true, privateAccess=false} for cluster "mc-1237-testing" in "ap-northeast-1" [ℹ] 2 sequential tasks: { create cluster control plane "mc-1237-testing", create nodegroup "ng-7bfc0f1f" } [ℹ] building cluster stack "eksctl-mc-1237-testing-cluster" [ℹ] deploying stack "eksctl-mc-1237-testing-cluster" [ℹ] building nodegroup stack "eksctl-mc-1237-testing-nodegroup-ng-7bfc0f1f" [ℹ] --nodes-min=2 was set automatically for nodegroup ng-7bfc0f1f [ℹ] --nodes-max=2 was set automatically for nodegroup ng-7bfc0f1f [ℹ] deploying stack "eksctl-mc-1237-testing-nodegroup-ng-7bfc0f1f" [✔] all EKS cluster resources for "mc-1237-testing" have been created [✔] saved kubeconfig as "${HOME}/.kube/config" [ℹ] adding identity "arn:aws:iam::083751696308:role/eksctl-mc-1237-testing-nodegroup-NodeInstanceRole-KGOKLPVNIK10" to auth ConfigMap [ℹ] nodegroup "ng-7bfc0f1f" has 0 node(s) [ℹ] waiting for at least 2 node(s) to become ready in "ng-7bfc0f1f" [ℹ] nodegroup "ng-7bfc0f1f" has 2 node(s) [ℹ] node "ip-192-168-2-23.ap-northeast-1.compute.internal" is ready [ℹ] node "ip-192-168-48-84.ap-northeast-1.compute.internal" is ready [ℹ] kubectl command should work with "${HOME}/.kube/config", try 'kubectl get nodes' [✔] EKS cluster "mc-1237-testing" in "ap-northeast-1" region is ready
$ EKSCTL_EXPERIMENTAL=true eksctl enable repo \
--cluster mc-1237-testing \
--region ap-northeast-1 \
--git-email carre.marc+flux@gmail.com \
--git-url git@github.com:marccarre/my-gitops-repo.git
[ℹ] Generating public key infrastructure for the Helm Operator and Tiller [ℹ] this may take up to a minute, please be patient [!] Public key infrastructure files were written into directory "/var/folders/24/d3mml6bn20nftpt91cfldq1h0000gn/T/eksctl-helm-pki563648596" [!] please move the files into a safe place or delete them [ℹ] Generating manifests [ℹ] Cloning git@github.com:marccarre/my-gitops-repo.git Cloning into '/var/folders/24/d3mml6bn20nftpt91cfldq1h0000gn/T/eksctl-install-flux-clone-026154915'... remote: Enumerating objects: 43, done. remote: Counting objects: 100% (43/43), done. remote: Compressing objects: 100% (40/40), done. remote: Total 431 (delta 9), reused 35 (delta 3), pack-reused 388 Receiving objects: 100% (431/431), 177.90 KiB | 497.00 KiB/s, done. Resolving deltas: 100% (155/155), done. Already on 'master' Your branch is up to date with 'origin/master'. [ℹ] Writing Flux manifests [ℹ] created "Namespace/flux" [ℹ] Applying Helm TLS Secret(s) [ℹ] created "flux:Secret/flux-helm-tls-cert" [ℹ] created "flux:Secret/tiller-secret" [!] Note: certificate secrets aren't added to the Git repository for security reasons [ℹ] Applying manifests [ℹ] created "flux:ServiceAccount/flux" [ℹ] created "ClusterRole.rbac.authorization.k8s.io/flux" [ℹ] created "ClusterRoleBinding.rbac.authorization.k8s.io/flux" [ℹ] created "CustomResourceDefinition.apiextensions.k8s.io/helmreleases.helm.fluxcd.io" [ℹ] created "flux:Service/memcached" [ℹ] created "flux:ServiceAccount/tiller" [ℹ] created "ClusterRoleBinding.rbac.authorization.k8s.io/tiller" [ℹ] created "flux:ServiceAccount/helm" [ℹ] created "flux:Role.rbac.authorization.k8s.io/tiller-user" [ℹ] created "kube-system:RoleBinding.rbac.authorization.k8s.io/tiller-user-binding" [ℹ] created "flux:Deployment.extensions/tiller-deploy" [ℹ] created "flux:Deployment.apps/flux" [ℹ] created "flux:ConfigMap/flux-helm-tls-ca-config" [ℹ] created "flux:Deployment.apps/flux-helm-operator" [ℹ] created "flux:Deployment.apps/memcached" [ℹ] created "flux:Secret/flux-git-deploy" [ℹ] created "flux:ServiceAccount/flux-helm-operator" [ℹ] created "ClusterRole.rbac.authorization.k8s.io/flux-helm-operator" [ℹ] created "ClusterRoleBinding.rbac.authorization.k8s.io/flux-helm-operator" [ℹ] created "flux:Service/tiller-deploy" [ℹ] Waiting for Helm Operator to start [ℹ] Helm Operator started successfully [ℹ] see https://docs.fluxcd.io/projects/helm-operator for details on how to use the Helm Operator [ℹ] Waiting for Flux to start [ℹ] Flux started successfully [ℹ] see https://docs.fluxcd.io/projects/flux for details on how to use Flux [ℹ] Committing and pushing manifests to git@github.com:marccarre/my-gitops-repo.git [master f8e0c52] Add Initial Flux configuration 13 files changed, 803 insertions(+) create mode 100644 flux/flux-account.yaml create mode 100644 flux/flux-deployment.yaml create mode 100644 flux/flux-helm-operator-account.yaml create mode 100644 flux/flux-helm-release-crd.yaml create mode 100644 flux/flux-namespace.yaml create mode 100644 flux/flux-secret.yaml create mode 100644 flux/helm-operator-deployment.yaml create mode 100644 flux/memcache-dep.yaml create mode 100644 flux/memcache-svc.yaml create mode 100644 flux/tiller-ca-cert-configmap.yaml create mode 100644 flux/tiller-dep.yaml create mode 100644 flux/tiller-rbac.yaml create mode 100644 flux/tiller-svc.yaml Enumerating objects: 17, done. Counting objects: 100% (17/17), done. Delta compression using up to 8 threads Compressing objects: 100% (15/15), done. 
Writing objects: 100% (16/16), 9.33 KiB | 9.33 MiB/s, done. Total 16 (delta 1), reused 12 (delta 1) remote: Resolving deltas: 100% (1/1), done. To github.com:marccarre/my-gitops-repo.git 4b9a79d..f8e0c52 master -> master [ℹ] Flux will only operate properly once it has write-access to the Git repository [ℹ] please configure git@github.com:marccarre/my-gitops-repo.git so that the following Flux SSH public key has write access to it ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCxoYrh1xqsHGQuJZnsY2hiOyplanBS/wmLQaxyPu2eMexmG1uy4Vq+e1qHQ6ukTlPSV92N2diz7Mml/VnfMIu6/S6WpcEa8s8cX+4X2w4DN5VGcOdMbRa76Td6me1Kp7X4BvQSpmtfj380+7dY+yxywTVf97ZFYq1atitxvjgVHIUCDLAXxqmM2t7OnH5nYEJFS+32BRmENMpzEfB+31PiOAgsUHENA4BCr0sbxDpKt3j4hzJbntgYQVyhaNLBH8S34Ogz1V0i8H5iplJ6YjsNXpeUhmRYFH4rKOTi0EJv7wEWMEH1gttQvLxhHAd6s4qDMB27aQSJFMh55/DW/r6Z
Then added the above SSH key to https://github.com/marccarre/my-gitops-repo/deploy_keys
$ kubectl get po --all-namespaces NAMESPACE NAME READY STATUS RESTARTS AGE flux flux-7696dbc4cd-4h927 1/1 Running 0 69s flux flux-helm-operator-8687676b89-hskbj 1/1 Running 0 68s flux memcached-5dcd7579-tpkvd 1/1 Running 0 69s flux tiller-deploy-69547b56b4-sp9md 1/1 Running 0 69s kube-system aws-node-97px5 1/1 Running 0 7m5s kube-system aws-node-kxbzd 1/1 Running 0 7m5s kube-system coredns-699bb99bf8-sn7ws 1/1 Running 0 13m kube-system coredns-699bb99bf8-zx26g 1/1 Running 0 13m kube-system kube-proxy-t2rvs 1/1 Running 0 7m5s kube-system kube-proxy-tkncf 1/1 Running 0 7m5s
$ EKSCTL_EXPERIMENTAL=true eksctl enable profile app-dev \
--cluster mc-1237-testing \
--region ap-northeast-1 \
--git-email carre.marc+flux@gmail.com \
--git-url git@github.com:marccarre/my-gitops-repo.git
Cloning into '/var/folders/24/d3mml6bn20nftpt91cfldq1h0000gn/T/my-gitops-repo-130038557'... remote: Enumerating objects: 47, done. remote: Counting objects: 100% (47/47), done. remote: Compressing objects: 100% (44/44), done. remote: Total 435 (delta 10), reused 38 (delta 3), pack-reused 388 Receiving objects: 100% (435/435), 179.62 KiB | 494.00 KiB/s, done. Resolving deltas: 100% (156/156), done. Already on 'master' Your branch is up to date with 'origin/master'. [ℹ] cloning repository "https://github.com/weaveworks/eks-quickstart-app-dev":master Cloning into '/var/folders/24/d3mml6bn20nftpt91cfldq1h0000gn/T/quickstart-019213272'... remote: Enumerating objects: 5, done. remote: Counting objects: 100% (5/5), done. remote: Compressing objects: 100% (4/4), done. remote: Total 214 (delta 0), reused 0 (delta 0), pack-reused 209 Receiving objects: 100% (214/214), 57.27 KiB | 322.00 KiB/s, done. Resolving deltas: 100% (92/92), done. Already on 'master' Your branch is up to date with 'origin/master'. [ℹ] processing template files in repository [ℹ] writing new manifests to "/var/folders/24/d3mml6bn20nftpt91cfldq1h0000gn/T/my-gitops-repo-130038557/base" [master 5e6bcf5] Add app-dev quickstart components 27 files changed, 1380 insertions(+) create mode 100644 base/LICENSE create mode 100644 base/README.md create mode 100644 base/amazon-cloudwatch/cloudwatch-agent-configmap.yaml create mode 100644 base/amazon-cloudwatch/cloudwatch-agent-daemonset.yaml create mode 100644 base/amazon-cloudwatch/cloudwatch-agent-rbac.yaml create mode 100644 base/amazon-cloudwatch/fluentd-configmap-cluster-info.yaml create mode 100644 base/amazon-cloudwatch/fluentd-configmap-fluentd-config.yaml create mode 100644 base/amazon-cloudwatch/fluentd-daemonset.yaml create mode 100644 base/amazon-cloudwatch/fluentd-rbac.yaml create mode 100644 base/demo/helm-release.yaml create mode 100644 base/kube-system/alb-ingress-controller-deployment.yaml create mode 100644 base/kube-system/alb-ingress-controller-rbac.yaml create mode 100644 base/kube-system/cluster-autoscaler-deployment.yaml create mode 100644 base/kube-system/cluster-autoscaler-rbac.yaml create mode 100644 base/kubernetes-dashboard/dashboard-metrics-scraper-deployment.yaml create mode 100644 base/kubernetes-dashboard/dashboard-metrics-scraper-service.yaml create mode 100644 base/kubernetes-dashboard/kubernetes-dashboard-configmap.yaml create mode 100644 base/kubernetes-dashboard/kubernetes-dashboard-deployment.yaml create mode 100644 base/kubernetes-dashboard/kubernetes-dashboard-rbac.yaml create mode 100644 base/kubernetes-dashboard/kubernetes-dashboard-secrets.yaml create mode 100644 base/kubernetes-dashboard/kubernetes-dashboard-service.yaml create mode 100644 base/monitoring/metrics-server.yaml create mode 100644 base/monitoring/prometheus-operator.yaml create mode 100644 base/namespaces/amazon-cloudwatch.yaml create mode 100644 base/namespaces/demo.yaml create mode 100644 base/namespaces/kubernetes-dashboard.yaml create mode 100644 base/namespaces/monitoring.yaml Enumerating objects: 37, done. Counting objects: 100% (37/37), done. Delta compression using up to 8 threads Compressing objects: 100% (28/28), done. Writing objects: 100% (36/36), 13.52 KiB | 13.52 MiB/s, done. Total 36 (delta 7), reused 25 (delta 7) remote: Resolving deltas: 100% (7/7), done. To github.com:marccarre/my-gitops-repo.git f8e0c52..5e6bcf5 master -> master
$ kubectl get po --all-namespaces NAMESPACE NAME READY STATUS RESTARTS AGE amazon-cloudwatch cloudwatch-agent-6km5b 1/1 Running 0 109m amazon-cloudwatch cloudwatch-agent-kcpb9 1/1 Running 0 109m amazon-cloudwatch fluentd-cloudwatch-8wxxn 1/1 Running 0 109m amazon-cloudwatch fluentd-cloudwatch-nst52 1/1 Running 0 109m demo podinfo-67b7886b6c-pjws4 1/1 Running 0 109m flux flux-7696dbc4cd-4h927 1/1 Running 0 116m flux flux-helm-operator-8687676b89-hskbj 1/1 Running 0 115m flux memcached-5dcd7579-tpkvd 1/1 Running 0 116m flux tiller-deploy-69547b56b4-sp9md 1/1 Running 0 116m kube-system alb-ingress-controller-776b5b58c9-bbt7t 1/1 Running 0 109m kube-system aws-node-97px5 1/1 Running 0 121m kube-system aws-node-kxbzd 1/1 Running 0 121m kube-system cluster-autoscaler-55d556f787-rm7cc 0/1 CrashLoopBackOff 25 109m kube-system coredns-699bb99bf8-sn7ws 1/1 Running 0 128m kube-system coredns-699bb99bf8-zx26g 1/1 Running 0 128m kube-system kube-proxy-t2rvs 1/1 Running 0 121m kube-system kube-proxy-tkncf 1/1 Running 0 121m kubernetes-dashboard dashboard-metrics-scraper-65785bfbc-52952 1/1 Running 0 109m kubernetes-dashboard kubernetes-dashboard-76b969b44b-hf9kd 1/1 Running 0 109m monitoring alertmanager-prometheus-operator-alertmanager-0 2/2 Running 0 108m monitoring metrics-server-5df4599bd7-l5b8q 1/1 Running 0 109m monitoring prometheus-operator-grafana-dd95fb7d4-gzqxn 2/2 Running 0 109m monitoring prometheus-operator-kube-state-metrics-5d7558d7cc-qx4tl 1/1 Running 0 109m monitoring prometheus-operator-operator-67895dd7c5-nhbbv 1/1 Running 0 109m monitoring prometheus-operator-prometheus-node-exporter-77nb6 1/1 Running 0 109m monitoring prometheus-operator-prometheus-node-exporter-hfdv9 1/1 Running 0 109m monitoring prometheus-prometheus-operator-prometheus-0 3/3 Running 1 108m
If you follow the instructions word for word it fails.
Which instructions were you following exactly @ilanpillemer? (Could you please share a link to them to ensure we are on the same page, and/or so that we know if we need to update/correct anything published elsewhere? 🙇 )
If you are talking about something other than this, would you have any suggestions for making these instructions clearer?
Note that the pre-requisites for the app-dev profile are documented here: https://github.com/weaveworks/eks-quickstart-app-dev#pre-requisites, but any suggestion on how to improve this and make it more obvious is always welcome! ✨
You need to add the roles with the necessary config when creating the cluster.
Yes, this is what the first two commands I shared above were meant to show, i.e.:
Use a ClusterConfig with the appropriate roles, e.g. examples/eks-quickstart-app-dev.yaml:
$ git diff
diff --git a/examples/eks-quickstart-app-dev.yaml b/examples/eks-quickstart-app-dev.yaml
[...]
Indeed, this file defines the following IAM roles: https://github.com/weaveworks/eksctl/blob/796d9f48c2f70732e27aebeee1c38a864cda88a8/examples/eks-quickstart-app-dev.yaml#L16-L20
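For reference, the linked lines correspond to the withAddonPolicies block already quoted earlier in this thread:

    iam:
      withAddonPolicies:
        albIngress: true
        autoScaler: true
        cloudWatch: true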
Create the cluster by passing a reference to this file.
$ eksctl create cluster -f examples/eks-quickstart-app-dev.yaml
[...]
Yes. With hindsight it now seems completely obvious what I had to do. I think a very minor tweak would help. I used the gitops quick start guide at eksctl.io; when I look now, it says some variant of the command should be used. Perhaps a few more words would help, for example: if you need the autoscaler or ALB ingress, use the necessary switches described in the documentation. Or something similar. Great work with eksctl and Flux, they are game changing.
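For illustration, the switches being referred to are presumably along these lines (flag names should be verified against eksctl create cluster --help, and the cluster name below is a placeholder):

# --asg-access grants the IAM policy needed by cluster-autoscaler;
# --alb-ingress-access grants the policy needed by the ALB ingress controller.
$ eksctl create cluster \
    --name my-cluster \
    --asg-access \
    --alb-ingress-access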
What happened? I deployed with
eksctl gitops apply
and, after the deployment and after adding Flux's SSH key to my gitops repo, the cluster autoscaler doesn't start.
My cluster looks as follows:
The error in the logs of the cluster-autoscaler container is: