argoproj / argo-cd

Declarative Continuous Deployment for Kubernetes
https://argo-cd.readthedocs.io
Apache License 2.0

argocd-redis-ha-haproxy pod goes into CrashLoopBackOff with multiple restarts after upgrading EKS version to 1.18 #5425

Open soumitasardar opened 3 years ago

soumitasardar commented 3 years ago

Describe the bug

On an EKS cluster running version 1.17, the Argo CD applications and all pods were up and running. We are running Argo CD in HA mode. After upgrading EKS to 1.18 and restarting the nodes with the latest image, the argocd-redis-ha-haproxy pod is stuck in CrashLoopBackOff. These are the events from `kubectl describe pod` for argocd-redis-ha-haproxy:

  Type     Reason     Age                    From     Message
  ----     ------     ----                   ----     -------
  Normal   Pulled     44m (x18 over 85m)     kubelet  Container image "haproxy:2.0.4" already present on machine
  Warning  Unhealthy  24m (x70 over 85m)     kubelet  Liveness probe failed: Get http://xxx:8888/healthz: dial tcp xxx:8888: connect: connection refused
  Warning  BackOff    4m18s (x290 over 81m)  kubelet  Back-off restarting failed container

argocd version: 1.8.3
haproxy: 2.0.4
redis: 5.0.10-alpine

We are using the HA install.yaml from https://github.com/argoproj/argo-cd.git at release tag 1.8.3 (manifests/ha/install.yaml).

Version

argocd: 1.8.3

jessesuen commented 3 years ago

Liveness probe failed: Get http://xxx:8888/healthz: dial tcp xxx:8888: connect: connection refused

It's apparent that kubelet can't connect to HAProxy (connection refused). Are you able to connect to this port via port-forwarding? The haproxy logs may indicate why it's not coming up.
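One way to answer the port-forwarding question, sketched under the assumption of a bash shell with coreutils `timeout` on your workstation: forward the haproxy pod's health port with `kubectl port-forward -n argocd pod/<haproxy-pod-name> 8888:8888` (the pod name is a placeholder) and test the local end. The `port_open` helper below is hypothetical, not part of Argo CD or kubectl. Because the container keeps restarting, the forward tends to drop each time it dies; in that state the log of the last crashed run, from `kubectl logs --previous -n argocd <haproxy-pod-name>`, usually says more.

```shell
#!/usr/bin/env bash
# Hypothetical helper (sketch): does anything accept TCP connections on HOST:PORT?
# Uses bash's /dev/tcp redirection (a bash feature, not POSIX sh) and
# coreutils `timeout`, so a filtered port fails after 3 s instead of hanging.
port_open() {
  # Opening fd 3 on /dev/tcp/HOST/PORT performs a TCP connect; the inner bash
  # exits non-zero on "connection refused", and timeout exits 124 if it hangs.
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Usage, with `kubectl port-forward -n argocd pod/<haproxy-pod-name> 8888:8888`
# running in another terminal:
#   port_open 127.0.0.1 8888 && echo "listening" || echo "refused or timed out"
```

If `port_open 127.0.0.1 8888` fails while the forward is up, you are seeing the same `connection refused` that kubelet reports, and the haproxy log is the next place to look.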

soumitasardar commented 3 years ago

Hi @jessesuen, thanks for the response!

Logs for that pod (`kubectl logs argocd-redis-ha-haproxy-766c67b584-pwnqd -n argocd`):

[ALERT] 038/093302 (1) : parsing [/usr/local/etc/haproxy/haproxy.cfg:25] : 'server R1' : could not resolve address 'argocd-redis-ha-announce-1'.

Of the 3 haproxy replicas, 2 seem to be running, but I can see multiple restarts (screenshot not preserved).

Regarding "Are you able to connect to this port via port-forwarding?": could you please explain how to check this?

soumitasardar commented 3 years ago

Services created: (screenshot not preserved)

Running the command `ANNOUNCE_IP=$(getent hosts "argocd-redis-ha-announce-1" | awk '{ print $1 }')` gives me no output.

Could this failure to resolve the address be the cause of the issue, and if so, how can I fix it?
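Note that the assignment in that command prints nothing by design: it only stores the result in ANNOUNCE_IP, so `echo "$ANNOUNCE_IP"` is needed to see it, and an empty value means the lookup failed. A slightly fuller check is sketched below in POSIX sh. It runs the same getent lookup for each announce Service plus the API server Service as a control. Assumptions: the helper names are hypothetical; the default three redis-ha replicas (announce-0 to announce-2); the default cluster domain `cluster.local`; and a shell inside a pod in the argocd namespace with getent and awk, for example `kubectl exec -it -n argocd argocd-redis-ha-server-0 -c redis -- sh` (pod and container names assumed from the HA manifest).

```shell
#!/bin/sh
# Sketch: check whether the names haproxy.cfg points at resolve in-cluster.
# Needs getent and awk (assumed present in the redis alpine image).

# resolve_or_fail NAME: print the first address getent returns for NAME, or
# print an error to stderr and return 1 when the lookup yields nothing.
resolve_or_fail() {
  addr=$(getent hosts "$1" | awk '{ print $1; exit }')
  if [ -z "$addr" ]; then
    echo "could not resolve address '$1'" >&2
    return 1
  fi
  echo "$addr"
}

# check_redis_ha_dns: hypothetical helper that tries the announce Services
# (three replicas assumed) plus the API server Service as a control name.
check_redis_ha_dns() {
  for name in argocd-redis-ha-announce-0 argocd-redis-ha-announce-1 \
              argocd-redis-ha-announce-2 kubernetes.default.svc.cluster.local; do
    if addr=$(resolve_or_fail "$name" 2>/dev/null); then
      echo "OK   $name -> $addr"
    else
      echo "FAIL $name"
    fi
  done
}
```

Paste both functions into the pod's shell and run `check_redis_ha_dns`. If the control name also fails, the problem is likely cluster DNS or pod networking rather than the Argo CD manifests.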

soumitasardar commented 3 years ago

Hi, can anyone please help resolve this issue?

JihadMotii-REISys commented 3 years ago

I'm facing the same issue

seyukun commented 2 years ago

I'm facing the same issue...

seyukun commented 2 years ago

(two screenshots, not preserved)

BeckYeh commented 2 years ago

I tried Helm chart version 4.7.0 and face the same problem in my homelab, with cluster version 1.22.9:

Liveness probe failed: Get http://xxx:8888/healthz: dial tcp xxx:8888: connect: connection refused

rshiva777 commented 2 years ago

I am facing the same issue. I see the following log from one of the redis-ha-haproxy pods:

[ALERT] 192/144952 (1) : parsing [/usr/local/etc/haproxy/haproxy.cfg:24] : 'server R0' : could not resolve address 'argocd-redis-ha-announce-0'.

On one cluster it runs fine, but on the other I get this error. The OS and Kubernetes versions are the same on both clusters. Any advice would be appreciated.

rshiva777 commented 2 years ago

Hello all, the cluster networking was broken, which is why I got the error below:

[ALERT] 192/144952 (1) : parsing [/usr/local/etc/haproxy/haproxy.cfg:24] : 'server R0' : could not resolve address 'argocd-redis-ha-announce-0'.

Restarting the Calico pods fixed the issue.

easywang commented 1 year ago

Replying to BeckYeh's comment above (Helm chart 4.7.0, cluster version 1.22.9, same liveness probe failure): same problem here.

chengpeng21186 commented 11 months ago

In reply to the question above about the address not resolving: installing CoreDNS resolved this problem for me.

Ramces182 commented 1 month ago

Run `k get all -n kube-system --show-labels` and copy the labels of the CoreDNS pods; for me they were k8s-app=kube-dns,pod-template-hash=787d4945fb. Then delete those pods:

`k delete pods -l k8s-app=kube-dns,pod-template-hash=787d4945fb -n kube-system`

pod "coredns-787d4945fb-dbzkg" deleted
pod "coredns-787d4945fb-pkb56" deleted

That fixed my issue. Good luck!