
502 Bad Gateway when deploying Argo CD in a GKE cluster #20071

Open SergeiCherevko opened 1 month ago

SergeiCherevko commented 1 month ago

Describe the bug

When I run:

argocd login argocd.example.com

I receive the following error:

FATA[0000] rpc error: code = Unavailable desc = unexpected HTTP status code received from server: 502 (Bad Gateway); transport: received unexpected content-type "text/html; charset=UTF-8"
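
For comparison, the raw response from the load balancer can be checked outside the CLI by querying the health endpoint that the BackendConfig below points at (a plain curl sketch, assuming the same hostname):

curl -vk https://argocd.example.com/healthz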

To Reproduce

According to this documentation, I use the following manifests to deploy Argo CD in a GKE cluster.

First step

kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
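
A quick way to confirm the base install has rolled out before applying the overrides below (a sketch, assuming the argocd namespace already exists):

kubectl -n argocd rollout status deployment argocd-server
kubectl -n argocd get pods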

Then apply

apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cmd-params-cm
  namespace: argocd
  labels:
    app.kubernetes.io/name: argocd-cmd-params-cm
    app.kubernetes.io/part-of: argocd
data:
  server.insecure: "true"
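  # Note: argocd-server reads server.insecure at startup, so any argocd-server pod
  # created before this ConfigMap was applied keeps serving HTTPS on 8080 until it
  # is restarted (see the restart note further down).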

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
  labels:
    app.kubernetes.io/name: argocd-cm
    app.kubernetes.io/part-of: argocd
data:
  resource.exclusions: |
    - apiGroups:
      - cilium.io
      kinds:
      - CiliumIdentity
      clusters:
      - "*"

---
apiVersion: v1
kind: Service
metadata:
  name: argocd-server
  namespace: argocd
  annotations:
    cloud.google.com/neg: '{"ingress": true}'
    cloud.google.com/backend-config: '{"ports": {"http":"argocd-backend-config"}}'
spec:
  type: ClusterIP
  ports:
    - name: http
      port: 80
      protocol: TCP
      targetPort: 8080
  selector:
    app.kubernetes.io/name: argocd-server

---
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: argocd-backend-config
  namespace: argocd
spec:
  healthCheck:
    checkIntervalSec: 30
    timeoutSec: 5
    healthyThreshold: 1
    unhealthyThreshold: 2
    type: HTTP
    requestPath: /healthz
    port: 8080

---
apiVersion: networking.gke.io/v1beta1
kind: FrontendConfig
metadata:
  name: argocd-frontend-config
  namespace: argocd
spec:
  redirectToHttps:
    enabled: true

---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: argocd
  namespace: argocd
  annotations:
    networking.gke.io/v1beta1.FrontendConfig: argocd-frontend-config
    kubernetes.io/ingress.global-static-ip-name: argocd-ingress-external-us
spec:
  tls:
    - secretName: secret-example-com
  rules:
    - host: argocd.example.com
      http:
        paths:
          - pathType: Prefix
            path: "/"
            backend:
              service:
                name: argocd-server
                port:
                  number: 80
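
Once these manifests are applied, one way to confirm that GKE has wired up container-native load balancing for the Service is the NEG status annotation it adds (a sketch using the names above):

kubectl get service argocd-server -n argocd \
  -o jsonpath='{.metadata.annotations.cloud\.google\.com/neg-status}'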

The load balancer gets an IP address, and the Argo CD domain correctly points to it (see the screenshot). When I port-forward directly to the service, I can access the Argo CD web page on localhost and everything works fine, so the problem seems to be with the Ingress or load balancing.
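
The port-forward check looks roughly like this (local port arbitrary; the Service above maps port 80 to the pod's 8080, and with server.insecure enabled the UI is served over plain HTTP):

kubectl -n argocd port-forward svc/argocd-server 8080:80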

Additionally, sometimes everything works correctly. But when I delete the GKE cluster and create an identical new one, I run into the same issue and get the 502 error again. It seems to either work (if you're lucky) or not work at all.

When I restart the argocd-server pod from the ReplicaSet, everything starts working!
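
One way to trigger that restart from the CLI (a sketch; deleting the pod so the ReplicaSet recreates it has the same effect):

kubectl -n argocd rollout restart deployment argocd-server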


Expected behavior

I expect to see the Argo CD login page without having to restart the argocd-server ReplicaSet.

Screenshots (showing the problem)


Logs

If you need additional logs, please let me know where I can find them.
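
For reference, the argocd-server logs should be reachable with something like this (names as in the manifests above):

kubectl -n argocd logs deployment/argocd-server --tail=100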

agaudreault commented 2 weeks ago

This seems to be a configuration error with your Ingress resource. There are many ingresses, and you need to configure argo to use the one for your infrastructure. Some ingresses configuration will take time to propagate, and features such as https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.1/deploy/pod_readiness_gate/ need to be enabled to mitigate it. Since you are getting an inconsistent behavior and it sometimes work, look at the ingress provisioned in your infrastructure's health check. it is likely that if the ingress does not have any healthy pods available, it returns a 502.