argoproj / argo-cd

Declarative Continuous Deployment for Kubernetes
https://argo-cd.readthedocs.io
Apache License 2.0

argocd stable installation && repo server cannot connect to any github/gitlab repository `transport: Error while dialing dial tcp 172.20.86.254:8081: i/o timeout` #10023

Open rzvn2600 opened 2 years ago

rzvn2600 commented 2 years ago

Describe the bug

Hi All,

I am trying to deploy an application into a fresh argocd installation and it does not work. I cannot create an app from the UI, from the CLI, or through kubectl apply. Everything fails with the following error:

➜  argo_test_yaml argocd app create --name blue-green --repo https://github.com/argoproj/argocd-example-apps --dest-server https://kubernetes.default.svc --dest-namespace default --path blue-green && argocd app sync blue-green

FATA[0020] rpc error: code = InvalidArgument desc = application spec for blue-green is invalid: InvalidSpecError: repository not accessible: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.20.86.254:8081: i/o timeout" 

Pod stats:

➜  terraform kubectl get pods -n argocd
NAME                                                READY   STATUS    RESTARTS   AGE
argocd-application-controller-0                     1/1     Running   0          2d19h
argocd-applicationset-controller-58f69d4b8f-tdm84   1/1     Running   0          2d19h
argocd-dex-server-ff489bd4-vmn6h                    1/1     Running   0          2d19h
argocd-notifications-controller-567f4c469-699jp     1/1     Running   0          2d19h
argocd-redis-55d64cd8bf-6kp9n                       1/1     Running   0          3h52m
argocd-repo-server-86d9878d56-cmw8r                 1/1     Running   0          102m
argocd-server-54bc687b4b-jn4zj                      1/1     Running   0          101m

kubectl logs:

argocd-server

time="2022-07-18T10:34:49Z" level=info msg="received unary call /application.ApplicationService/Create" grpc.method=Create grpc.request.claims="{\"exp\":1658226650,\"iat\":1658140250,\"iss\":\"argocd\",\"jti\":\"cc016145-c46d-4fc3-b82a-95bc5270bac1\",\"nbf\":1658140250,\"sub\":\"admin\"}" grpc.request.content="%!v(PANIC=String method: reflect.Value.Interface: cannot return value obtained from unexported field or method)" grpc.service=application.ApplicationService grpc.start_time="2022-07-18T10:34:49Z" span.kind=server system=grpc
time="2022-07-18T10:35:09Z" level=info msg="finished unary call with code InvalidArgument" error="rpc error: code = InvalidArgument desc = application spec for blue-green is invalid: InvalidSpecError: repository not accessible: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.20.86.254:8081: i/o timeout\"" grpc.code=InvalidArgument grpc.method=Create grpc.service=application.ApplicationService grpc.start_time="2022-07-18T10:34:49Z" grpc.time_ms=20024.885 span.kind=server system=grpc
time="2022-07-18T10:35:22Z" level=info msg="Alloc=12679 TotalAlloc=91128 Sys=28497 NumGC=53 Goroutines=76"

argocd-repo-server (with a different app deployed):

time="2022-07-18T10:38:20Z" level=error msg="finished unary call with code Unknown" error="unexpected client error: unexpected requesting \"https://xxx.xxx.com/xx.xx/argocd/info/refs?service=git-upload-pack\" status code: 301" grpc.code=Unknown grpc.method=GenerateManifest grpc.service=repository.RepoServerService grpc.start_time="2022-07-18T10:38:20Z" grpc.time_ms=175.553 span.kind=server system=grpc

CLI logs from app deployed through kubectl:

➜  terraform argocd app logs guestbook --loglevel debug
FATA[0000] stream read failed: rpc error: code = Unknown desc = error getting app resource tree: error getting cached app state: ComparisonError: rpc error: code = Unknown desc = unexpected client error: unexpected requesting "https://xxx.xxx.com/xx.xx/argocd/info/refs?service=git-upload-pack" status code: 301
➜  terraform 

app.yaml:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: guestbook
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://xxx.xxx.com/xxx.xxx/argocd
    targetRevision: HEAD
    path: guestbook
  destination:
    server: https://kubernetes.default.svc
    namespace: guestbook

However, from inside the pod I can easily clone the repo without any credentials, so this does not look like a networking issue:

argocd@argocd-server-fd9ccd79b-dxrd6:~$ git clone https://xxx.xxx.com/xx.xx/argocd.git
Cloning into 'argocd'...
remote: Enumerating objects: 7, done.
remote: Counting objects: 100% (7/7), done.
remote: Compressing objects: 100% (5/5), done.
remote: Total 7 (delta 0), reused 0 (delta 0), pack-reused 0
Receiving objects: 100% (7/7), done.
argocd@argocd-server-fd9ccd79b-dxrd6:~$ ls -ltr
total 0
drwxr-xr-x 4 argocd argocd 61 Jul 15 09:00 argocd
argocd@argocd-server-fd9ccd79b-dxrd6:~$ ls -ltr *
total 8
drwxr-xr-x 2 argocd argocd   33 Jul 15 09:00 argocd-appprojects
-rw-r--r-- 1 argocd argocd 6233 Jul 15 09:00 README.md
argocd@argocd-server-fd9ccd79b-dxrd6:~$ 
exit

Basically I cannot even create any app inside argocd because of the above error.

To Reproduce

Follow the installation step by step from the following argocd "getting started" documentation: https://argo-cd.readthedocs.io/en/stable/getting_started/

Expected behavior

It should add at least the example repository from the documentation and start working but instead I can see the error mentioned above.

Version

argocd: v2.4.6+a48bca0
  BuildDate: 2022-07-12T22:56:26Z
  GitCommit: a48bca03c79b6d63be0c34d6094831bc6916b3bc
  GitTreeState: clean
  GoVersion: go1.18.3
  Compiler: gc
  Platform: linux/amd64
argocd-server: v2.4.6+a48bca0
  BuildDate: 2022-07-12T22:31:17Z
  GitCommit: a48bca03c79b6d63be0c34d6094831bc6916b3bc
  GitTreeState: clean
  GoVersion: go1.18.3
  Compiler: gc
  Platform: linux/amd64
  Kustomize Version: v4.4.1 2021-11-11T23:36:27Z
  Helm Version: v3.8.1+g5cb9af4
  Kubectl Version: v0.23.1
  Jsonnet Version: v0.18.0

Logs

time="2022-07-18T09:16:34Z" level=info msg="received unary call /application.ApplicationService/Sync" grpc.method=Sync grpc.request.claims="{\"exp\":1658175987,\"iat\":1658089587,\"iss\":\"argocd\",\"jti\":\"9f2a7563-66d3-4547-bbb9-3665c36ef8fc\",\"nbf\":1658089587,\"sub\":\"admin\"}" grpc.request.content="name:\"guestbook\" revision:\"\" dryRun:false prune:false strategy:<hook:<syncStrategyApply:<force:false > > > " grpc.service=application.ApplicationService grpc.start_time="2022-07-18T09:16:34Z" span.kind=server system=grpc
time="2022-07-18T09:16:55Z" level=warning msg="finished unary call with code FailedPrecondition" error="rpc error: code = FailedPrecondition desc = error resolving repo revision: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.20.86.254:8081: i/o timeout\"" grpc.code=FailedPrecondition grpc.method=Sync grpc.service=application.ApplicationService grpc.start_time="2022-07-18T09:16:34Z" grpc.time_ms=20017.8 span.kind=server system=grpc
terraform_course argocd repo list
TYPE  NAME  REPO                                             INSECURE  OCI    LFS    CREDS  STATUS  MESSAGE                                                                                                                                                                                                                                                       PROJECT
git         https://xxx.xxx.com/xxx.xxx/argocd  false     false  false  false  Failed  Unable to connect to repository: rpc error: code = Unknown desc = error testing repository connectivity: unexpected client error: unexpected requesting "https://xxx.xxx.com/xxx.xxx/argocd/info/refs?service=git-upload-pack" status code: 301  a

Just as a reminder: I can connect to any pod of the argocd deployments and clone the above-mentioned xxx repository manually from there without any issues. There is no network policy that forbids access.
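One difference that may be worth ruling out: the manual clone above uses a URL ending in .git, while the Application spec and the argocd repo list entry use the URL without it, and the failing request is the one to .../argocd/info/refs that returns status 301. The sketch below is a check under that assumption, not a confirmed diagnosis. It builds the discovery URL a Git client requests first and reports the HTTP status without following redirects. The helper names (info_refs_url, with_git_suffix, probe_status) are made up for this illustration.

```python
import http.client
from typing import Optional, Tuple
from urllib.parse import urlsplit, urlunsplit


def info_refs_url(repo_url: str) -> str:
    """Build the smart-HTTP discovery URL that a Git client requests first."""
    parts = urlsplit(repo_url)
    path = parts.path.rstrip("/") + "/info/refs"
    return urlunsplit((parts.scheme, parts.netloc, path, "service=git-upload-pack", ""))


def with_git_suffix(repo_url: str) -> str:
    """Return repo_url ending in .git (unchanged if it already does)."""
    trimmed = repo_url.rstrip("/")
    return trimmed if trimmed.endswith(".git") else trimmed + ".git"


def probe_status(url: str, timeout: float = 10.0) -> Tuple[int, Optional[str]]:
    """GET url once, without following redirects; return (status, Location header)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        target = parts.path + ("?" + parts.query if parts.query else "")
        conn.request("GET", target)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

If probe_status(info_refs_url(url)) reports 301 for the plain URL but not for with_git_suffix(url), then trying the .git form as repoURL would be a cheap experiment. This only addresses the 301 error; the i/o timeout on port 8081 is a separate symptom.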

I have re-deployed the whole argocd installation in EKS successfully. I have also tried injecting the env variables (_ARGOCD_REPO_SERVER_LOGLEVEL=debug && ARGOCD_REPO_SERVERLOGLEVEL=debug) to get more insight into the issue, but without any result: the same error appears.

vijayvenkat34 commented 2 years ago

I am also getting an error:

Failed to pull image "quay.io/argoproj/argocd:v2.4.6": rpc error: code = Unknown desc = failed to pull and unpack image "quay.io/argoproj/argocd:v2.4.6": failed to resolve reference "quay.io/argoproj/argocd:v2.4.6": failed to do request: Head https://quay.io/v2/argoproj/argocd/manifests/v2.4.6: proxyconnect tcp: dial tcp 10.224.2.11:9119: i/o timeout

rzvn2600 commented 2 years ago

Any idea how to debug this, please?

DavidKittleSEL commented 2 years ago

I had a similar issue with the v2.4.7 version of install.yaml. It seems to indicate that the argocd server pod can't communicate with the argo repo server pod. For me the fix was to delete/disable all the network policies, after a few minutes everything started working.
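To confirm that the path between two pods is blocked before touching any network policies, a plain TCP dial from the calling side can help. The following is a minimal sketch, assuming Python is available in the pod or in a debug container attached to it. The function name can_dial is hypothetical, and the Service name and port mentioned afterwards are assumptions based on the stable install manifest and the 8081 port in the error message.

```python
import socket


def can_dial(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Timeouts, refused connections, and DNS failures are all OSError subclasses.
        return False
```

Calling can_dial("argocd-repo-server", 8081) from the argocd-server pod and getting False after the full timeout would be consistent with the i/o timeout in the logs. A quick False usually means the connection was refused instead, which points at the target process and not at packet filtering.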

Joseph94m commented 2 years ago

Hi, I am getting something similar when adding a gitlab repo with argocd 2.4.9, both over ssh and over https.

argocd repo add git@gitlab.com:/account-name/yaml-holder.git --ssh-private-key-path ./argocd_id_rsa
FATA[0132] rpc error: code = Unknown desc = error testing repository connectivity: dial tcp 172.65.251.78:22: connect: connection timed out 
argocd repo add https://gitlab.com/account-name/yaml-holder.git --username myuser --password mypassword
FATA[0015] rpc error: code = Unknown desc = error testing repository connectivity: Get "https://gitlab.com/account-name//yaml-holder.git/info/refs?service=git-upload-pack": dial tcp 172.65.251.78:443: i/o timeout (Client.Timeout exceeded while awaiting headers) 

I am also able to clone from inside the pod...

Any ideas?

brsolomon-deloitte commented 1 year ago

Having this issue as well, and it definitely points to an outright inability of ArgoCD to reach its own repo server.

From an argocd app sync we receive ComparisonError: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.xx.xx.xx:8081: i/o timeout"

Meanwhile, k logs deploy/argocd-repo-server shows no evidence at all that the repo server was actually reachable or received any RPC request.
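The error messages quoted in this thread look alike but fail at different hops: some dial the repo server inside the cluster, some dial the Git host, and some never get past DNS. The sketch below sorts a message into one of those buckets. It is only an illustration: classify_dial_error is a hypothetical helper, and treating port 8081 as the repo server and ports 22/443 as the Git host are assumptions taken from the messages above.

```python
import re

# "dial tcp 172.20.86.254:8081" style fragment with a literal IPv4 address.
DIAL_RE = re.compile(r"dial tcp (\d{1,3}(?:\.\d{1,3}){3}):(\d+)")
# "lookup argocd-repo-server" style fragment from a failed DNS resolution.
LOOKUP_RE = re.compile(r"lookup ([A-Za-z0-9.-]+)")


def classify_dial_error(message: str) -> str:
    """Return a coarse label for where the connection attempt failed."""
    lookup = LOOKUP_RE.search(message)
    if lookup:
        return f"dns: could not resolve {lookup.group(1)}"
    dial = DIAL_RE.search(message)
    if dial is None:
        return "unknown"
    host, port = dial.group(1), int(dial.group(2))
    if port == 8081:  # assumed repo-server port
        return f"in-cluster: repo server at {host}:{port}"
    if port in (22, 443):  # assumed ssh/https to the Git host
        return f"outbound: git host at {host}:{port}"
    return f"other: {host}:{port}"
```

An in-cluster result suggests looking at network policies, security groups, or the CNI. An outbound result suggests egress rules or proxies. A dns result suggests cluster DNS or a missing Service.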

nilivingston commented 1 year ago

I ran into this issue with Argo CD deployed to an EKS cluster managed by Terraform. The problem was that I had configured the managed node group defaults in Terraform with attach_cluster_primary_security_group = false. This meant that the cluster security group was not attached to the nodes. The only rule in the cluster security group was a self-referential rule allowing all traffic.

Attaching the EKS-created cluster security group to the nodes by modifying the Terraform configuration resolved the issue for me.

andyfcx commented 1 year ago

Any updates? Mine does not work even with every NetworkPolicy disabled.

ronnyworm commented 2 months ago

DavidKittleSEL's solution does not fit a production scenario, but on a test cluster it's fine. I also deleted the pods in the argocd namespace afterwards. Maybe deleting only the application-controller pod is enough (see https://github.com/argoproj/argo-cd/issues/10666#issuecomment-1277424370). After a few minutes, everything settled.

I installed argocd like this:

kubectl create ns argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

This installed v2.11.3+3f344d5 for me and with this version I'm having this issue.

christianh814 commented 2 months ago

Is this still an issue in newer versions? I see this is for v2.4.6 but that's currently not supported (and quite old). Does this exist in v2.9+?

ronnyworm commented 2 months ago

> Is this still an issue in newer versions? I see this is for v2.4.6 but that's currently not supported (and quite old). Does this exist in v2.9+?

See my edited comment

stokkie90 commented 2 months ago

I am having similar issues, and I see them happen when the repo-server scales down (HPA).

With notifications enabled on the Unknown status, it gets reported and you can compare it to the scaling actions.

nulluuid commented 2 months ago

I have the same error with v2.9.18+151ee6a on minikube (podman):

argocd repo add https://github.com/argoproj/argocd-example-apps.git

FATA[0020] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp: lookup argocd-repo-server: i/o timeout"

UPD1

I am using minikube on podman and just realized that my installation is not healthy:

Failed to inspect image "redis:7.0.15-alpine": rpc error: code = Unknown desc = short-name "redis:7.0.15-alpine" did not resolve to an alias and no unqualified-search registries are defined in "/etc/containers/registries.conf"

This happens because podman does not resolve short image names. See this.

UPD2: Go to the deployments and change the image to a fully qualified name:

image: docker.io/redis:7.0.15-alpine
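For manifests with more than one image, the same rewrite can be scripted. The sketch below is a rough illustration of the rule applied in UPD2, not podman's own resolution logic. qualify_image is a hypothetical helper, and using docker.io as the default registry is an assumption carried over from the fix above.

```python
def qualify_image(image: str, default_registry: str = "docker.io") -> str:
    """Prefix `default_registry` when `image` does not name a registry itself.

    The first path component is treated as a registry only if it looks like a
    host: it contains a dot or a port separator, or it is "localhost".
    """
    first, slash, _rest = image.partition("/")
    has_registry = bool(slash) and ("." in first or ":" in first or first == "localhost")
    return image if has_registry else f"{default_registry}/{image}"
```

With this rule, redis:7.0.15-alpine becomes docker.io/redis:7.0.15-alpine, while quay.io/argoproj/argocd:v2.4.6 is left unchanged.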

ronnyworm commented 2 months ago

I fixed it: The problem was that I was running Flannel CNI. I tried Calico for the CNI now and everything works like a charm. To install Calico, the best and easiest way seems to be helm:

from: https://docs.tigera.io/calico/latest/getting-started/kubernetes/helm

helm repo add projectcalico https://docs.tigera.io/calico/charts
kubectl create namespace tigera-operator
helm install calico projectcalico/tigera-operator --version v3.28.0 --namespace tigera-operator

Then it takes about a minute until everything is running and ready.

It should be mentioned in the argocd requirements/readme that it doesn't work with Flannel

ggeldenhuis commented 4 days ago

I am getting the same error on a k3s multinode cluster (v1.30.3+k3s1) with argocd version v2.12.3+6b9cd82. So far restarting pods has had no effect. I don't really want to be deleting network policies either.

I probably don't understand the problem but grep does not see any mention of "argocd-redis-ha-haproxy" or "haproxy" in the install file. That is in reference to the error:

Unable to load data: error getting cached app managed resources: dial tcp: lookup argocd-redis-ha-haproxy on 10.43.0.10:53: no such host

I also can't see any service with such a name, so I am not sure why it is trying to look up that specific name or where that name would be created in the first place.
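To tell a name that does not exist apart from DNS that is unreachable, the lookup can be repeated by hand from inside a pod. This is a minimal sketch, assuming Python is available there. The function name resolves is hypothetical, and it asks the system resolver, so the result reflects the pod's own DNS configuration.

```python
import socket
from typing import List


def resolves(name: str) -> List[str]:
    """Return the sorted, de-duplicated addresses for `name`, or [] if lookup fails."""
    try:
        infos = socket.getaddrinfo(name, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # Covers "no such host" as well as resolver timeouts.
        return []
    return sorted({info[4][0] for info in infos})
```

An empty list for argocd-redis-ha-haproxy alongside a non-empty list for another Service in the same namespace would suggest that DNS works and that the name simply is not defined by the installed manifest.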