argoproj / argo-cd

Declarative Continuous Deployment for Kubernetes
https://argo-cd.readthedocs.io
Apache License 2.0

Issue with Openshift 3.9: ComparisonError: invalid character 'T' after top-level value #2778

Closed jaydipdave closed 8 months ago

jaydipdave commented 4 years ago

Describe the bug

Deployment on EKS and GKE works great. We have to deploy applications on OpenShift as well. OpenShift version:

oc v3.11.0+0cbc58b
kubernetes v1.11.0+d4cacc0
features: Basic-Auth

Server https://openshift1.at.company:8443
openshift v3.9.99
kubernetes v1.9.1+a0ce1bc657

Added the OpenShift cluster using:

argocd cluster add --kubeconfig=/Users/xxxxxx/.kube/openshift1 --insecure "openshift"

argocd cluster list
SERVER                              NAME                                 VERSION  STATUS      MESSAGE
https://bastion.gke                 gke_nonprod_northamerica-northeast1  1.13+    Successful
https://bastion.gke                 gke_prod_northamerica-northeast1     1.13+    Successful
https://openshift1.at.company:8443  openshift                            1.9      Successful
https://kubernetes.default.svc                                                    Successful

Created the application and saved it successfully:

project: 5000-nonproduction-development-ds
source:
  repoURL: 'https://git.at.company.com/scm/project/ffuf-deployment.git'
  path: webapp-openshift
  targetRevision: feature/argo
  helm:
    valueFiles:
      - values.yaml
      - deployment_configs/common.yaml
      - deployment_configs/development/common.yaml
    releaseName: ffuf
destination:
  server: 'https://openshift1.at.company:8443'
  namespace: 5000-nonproduction-development-ds


I checked the "argocd-manager" service account and the related ClusterRole / ClusterRoleBinding. They are created properly in the kube-system namespace. I can also access the OpenShift cluster using that service account.

Version

argocd: v1.3.0+9f8608c
  BuildDate: 2019-11-13T01:51:29Z
  GitCommit: 9f8608c9fcb2a1d8dcc06eeadd57e5c0334c5800
  GitTreeState: clean
  GoVersion: go1.12.6
  Compiler: gc
  Platform: darwin/amd64
argocd-server: v1.3.0+9f8608c
  BuildDate: 2019-11-13T01:51:00Z
  GitCommit: 9f8608c9fcb2a1d8dcc06eeadd57e5c0334c5800
  GitTreeState: clean
  GoVersion: go1.12.6
  Compiler: gc
  Platform: linux/amd64
  Ksonnet Version: v0.13.1
  Kustomize Version: Version: {Version:kustomize/v3.2.1 GitCommit:d89b448c745937f0cf1936162f26a5aac688f840 BuildDate:2019-09-27T00:10:52Z GoOs:linux GoArch:amd64}
  Helm Version: v2.15.2
  Kubectl Version: v1.14.0

Logs

argocd-repo

time="2019-11-27T16:40:12Z" level=debug msg="revision 'feature/argo' resolved to 'be654cceb79708e3ef5d40edda4cada081204a15'"
time="2019-11-27T16:40:12Z" level=info msg="manifest cache hit: &ApplicationSource{RepoURL:https://git.at.company.com/scm/project/ffuf-deployment.git,Path:webapp-openshift,TargetRevision:feature/argo,Helm:&ApplicationSourceHelm{ValueFiles:[values.yaml deployment_configs/common.yaml deployment_configs/development/common.yaml],Parameters:[],ReleaseName:ffuf,Values:,},Kustomize:nil,Ksonnet:nil,Directory:nil,Plugin:nil,Chart:,}/be654cceb79708e3ef5d40edda4cada081204a15"

argocd-server

time="2019-11-27T16:34:46Z" level=info msg="Comparing app state (cluster: https://openshift1.at.company:8443, namespace: 5000-nonproduction-development-ds)" application=5000-nonproduction-development-ds
time="2019-11-27T16:34:46Z" level=info msg="Start syncing cluster" server="https://openshift1.at.company:8443"
time="2019-11-27T16:40:12Z" level=error msg="Failed to sync cluster https://openshift1.at.company:8443: invalid character 'T' after top-level value"
time="2019-11-27T16:40:12Z" level=info msg="Reconciliation completed" application=5000-nonproduction-development-ds dest-namespace=5000-nonproduction-development-ds dest-server="https://openshift1.at.company:8443" fields.level=2 time_ms=326212.655007

alexec commented 4 years ago

Looks like a problem with JSON marshaling. Is anyone running OpenShift able to help?
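
For readers unfamiliar with this class of error: a strict JSON parser rejects any non-whitespace text that follows a complete top-level value, and Go's encoding/json reports that as "invalid character ... after top-level value". The sketch below reproduces the same kind of failure with Python's json module. The response body is hypothetical (the actual body returned by the OpenShift API server is not shown in this thread), and `parse_strict` is an illustrative helper, not part of Argo CD.

```python
import json

# Hypothetical body: a complete JSON value followed by stray text.
# This is an assumption for illustration, not the real server response.
body = '{"items": []}Timeout'


def parse_strict(text):
    """Return (value, None) on success or (None, message) on a decode error."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as err:
        # err.pos is the index of the first character the parser rejected.
        return None, f"{err.msg} at position {err.pos}: {text[err.pos]!r}"


value, error = parse_strict(body)
print(error)
```

If the body had been only the JSON object, `parse_strict` would return the decoded value and no error. This suggests looking at what the server appended to the response, not at the manifests.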

jaydipdave commented 4 years ago

I was debugging the issue with OpenShift.

My findings below:

This could be the case with many OpenShift or Kubernetes clusters, especially in financial institutions.

Do we really need to fetch all the resources of all the resource types at the time of synchronization of an application? Can't we limit it to the namespaces mentioned in the project?

I also tried narrowing down the "argocd-manager" role permissions to a specific namespace. The response is faster, but that doesn't work.

The timeout happens here: https://github.com/argoproj/argo-cd/blob/master/controller/cache/cluster.go#L320

jaydipdave commented 4 years ago

#2176 / #2839 should fix this issue. Waiting for the pull request to be merged and the new version released :)

jaydipdave commented 4 years ago

Duplicate of #2176

alexmt commented 4 years ago

Argo CD does not use pagination during initial cluster state fetching. This could potentially cause a timeout error. The pagination was introduced in https://github.com/argoproj/argo-cd/pull/3299 .

The change is available in https://github.com/argoproj/argo-cd/releases/tag/v1.5.0-rc3 . Please give it a try.
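
As a rough illustration of why pagination helps here: the Kubernetes list API accepts `limit` and `continue` parameters, so a client can fetch a large collection in several small requests instead of one long-running request. The sketch below simulates that loop against an in-memory list. `list_page` and `list_all` are hypothetical helpers written for this example and are not Argo CD's implementation. The integer-offset token is also an assumption (real continue tokens are opaque strings).

```python
def list_page(store, limit, continue_token=None):
    """Hypothetical stand-in for one list call: returns (items, next_token)."""
    start = int(continue_token) if continue_token else 0
    end = start + limit
    # A token is returned only while more items remain after this page.
    next_token = str(end) if end < len(store) else None
    return store[start:end], next_token


def list_all(store, limit):
    """Collect every item by following continue tokens until none is returned."""
    items, token, calls = [], None, 0
    while True:
        page, token = list_page(store, limit, token)
        items.extend(page)
        calls += 1
        if token is None:
            return items, calls


resources = [f"pod-{i}" for i in range(10)]
everything, calls = list_all(resources, limit=4)
print(len(everything), calls)
```

Each individual request stays small, so no single call has to hold the connection open long enough to hit a server-side timeout.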