argoproj / argo-cd

Declarative Continuous Deployment for Kubernetes
https://argo-cd.readthedocs.io
Apache License 2.0

Argo CD fails to sync a specific custom resource (Calico GlobalNetworkPolicy) #2505

Closed bikramgupta closed 4 years ago

bikramgupta commented 4 years ago

Describe the bug

Argo CD runs into an error while trying to sync a custom resource (kind: GlobalNetworkPolicy, apiVersion: projectcalico.org/v3).

To Reproduce

Sync this YAML via Argo CD to reproduce the issue: https://github.com/bikram20/k8sconfig/blob/master/secops/gnp-demo/security.quarantine.yaml

The above is just a simple Calico network policy. It is safe for testing and will not disrupt anything, as it applies only to pods that carry the matching label.

Expected behavior

While syncing the YAML with kind: GlobalNetworkPolicy (apiVersion: projectcalico.org/v3), Argo CD runs into an unexpected error.

It interprets the resource as two different objects, as shown below and in the screenshot. Although the object is applied successfully (and is even recreated if I delete it), Argo CD still reports the sync as unsuccessful.

[centos@ip-172-31-8-215 argocd]$ argocd app get secops-demo
Name:               secops-demo
Project:            default
Server:             https://kubernetes.default.svc
Namespace:          default
URL:                https://10.101.75.252/applications/secops-demo
Repo:               https://github.com/bikram20/k8sconfig.git
Target:             
Path:               secops
Sync Policy:        Automated (Prune)
Sync Status:        OutOfSync from  (c5672a7)
Health Status:      Missing

GROUP                  KIND                 NAMESPACE    NAME                          STATUS     HEALTH       HOOK  MESSAGE
crd.projectcalico.org  GlobalNetworkPolicy               security.quarantine           OutOfSync  Progressing        pruned
projectcalico.org      GlobalNetworkPolicy  default      security.quarantine           Running    Synced             globalnetworkpolicy.projectcalico.org/security.quarantine unchanged
networking.k8s.io      NetworkPolicy        policy-demo  access-nginx                  Synced     Unknown            
networking.k8s.io      NetworkPolicy        policy-demo  default-deny-new              Synced     Unknown            
projectcalico.org      GlobalNetworkPolicy               security.quarantine           OutOfSync  Missing            
projectcalico.org      GlobalNetworkSet                  2-tigera-restricted-resource  Synced     Unknown            
projectcalico.org      GlobalNetworkSet                  9-public-ip-range             Synced     Unknown            
projectcalico.org      GlobalThreatFeed                  feodo-tracker                 Synced     Unknown            
projectcalico.org      Tier                              platform                      Synced     Unknown            
projectcalico.org      Tier                              security                      Synced     Unknown            
[centos@ip-172-31-8-215 argocd]$ 

# But the object is indeed applied
[centos@ip-172-31-8-215 argocd]$ kg globalnetworkpolicy                                                                                                                           
NAME                  AGE                                                                                                                                                         
security.quarantine   9m27s                                                                                                                                                       
[centos@ip-172-31-8-215 argocd]$ kg globalnetworkpolicy -o yaml                                                                                                                   
apiVersion: v1                                                                                                                                                                    
items:                                                                                                                                                                            
- apiVersion: crd.projectcalico.org/v1                                                                                                                                            
  kind: GlobalNetworkPolicy                                                                                                                                                       
  metadata:                                                                                                                                                                       
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"projectcalico.org/v3","kind":"GlobalNetworkPolicy","metadata":{"annotations":{},"labels":{"app.kubernetes.io/instance":"secops-demo"},"name":"security.quarantine"},"spec":{"egress":[{"action":"Log","destination":{},"source":{}},{"action":"Deny","destination":{},"source":{}}],"ingress":[{"action":"Log","destination":{},"source":{}},{"action":"Deny","destination":{},"source":{}}],"order":100,"selector":"quarantine == \"true\"","tier":"security","types":["Ingress","Egress"]}}
      projectcalico.org/metadata: '{"uid":"3d11a51e-f078-11e9-b2c4-46f55cc855d5","creationTimestamp":"2019-10-17T00:51:22Z"}'
    creationTimestamp: "2019-10-17T00:51:22Z"
    deletionGracePeriodSeconds: 0
    deletionTimestamp: "2019-10-17T00:51:27Z"
    finalizers:
    - foregroundDeletion
    generation: 2
    labels:
      app.kubernetes.io/instance: secops-demo
      projectcalico.org/tier: security
    name: security.quarantine
    resourceVersion: "834584"
    selfLink: /apis/crd.projectcalico.org/v1/globalnetworkpolicies/security.quarantine
    uid: 2261334b-8130-4376-9131-3738074b09c4
  spec: .....
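
One possible reading of the output above, offered as an assumption rather than a confirmed diagnosis: if a sync tool identifies each resource by its (group, kind, namespace, name) tuple, the manifest's projectcalico.org object and the live crd.projectcalico.org object never match, so one looks missing and the other looks like a prune candidate. The Python sketch below models only that comparison. `ResourceKey` and `classify` are hypothetical names for illustration, not Argo CD code.

```python
from typing import NamedTuple


class ResourceKey(NamedTuple):
    """Hypothetical identity of a resource: group, kind, namespace, name."""
    group: str
    kind: str
    namespace: str
    name: str


# Values transcribed from the `argocd app get` table and the live object above.
desired = ResourceKey("projectcalico.org", "GlobalNetworkPolicy", "", "security.quarantine")
live = ResourceKey("crd.projectcalico.org", "GlobalNetworkPolicy", "", "security.quarantine")


def classify(desired_keys, live_keys):
    """Compare desired and live resources purely by key equality."""
    desired_set, live_set = set(desired_keys), set(live_keys)
    return {
        "missing": sorted(desired_set - live_set),   # in Git, not seen live
        "extra": sorted(live_set - desired_set),     # seen live, not in Git
        "matched": sorted(desired_set & live_set),
    }


result = classify([desired], [live])
for state, keys in result.items():
    print(state, keys)
```

Under this model the same policy lands in both `missing` and `extra` because only the group differs, which would be consistent with the two GlobalNetworkPolicy rows in the table.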

Screenshots

image

Version

[centos@ip-172-31-8-215 argocd]$ argocd version
argocd: v1.2.3+de7003f
  BuildDate: 2019-10-01T20:03:10Z
  GitCommit: de7003f530dbe0a8d49e2572b575a333ac7d803c
  GitTreeState: clean
  GoVersion: go1.12.6
  Compiler: gc
  Platform: linux/amd64
argocd-server: v1.2.3+de7003f
  BuildDate: 2019-10-01T20:05:11Z
  GitCommit: de7003f530dbe0a8d49e2572b575a333ac7d803c
  GitTreeState: clean
  GoVersion: go1.12.6
  Compiler: gc
  Platform: linux/amd64
  Ksonnet Version: 0.13.1
[centos@ip-172-31-8-215 argocd]$ 

Logs

time="2019-10-17T00:51:20Z" level=info msg=syncing application=secops-demo isSelectiveSync=false skipHooks=false started=false                                                    
time="2019-10-17T00:51:20Z" level=info msg=tasks application=secops-demo isSelectiveSync=false tasks="[Sync/0 resource projectcalico.org/GlobalNetworkSet:default/2-tigera-restricted-resource obj->obj (,,), Sync/0 resource projectcalico.org/GlobalNetworkSet:default/9-public-ip-range obj->obj (,,), Sync/0 resource networking.k8s.io/NetworkPolicy:policy-demo/access-nginx obj->obj (,,), Sync/0 resource networking.k8s.io/NetworkPolicy:policy-demo/default-deny-new obj->obj (,,), Sync/0 resource projectcalico.org/GlobalThreatFeed:default/feodo-tracker obj->obj (,,), Sync/0 resource projectcalico.org/Tier:default/platform obj->obj (,,), Sync/0 resource projectcalico.org/Tier:default/security obj->obj (,,), Sync/0 resource projectcalico.org/GlobalNetworkPolicy:default/security.quarantine nil->obj (,,)]"
time="2019-10-17T00:51:20Z" level=info msg="Applying resource GlobalNetworkSet/9-public-ip-range in cluster: https://10.96.0.1:443, namespace: default"                           
time="2019-10-17T00:51:20Z" level=info msg="Applying resource GlobalNetworkSet/2-tigera-restricted-resource in cluster: https://10.96.0.1:443, namespace: default"                
time="2019-10-17T00:51:20Z" level=info msg="kubectl --kubeconfig /dev/shm/865799974 -f - apply -n default --dry-run" dir= execID=zDdWr                                            
time="2019-10-17T00:51:20Z" level=info msg="kubectl --kubeconfig /dev/shm/712759373 -f - apply -n default --dry-run" dir= execID=IaVzu                                            
time="2019-10-17T00:51:20Z" level=info msg="Applying resource NetworkPolicy/default-deny-new in cluster: https://10.96.0.1:443, namespace: policy-demo"                           
time="2019-10-17T00:51:20Z" level=info msg="Applying resource NetworkPolicy/access-nginx in cluster: https://10.96.0.1:443, namespace: policy-demo"                               
time="2019-10-17T00:51:20Z" level=info msg="kubectl --kubeconfig /dev/shm/237638984 -f - apply -n policy-demo --dry-run" dir= execID=dKKVT                                        
time="2019-10-17T00:51:20Z" level=info msg="kubectl --kubeconfig /dev/shm/980220423 -f - apply -n policy-demo --dry-run" dir= execID=Q77PL                                        
time="2019-10-17T00:51:20Z" level=info msg="Applying resource GlobalThreatFeed/feodo-tracker in cluster: https://10.96.0.1:443, namespace: default"                               
time="2019-10-17T00:51:20Z" level=info msg="kubectl --kubeconfig /dev/shm/172322746 -f - apply -n default --dry-run" dir= execID=n1JZL                                            
time="2019-10-17T00:51:21Z" level=info msg="Applying resource Tier/security in cluster: https://10.96.0.1:443, namespace: default"                                                
time="2019-10-17T00:51:21Z" level=info msg="kubectl --kubeconfig /dev/shm/436793809 -f - apply -n default --dry-run" dir= execID=dWCJp                                            
time="2019-10-17T00:51:21Z" level=info msg="Applying resource Tier/platform in cluster: https://10.96.0.1:443, namespace: default"                                                
time="2019-10-17T00:51:21Z" level=info msg="kubectl --kubeconfig /dev/shm/704496380 -f - apply -n default --dry-run" dir= execID=qpB6c                                            
time="2019-10-17T00:51:21Z" level=info msg="Applying resource GlobalNetworkPolicy/security.quarantine in cluster: https://10.96.0.1:443, namespace: default"                      
time="2019-10-17T00:51:21Z" level=info msg="kubectl --kubeconfig /dev/shm/224964139 -f - apply -n default --dry-run" dir= execID=E42CL                                            
time="2019-10-17T00:51:21Z" level=info msg="Updating operation state. phase: Running -> Running, message: '' -> 'one or more tasks are running'" application=secops-demo          
time="2019-10-17T00:51:21Z" level=info msg="Applying resource GlobalNetworkSet/9-public-ip-range in cluster: https://10.96.0.1:443, namespace: default"                           
time="2019-10-17T00:51:21Z" level=info msg="Applying resource GlobalNetworkSet/2-tigera-restricted-resource in cluster: https://10.96.0.1:443, namespace: default"                
time="2019-10-17T00:51:21Z" level=info msg="kubectl --kubeconfig /dev/shm/097705358 -f - apply -n default" dir= execID=PGZcT                                                      
time="2019-10-17T00:51:21Z" level=info msg="kubectl --kubeconfig /dev/shm/167332757 -f - apply -n default" dir= execID=r8OVb                                                      
time="2019-10-17T00:51:21Z" level=info msg="adding resource result, status: 'Synced', phase: 'Running', message: 'globalnetworkset.projectcalico.org/2-tigera-restricted-resource unchanged'" application=secops-demo kind=GlobalNetworkSet name=2-tigera-restricted-resource namespace=default phase=Sync
time="2019-10-17T00:51:21Z" level=info msg="adding resource result, status: 'Synced', phase: 'Running', message: 'globalnetworkset.projectcalico.org/9-public-ip-range unchanged'" application=secops-demo kind=GlobalNetworkSet name=9-public-ip-range namespace=default phase=Sync
time="2019-10-17T00:51:21Z" level=info msg="Applying resource NetworkPolicy/default-deny-new in cluster: https://10.96.0.1:443, namespace: policy-demo"
time="2019-10-17T00:51:21Z" level=info msg="kubectl --kubeconfig /dev/shm/942104048 -f - apply -n policy-demo" dir= execID=Lp0vV
time="2019-10-17T00:51:21Z" level=info msg="Applying resource NetworkPolicy/access-nginx in cluster: https://10.96.0.1:443, namespace: policy-demo"
time="2019-10-17T00:51:21Z" level=info msg="kubectl --kubeconfig /dev/shm/604622991 -f - apply -n policy-demo" dir= execID=6KGVB
time="2019-10-17T00:51:21Z" level=info msg="adding resource result, status: 'Synced', phase: 'Running', message: 'networkpolicy.networking.k8s.io/access-nginx unchanged'" application=secops-demo kind=NetworkPolicy name=access-nginx namespace=policy-demo phase=Sync
time="2019-10-17T00:51:21Z" level=info msg="Refreshing app status (controller refresh requested), level (1)" application=secops-demo
W1017 00:51:21.783403       1 listers.go:77] can not retrieve list of objects using index : Index with name namespace does not exist
time="2019-10-17T00:51:21Z" level=info msg="Comparing app state (cluster: https://kubernetes.default.svc, namespace: default)" application=secops-demo
time="2019-10-17T00:51:21Z" level=info msg="adding resource result, status: 'Synced', phase: 'Running', message: 'networkpolicy.networking.k8s.io/default-deny-new configured'" application=secops-demo kind=NetworkPolicy name=default-deny-new namespace=policy-demo phase=Sync
time="2019-10-17T00:51:21Z" level=info msg="Applying resource GlobalThreatFeed/feodo-tracker in cluster: https://10.96.0.1:443, namespace: default"
time="2019-10-17T00:51:21Z" level=info msg="kubectl --kubeconfig /dev/shm/310798498 -f - apply -n default" dir= execID=qtcN5
time="2019-10-17T00:51:21Z" level=info msg="Skipping auto-sync: another operation is in progress" application=secops-demo
time="2019-10-17T00:51:21Z" level=info msg="Update successful" application=secops-demo
time="2019-10-17T00:51:21Z" level=info msg="Reconciliation completed" application=secops-demo dest-namespace=default dest-server="https://kubernetes.default.svc" fields.level=1 time_ms=47.788535
time="2019-10-17T00:51:21Z" level=info msg="Refreshing app status (controller refresh requested), level (1)" application=secops-demo
W1017 00:51:21.831252       1 listers.go:77] can not retrieve list of objects using index : Index with name namespace does not exist
time="2019-10-17T00:51:21Z" level=info msg="Comparing app state (cluster: https://kubernetes.default.svc, namespace: default)" application=secops-demo
time="2019-10-17T00:51:21Z" level=info msg="Skipping auto-sync: another operation is in progress" application=secops-demo
time="2019-10-17T00:51:21Z" level=info msg="Update successful" application=secops-demo
time="2019-10-17T00:51:21Z" level=info msg="Reconciliation completed" application=secops-demo dest-namespace=default dest-server="https://kubernetes.default.svc" fields.level=1 time_ms=23.356382
time="2019-10-17T00:51:21Z" level=info msg="adding resource result, status: 'Synced', phase: 'Running', message: 'globalthreatfeed.projectcalico.org/feodo-tracker unchanged'" application=secops-demo kind=GlobalThreatFeed name=feodo-tracker namespace=default phase=Sync
time="2019-10-17T00:51:21Z" level=info msg="Applying resource Tier/security in cluster: https://10.96.0.1:443, namespace: default"
time="2019-10-17T00:51:21Z" level=info msg="Applying resource Tier/platform in cluster: https://10.96.0.1:443, namespace: default"
time="2019-10-17T00:51:21Z" level=info msg="kubectl --kubeconfig /dev/shm/986551705 -f - apply -n default" dir= execID=Aa4SM
time="2019-10-17T00:51:21Z" level=info msg="kubectl --kubeconfig /dev/shm/194716708 -f - apply -n default" dir= execID=KI5g1
time="2019-10-17T00:51:22Z" level=info msg="adding resource result, status: 'Synced', phase: 'Running', message: 'tier.projectcalico.org/security unchanged'" application=secops-demo kind=Tier name=security namespace=default phase=Sync
time="2019-10-17T00:51:22Z" level=info msg="adding resource result, status: 'Synced', phase: 'Running', message: 'tier.projectcalico.org/platform unchanged'" application=secops-demo kind=Tier name=platform namespace=default phase=Sync
time="2019-10-17T00:51:22Z" level=info msg="Applying resource GlobalNetworkPolicy/security.quarantine in cluster: https://10.96.0.1:443, namespace: default"
time="2019-10-17T00:51:22Z" level=info msg="kubectl --kubeconfig /dev/shm/933764403 -f - apply -n default" dir= execID=fztA4
time="2019-10-17T00:51:22Z" level=info msg="Refreshing app status (controller refresh requested), level (1)" application=secops-demo
W1017 00:51:22.286660       1 listers.go:77] can not retrieve list of objects using index : Index with name namespace does not exist
time="2019-10-17T00:51:22Z" level=info msg="Comparing app state (cluster: https://kubernetes.default.svc, namespace: default)" application=secops-demo
time="2019-10-17T00:51:22Z" level=info msg="adding resource result, status: 'Synced', phase: 'Running', message: 'globalnetworkpolicy.projectcalico.org/security.quarantine created'" application=secops-demo kind=GlobalNetworkPolicy name=security.quarantine namespace=default phase=Sync
time="2019-10-17T00:51:22Z" level=info msg="Updating operation state. phase: Running -> Succeeded, message: 'one or more tasks are running' -> 'successfully synced (all tasks run)'" application=secops-demo
time="2019-10-17T00:51:22Z" level=info msg="sync/terminate complete" application=secops-demo
time="2019-10-17T00:51:22Z" level=info msg="Skipping auto-sync: another operation is in progress" application=secops-demo
time="2019-10-17T00:51:22Z" level=info msg="Updated sync status: Synced -> OutOfSync" application=secops-demo dest-namespace=default dest-server="https://kubernetes.default.svc" reason=ResourceUpdated type=Normal
time="2019-10-17T00:51:22Z" level=info msg="updated 'secops-demo' operation (phase: Succeeded)"
time="2019-10-17T00:51:22Z" level=info msg="Sync operation to c5672a7f48ca93e83c30ddbade86ca845b8a2aa3 succeeded" application=secops-demo dest-namespace=default dest-server="https://kubernetes.default.svc" reason=OperationCompleted type=Normal
time="2019-10-17T00:51:22Z" level=info msg="Update successful" application=secops-demo
time="2019-10-17T00:51:22Z" level=info msg="Reconciliation completed" application=secops-demo dest-namespace=default dest-server="https://kubernetes.default.svc" fields.level=1 time_ms=33.434976999999996
time="2019-10-17T00:51:22Z" level=info msg="Refreshing app status (controller refresh requested), level (2)" application=secops-demo
W1017 00:51:22.320139       1 listers.go:77] can not retrieve list of objects using index : Index with name namespace does not exist
time="2019-10-17T00:51:22Z" level=info msg="Comparing app state (cluster: https://kubernetes.default.svc, namespace: default)" application=secops-demo
time="2019-10-17T00:51:22Z" level=info msg="Skipping auto-sync: already attempted sync to c5672a7f48ca93e83c30ddbade86ca845b8a2aa3 with timeout 5s (retrying in 4.593005569s)" application=secops-demo
time="2019-10-17T00:51:22Z" level=info msg="Updated sync status: Synced -> OutOfSync" application=secops-demo dest-namespace=default dest-server="https://kubernetes.default.svc" reason=ResourceUpdated type=Normal
time="2019-10-17T00:51:22Z" level=info msg="Updated health status: Healthy -> Missing" application=secops-demo dest-namespace=default dest-server="https://kubernetes.default.svc" reason=ResourceUpdated type=Normal
time="2019-10-17T00:51:22Z" level=info msg="Update successful" application=secops-demo
time="2019-10-17T00:51:22Z" level=info msg="Reconciliation completed" application=secops-demo dest-namespace=default dest-server="https://kubernetes.default.svc" fields.level=2 time_ms=101.873665
time="2019-10-17T00:51:22Z" level=info msg="Refreshing app status (controller refresh requested), level (2)" application=secops-demo
W1017 00:51:22.422072       1 listers.go:77] can not retrieve list of objects using index : Index with name namespace does not exist
time="2019-10-17T00:51:22Z" level=info msg="Comparing app state (cluster: https://kubernetes.default.svc, namespace: default)" application=secops-demo
time="2019-10-17T00:51:22Z" level=info msg="Skipping auto-sync: already attempted sync to c5672a7f48ca93e83c30ddbade86ca845b8a2aa3 with timeout 5s (retrying in 4.491488025s)" application=secops-demo
time="2019-10-17T00:51:22Z" level=info msg="Update successful" application=secops-demo
time="2019-10-17T00:51:22Z" level=info msg="Reconciliation completed" application=secops-demo dest-namespace=default dest-server="https://kubernetes.default.svc" fields.level=2 time_ms=96.148145
time="2019-10-17T00:51:22Z" level=info msg="Refreshing app status (controller refresh requested), level (2)" application=secops-demo
W1017 00:51:22.518568       1 listers.go:77] can not retrieve list of objects using index : Index with name namespace does not exist
time="2019-10-17T00:51:22Z" level=info msg="Comparing app state (cluster: https://kubernetes.default.svc, namespace: default)" application=secops-demo
time="2019-10-17T00:51:22Z" level=info msg="Skipping auto-sync: already attempted sync to c5672a7f48ca93e83c30ddbade86ca845b8a2aa3 with timeout 5s (retrying in 4.399762357s)" application=secops-demo
time="2019-10-17T00:51:22Z" level=info msg="Update successful" application=secops-demo
time="2019-10-17T00:51:22Z" level=info msg="Reconciliation completed" application=secops-demo dest-namespace=default dest-server="https://kubernetes.default.svc" fields.level=2 time_ms=91.676858
time="2019-10-17T00:51:22Z" level=info msg="Refreshing app status (controller refresh requested), level (2)" application=secops-demo
W1017 00:51:22.610265       1 listers.go:77] can not retrieve list of objects using index : Index with name namespace does not exist
time="2019-10-17T00:51:22Z" level=info msg="Comparing app state (cluster: https://kubernetes.default.svc, namespace: default)" application=secops-demo
time="2019-10-17T00:51:22Z" level=info msg="Update successful" application=secops-demo
time="2019-10-17T00:51:22Z" level=info msg="Reconciliation completed" application=secops-demo dest-namespace=default dest-server="https://kubernetes.default.svc" fields.level=2 time_ms=93.865724

Relevant slack discussion: https://argoproj.slack.com/archives/CASHNF6MS/p1571058412129700

jannfis commented 4 years ago

Hm, I cannot reproduce it in my environment, which has Calico installed (though it runs Argo CD 1.3-rc1, not v1.2.3).

I have not synced your repository (because, for some reason, the API group on my cluster is crd.projectcalico.org/v1 instead of projectcalico.org/v3). Instead, I copied your GlobalNetworkPolicy and the two NetworkPolicy resources into a repository of my own and created an app with auto-sync and pruning enabled. The following is the result:

image

Target:             HEAD
Path:               test-app
Sync Policy:        Automated (Prune)
Sync Status:        Synced to HEAD (fb84f13)
Health Status:      Healthy

GROUP                  KIND                 NAMESPACE       NAME                 STATUS   HEALTH  HOOK  MESSAGE
networking.k8s.io      NetworkPolicy        policy-demo     default-deny-new     Synced                 networkpolicy.networking.k8s.io/default-deny-new created
networking.k8s.io      NetworkPolicy        policy-demo     access-nginx         Synced                 networkpolicy.networking.k8s.io/access-nginx created
crd.projectcalico.org  GlobalNetworkPolicy  test-namespace  security.quarantine  Running  Synced        globalnetworkpolicy.crd.projectcalico.org/security.quarantine configured
crd.projectcalico.org  GlobalNetworkPolicy                  security.quarantine  Synced                 

One thing I noticed: in your screenshot there are two resources named security.quarantine of type GlobalNetworkPolicy, in two different states. The first has Progressing health and is OutOfSync; the second has Missing health and is OutOfSync. I don't know whether that is a false lead.

In any case, could you try an argocd:latest image (along with the updated Argo CD CRDs from the master branch of the repository) for your use case? Maybe the bug has been fixed in the meantime.

bikramgupta commented 4 years ago

Thank you.

I tried with the latest version, but I get the same error. Please see below. BTW, are you running Calico in your cluster? If not, please apply this manifest: kubectl apply -f https://docs.projectcalico.org/v3.10/manifests/calico.yaml

[centos@ip-172-31-8-215 ~]$ argocd version
argocd: v1.3.0-rc1+8a43840
  BuildDate: 2019-10-16T21:44:49Z
  GitCommit: 8a43840f0baef50946d3eadc1d52d6a2abc162d5
  GitTreeState: clean
  GoVersion: go1.12.6
  Compiler: gc
  Platform: linux/amd64
argocd-server: v1.3.0-rc1+8a43840
  BuildDate: 2019-10-16T21:46:08Z
  GitCommit: 8a43840f0baef50946d3eadc1d52d6a2abc162d5
  GitTreeState: clean
  GoVersion: go1.12.6
  Compiler: gc
  Platform: linux/amd64
  Ksonnet Version: v0.13.1
  Kustomize Version: v3.1.0
  Helm Version: v2.12.1
  Kubectl Version: v1.14.0
[centos@ip-172-31-8-215 ~]$

[centos@ip-172-31-8-215 ~]$ argocd app get secops
Name:               secops
Project:            default
Server:             https://kubernetes.default.svc
Namespace:          default
URL:                https://10.111.223.188/applications/secops
Repo:               https://github.com/bikram20/k8sconfig.git
Target:             HEAD
Path:               secops
SyncWindow:         Sync Allowed
Sync Policy:        Automated (Prune)
Sync Status:        OutOfSync from HEAD (2f177ab)
Health Status:      Missing

CONDITION MESSAGE
SyncError Failed sync attempt to 2f177ab124baa0d030f189eb8190be38f8503ed6: one or more objects failed to apply

GROUP                  KIND                 NAMESPACE    NAME                                 STATUS     HEALTH        HOOK  MESSAGE
crd.projectcalico.org  GlobalNetworkPolicy               security.quarantine                  OutOfSync  Progressing         ignored (requires pruning)
projectcalico.org      GlobalNetworkPolicy               default.whitelist-intra-cluster-in   Succeeded  PruneSkipped        ignored (requires pruning)
crd.projectcalico.org  GlobalNetworkPolicy               default.whitelist-intra-cluster-out  OutOfSync                      ignored (requires pruning)
projectcalico.org      GlobalNetworkSet     default      2-tigera-restricted-resource         Running    Synced              globalnetworkset.projectcalico.org/2-tigera-restricted-resource unchanged
projectcalico.org      GlobalNetworkSet     default      9-public-ip-range                    Running    Synced              globalnetworkset.projectcalico.org/9-public-ip-range unchanged
networking.k8s.io      NetworkPolicy        policy-demo  access-nginx                         Synced                         networkpolicy.networking.k8s.io/access-nginx unchanged
networking.k8s.io      NetworkPolicy        policy-demo  default-deny-new                     Synced                         networkpolicy.networking.k8s.io/default-deny-new configured
projectcalico.org      GlobalThreatFeed     default      feodo-tracker                        Running    Synced              globalthreatfeed.projectcalico.org/feodo-tracker unchanged
projectcalico.org      Tier                 default      security                             Running    Synced              tier.projectcalico.org/security unchanged
projectcalico.org      Tier                 default      platform                             Running    Synced              tier.projectcalico.org/platform unchanged
projectcalico.org      GlobalNetworkPolicy  default      whitelist-intra-cluster-in           Failed     SyncFailed          kubectl failed exit status 1: error when applying patch:

to:
Resource: "projectcalico.org/v3, Resource=globalnetworkpolicies", GroupVersionKind: "projectcalico.org/v3, Kind=GlobalNetworkPolicy"
Name: "whitelist-intra-cluster-in", Namespace: ""
Object: &{map["apiVersion":"projectcalico.org/v3" "kind":"GlobalNetworkPolicy" "metadata":map["annotations":map["kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"projectcalico.org/v3\",\"kind\":\"GlobalNetworkPolicy\",\"metadata\":{\"annotations\":{},\"labels\":{\"app.kubernetes.io/instance\":\"secops\"},\"name\":\"whitelist-intra-cluster-in\"},\"spec\":{\"ingress\":[{\"action\":\"Allow\"}],\"order\":20,\"selector\":\"tigera.io/type == \\"master\\"\"}}\n"] "creationTimestamp":"2019-10-20T17:10:18Z" "labels":map["app.kubernetes.io/instance":"secops" "projectcalico.org/tier":"default"] "name":"default.whitelist-intra-cluster-in" "resourceVersion":"1271879" "selfLink":"/apis/projectcalico.org/v3/globalnetworkpolicies/default.whitelist-intra-cluster-in" "uid":"4f0d9e52-437f-4eff-a321-b17a8a966ff1"] "spec":map["ingress":[map["action":"Allow" "destination":map[] "source":map[]]] "order":'\x14' "selector":"tigera.io/type == \"master\"" "tier":"default" "types":["Ingress"]]]}
for: "STDIN": error when creating patch with:
original: {"apiVersion":"projectcalico.org/v3","kind":"GlobalNetworkPolicy","metadata":{"annotations":{},"labels":{"app.kubernetes.io/instance":"secops"},"name":"whitelist-intra-cluster-in"},"spec":{"ingress":[{"action":"Allow"}],"order":20,"selector":"tigera.io/type == \"master\""}}

modified: {"apiVersion":"projectcalico.org/v3","kind":"GlobalNetworkPolicy","metadata":{"annotations":{"kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"projectcalico.org/v3\",\"kind\":\"GlobalNetworkPolicy\",\"metadata\":{\"annotations\":{},\"labels\":{\"app.kubernetes.io/instance\":\"secops\"},\"name\":\"whitelist-intra-cluster-in\"},\"spec\":{\"ingress\":[{\"action\":\"Allow\"}],\"order\":20,\"selector\":\"tigera.io/type == \\"master\\"\"}}\n"},"labels":{"app.kubernetes.io/instance":"secops"},"name":"whitelist-intra-cluster-in"},"spec":{"ingress":[{"action":"Allow"}],"order":20,"selector":"tigera.io/type == \"master\""}}

current: {"apiVersion":"projectcalico.org/v3","kind":"GlobalNetworkPolicy","metadata":{"annotations":{"kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"projectcalico.org/v3\",\"kind\":\"GlobalNetworkPolicy\",\"metadata\":{\"annotations\":{},\"labels\":{\"app.kubernetes.io/instance\":\"secops\"},\"name\":\"whitelist-intra-cluster-in\"},\"spec\":{\"ingress\":[{\"action\":\"Allow\"}],\"order\":20,\"selector\":\"tigera.io/type == \\"master\\"\"}}\n"},"creationTimestamp":"2019-10-20T17:10:18Z","labels":{"app.kubernetes.io/instance":"secops","projectcalico.org/tier":"default"},"name":"default.whitelist-intra-cluster-in","resourceVersion":"1271879","selfLink":"/apis/projectcalico.org/v3/globalnetworkpolicies/default.whitelist-intra-cluster-in","uid":"4f0d9e52-437f-4eff-a321-b17a8a966ff1"},"spec":{"ingress":[{"action":"Allow","destination":{},"source":{}}],"order":20,"selector":"tigera.io/type == \"master\"","tier":"default","types":["Ingress"]}}

for: "STDIN": precondition failed for: map[metadata:map[name:whitelist-intra-cluster-in] spec:map[ingress:[map[action:Allow]]]]
projectcalico.org      GlobalNetworkPolicy  default      security.quarantine                  Running    Synced              globalnetworkpolicy.projectcalico.org/security.quarantine unchanged
projectcalico.org      GlobalNetworkPolicy  default      whitelist-intra-cluster-out          Failed     SyncFailed          kubectl failed exit status 1: error when applying patch:

to:
Resource: "projectcalico.org/v3, Resource=globalnetworkpolicies", GroupVersionKind: "projectcalico.org/v3, Kind=GlobalNetworkPolicy"
Name: "whitelist-intra-cluster-out", Namespace: ""
Object: &{map["apiVersion":"projectcalico.org/v3" "kind":"GlobalNetworkPolicy" "metadata":map["annotations":map["kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"projectcalico.org/v3\",\"kind\":\"GlobalNetworkPolicy\",\"metadata\":{\"annotations\":{},\"labels\":{\"app.kubernetes.io/instance\":\"secops\"},\"name\":\"whitelist-intra-cluster-out\"},\"spec\":{\"egress\":[{\"action\":\"Allow\"}],\"order\":20,\"selector\":\"tigera.io/type == \\"master\\"\"}}\n"] "creationTimestamp":"2019-10-20T17:10:18Z" "deletionGracePeriodSeconds":'\x00' "deletionTimestamp":"2019-10-20T17:10:23Z" "finalizers":["foregroundDeletion"] "labels":map["app.kubernetes.io/instance":"secops" "projectcalico.org/tier":"default"] "name":"default.whitelist-intra-cluster-out" "resourceVersion":"1271940" "selfLink":"/apis/projectcalico.org/v3/globalnetworkpolicies/default.whitelist-intra-cluster-out" "uid":"a33437e3-43c2-43f1-8a79-f2d24ea75bc7"] "spec":map["egress":[map["action":"Allow" "destination":map[] "source":map[]]] "order":'\x14' "selector":"tigera.io/type == \"master\"" "tier":"default" "types":["Egress"]]]}
for: "STDIN": error when creating patch with:
original: {"apiVersion":"projectcalico.org/v3","kind":"GlobalNetworkPolicy","metadata":{"annotations":{},"labels":{"app.kubernetes.io/instance":"secops"},"name":"whitelist-intra-cluster-out"},"spec":{"egress":[{"action":"Allow"}],"order":20,"selector":"tigera.io/type == \"master\""}}

modified: {"apiVersion":"projectcalico.org/v3","kind":"GlobalNetworkPolicy","metadata":{"annotations":{"kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"projectcalico.org/v3\",\"kind\":\"GlobalNetworkPolicy\",\"metadata\":{\"annotations\":{},\"labels\":{\"app.kubernetes.io/instance\":\"secops\"},\"name\":\"whitelist-intra-cluster-out\"},\"spec\":{\"egress\":[{\"action\":\"Allow\"}],\"order\":20,\"selector\":\"tigera.io/type == \\"master\\"\"}}\n"},"labels":{"app.kubernetes.io/instance":"secops"},"name":"whitelist-intra-cluster-out"},"spec":{"egress":[{"action":"Allow"}],"order":20,"selector":"tigera.io/type == \"master\""}}

current: {"apiVersion":"projectcalico.org/v3","kind":"GlobalNetworkPolicy","metadata":{"annotations":{"kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"projectcalico.org/v3\",\"kind\":\"GlobalNetworkPolicy\",\"metadata\":{\"annotations\":{},\"labels\":{\"app.kubernetes.io/instance\":\"secops\"},\"name\":\"whitelist-intra-cluster-out\"},\"spec\":{\"egress\":[{\"action\":\"Allow\"}],\"order\":20,\"selector\":\"tigera.io/type == \\"master\\"\"}}\n"},"creationTimestamp":"2019-10-20T17:10:18Z","deletionGracePeriodSeconds":0,"deletionTimestamp":"2019-10-20T17:10:23Z","finalizers":["foregroundDeletion"],"labels":{"app.kubernetes.io/instance":"secops","projectcalico.org/tier":"default"},"name":"default.whitelist-intra-cluster-out","resourceVersion":"1271940","selfLink":"/apis/projectcalico.org/v3/globalnetworkpolicies/default.whitelist-intra-cluster-out","uid":"a33437e3-43c2-43f1-8a79-f2d24ea75bc7"},"spec":{"egress":[{"action":"Allow","destination":{},"source":{}}],"order":20,"selector":"tigera.io/type == \"master\"","tier":"default","types":["Egress"]}}

for: "STDIN": precondition failed for: map[metadata:map[name:whitelist-intra-cluster-out] spec:map[egress:[map[action:Allow]]]]
crd.projectcalico.org GlobalNetworkPolicy default.whitelist-intra-cluster-in OutOfSync
projectcalico.org GlobalNetworkPolicy security.quarantine OutOfSync Missing
projectcalico.org GlobalNetworkPolicy whitelist-intra-cluster-in OutOfSync Missing
projectcalico.org GlobalNetworkPolicy whitelist-intra-cluster-out OutOfSync Missing
projectcalico.org GlobalNetworkSet 2-tigera-restricted-resource Synced
projectcalico.org GlobalNetworkSet 9-public-ip-range Synced
projectcalico.org GlobalThreatFeed feodo-tracker Synced
projectcalico.org Tier platform Synced
projectcalico.org Tier security Synced
[centos@ip-172-31-8-215 ~]$

BUT THE MANIFEST DOES GET APPLIED

[centos@ip-172-31-8-215 ~]$ kg globalnetworkpolicy
NAME AGE
default.whitelist-intra-cluster-in 7m48s
default.whitelist-intra-cluster-out 7m48s
security.quarantine 24m
[centos@ip-172-31-8-215 ~]$ kg globalnetworkpolicy security.quarantine -o yaml
apiVersion: crd.projectcalico.org/v1
kind: GlobalNetworkPolicy
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"projectcalico.org/v3","kind":"GlobalNetworkPolicy","metadata":{"annotations":{},"labels":{"app.kubernetes.io/instance":"secops"},"name":"security.quarantine"},"spec":{"egress":[{"action":"Log","destination":{},"source":{}},{"action":"Deny","destination":{},"source":{}}],"ingress":[{"action":"Log","destination":{},"source":{}},{"action":"Deny","destination":{},"source":{}}],"order":100,"selector":"quarantine == \"true\"","tier":"security","types":["Ingress","Egress"]}}
    projectcalico.org/metadata: '{"uid":"25f37939-f35a-11e9-b2c4-46f55cc855d5","creationTimestamp":"2019-10-20T16:53:32Z"}'
  creationTimestamp: "2019-10-20T16:53:32Z"
  deletionGracePeriodSeconds: 0
  deletionTimestamp: "2019-10-20T17:05:54Z"
  finalizers:

+++++

jannfis commented 4 years ago

I'm not very familiar with the advanced features of Calico, but after some more research, it turns out that Calico-specific policies such as GlobalNetworkPolicy resources should not be applied using kubectl. According to the Calico documentation, they must be applied using calicoctl:

Calico network policies and Calico global network policies are applied using calicoctl. Syntax is similar to Kubernetes, but there a few differences. For help, see calicoctl user reference.

That might be the core problem that you are facing (and why I had to change API version in my copy of the manifests to apply them using kubectl, and most likely those didn't work as expected anyway).

jannfis commented 4 years ago

To illustrate the problem some more, I took a very simple example GlobalNetworkPolicy from calico.org's web site:

apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: allow-tcp-6379
spec:
  selector: role == 'database'
  types:
  - Ingress
  - Egress
  ingress:
  - action: Allow
    protocol: TCP
    source:
      selector: role == 'frontend'
    destination:
      ports:
      - 6379
  egress:
  - action: Allow

Try to apply it using kubectl:

$ kubectl apply -f gnp2.yaml 
error: unable to recognize "gnp2.yaml": no matches for kind "GlobalNetworkPolicy" in version "projectcalico.org/v3"

Using calicoctl (success):

$ calicoctl apply -f gnp2.yaml 
Successfully applied 1 'GlobalNetworkPolicy' resource(s)

But the resulting resource differs from the manifest (most prominently in its apiVersion and in its name, which got the prefix default.):

$ kubectl get globalnetworkpolicies.crd.projectcalico.org default.allow-tcp-6379 -o yaml
apiVersion: crd.projectcalico.org/v1
kind: GlobalNetworkPolicy
metadata:
  annotations:
    projectcalico.org/metadata: '{"uid":"d22d51e4-1f93-4c0e-8c6f-d53f76d228e4","creationTimestamp":"2019-10-21T08:30:51Z"}'
  creationTimestamp: "2019-10-21T08:30:51Z"
  generation: 1
  name: default.allow-tcp-6379
  resourceVersion: "1781621"
  selfLink: /apis/crd.projectcalico.org/v1/globalnetworkpolicies/default.allow-tcp-6379
  uid: d22d51e4-1f93-4c0e-8c6f-d53f76d228e4
spec:
  egress:
  - action: Allow
    destination: {}
    source: {}
  ingress:
  - action: Allow
    destination:
      ports:
      - 6379
    protocol: TCP
    source:
      selector: role == 'frontend'
  selector: role == 'database'
  types:
  - Ingress
  - Egress

So, there's most likely no way to manage these resources using ArgoCD, currently.
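To make the mismatch concrete, here is a minimal Python sketch of the mapping seen in the output above. The function name observed_stored_identity is hypothetical, and the rule it encodes (stored name gets a "<tier>." prefix, tier defaults to "default") is only an inference from the kubectl output in this thread, not something taken from Calico's code or documentation.

```python
from typing import Tuple


def observed_stored_identity(manifest: dict) -> Tuple[str, str]:
    """Guess the (apiVersion, name) of the stored CRD object for a v3 policy.

    Assumption inferred from the outputs in this thread only: the stored name
    carries a "<tier>." prefix, and the tier is "default" when unset.
    """
    tier = manifest.get("spec", {}).get("tier", "default")
    name = manifest["metadata"]["name"]
    if not name.startswith(tier + "."):
        name = tier + "." + name
    return "crd.projectcalico.org/v1", name


manifest = {
    "apiVersion": "projectcalico.org/v3",
    "kind": "GlobalNetworkPolicy",
    "metadata": {"name": "allow-tcp-6379"},
    "spec": {"selector": "role == 'database'"},
}
stored = observed_stored_identity(manifest)
# A tool that pairs live objects with Git manifests by apiVersion and name
# would find no match here, because both values changed on the way in.
matches_git = stored == (manifest["apiVersion"], manifest["metadata"]["name"])
print(stored, matches_git)
```

If that inference holds, any tool that compares what is in Git against what is stored will see two different objects.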

bikramgupta commented 4 years ago

Thank you very much for going through the product details. I should have clarified that it is an enterprise version of Calico and the kubectl resources are described here.

These resources can be managed using kubectl, because of the CRDs. Please see below.

[centos@ip-172-31-12-62 stage0]$ more security.quarantine.yaml
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: security.quarantine
spec:
  tier: security
  order: 100
  selector: quarantine == "true"
  ingress:

+++++++++++++++++++++++++++++++

On a separate note, the following resource (GlobalNetworkSets) works perfectly fine with ArgoCD.

[centos@ip-172-31-12-62 stage0]$ kubectl get crd globalnetworksets.crd.projectcalico.org -o yaml
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  creationTimestamp: "2019-10-04T03:38:10Z"
  generation: 1
  name: globalnetworksets.crd.projectcalico.org
  resourceVersion: "660"
  selfLink: /apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions/globalnetworksets.crd.projectcalico.org
  uid: e726c0fa-cff4-4938-82cc-4dfa87500f70
spec:
  conversion:
    strategy: None
  group: crd.projectcalico.org
  names:
    kind: GlobalNetworkSet
    listKind: GlobalNetworkSetList
    plural: globalnetworksets
    singular: globalnetworkset
  preserveUnknownFields: true
  scope: Cluster
  version: v1
  versions:

+++++++++++++++++++++++++++++

Please let me know if you need access to this version of Calico. I can provide the access same day if needed.

jannfis commented 4 years ago

OK, thanks for the clarification about Calico OSS vs. Calico Enterprise. I was beginning to doubt my sanity already ;-)

What I suspect is that if you feed Calico "native" resources to the cluster via kubectl, some Calico component not only creates an additional resource (in the crd.projectcalico.org/v1 API group), but also modifies the resource that was initially fed into the cluster.

From your initial report, these are three resources in your application:

crd.projectcalico.org  GlobalNetworkPolicy               security.quarantine           OutOfSync  Progressing        pruned
projectcalico.org      GlobalNetworkPolicy  default      security.quarantine           Running    Synced             globalnetworkpolicy.projectcalico.org/security.quarantine unchanged
projectcalico.org      GlobalNetworkPolicy               security.quarantine           OutOfSync  Missing            

Of these, only the second one is actually part of your repo, and it was created namespaced by ArgoCD (although it should not be namespaced; maybe a bug in the CRD?).

Again, I'm only wildly guessing here, but Calico seems to create the third resource from the second one, in a non-namespaced manner and finally the first as a child of the third.

ArgoCD does not have manifests for these two additional resources, but I suspect that Calico copies the app.kubernetes.io/instance label from the resource originally created by ArgoCD without adding any ownerReference to the new resources. So ArgoCD treats these resources as part of the application and thus marks them OutOfSync and Missing.

You could try to use the IgnoreExtraneous compare option (the argocd.argoproj.io/compare-options: IgnoreExtraneous annotation) on your GlobalNetworkPolicy resources as described in the documentation, as I think this annotation will be copied over.
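The guess above can be written down as a small Python model. This is a deliberately simplified sketch and not ArgoCD's actual implementation: the names Key, key_of and extraneous are hypothetical, and the only rule modelled is "a live object that carries the app's instance label, has no ownerReferences and has no matching manifest in Git counts as an extra top-level resource".

```python
from typing import Dict, List, NamedTuple

INSTANCE_LABEL = "app.kubernetes.io/instance"


class Key(NamedTuple):
    group: str
    kind: str
    namespace: str
    name: str


def key_of(obj: Dict) -> Key:
    """Identify an object by API group, kind, namespace and name."""
    api_version = obj["apiVersion"]
    group = api_version.split("/")[0] if "/" in api_version else ""
    meta = obj["metadata"]
    return Key(group, obj["kind"], meta.get("namespace", ""), meta["name"])


def extraneous(app: str, git: List[Dict], live: List[Dict]) -> List[Key]:
    """Return labelled, unowned live objects that have no manifest in Git."""
    wanted = {key_of(m) for m in git}
    extra = []
    for obj in live:
        meta = obj["metadata"]
        labelled = meta.get("labels", {}).get(INSTANCE_LABEL) == app
        unowned = not meta.get("ownerReferences")
        if labelled and unowned and key_of(obj) not in wanted:
            extra.append(key_of(obj))
    return extra


labels = {INSTANCE_LABEL: "secops"}
git = [{"apiVersion": "projectcalico.org/v3", "kind": "GlobalNetworkPolicy",
        "metadata": {"name": "security.quarantine", "namespace": "default"}}]
live = [
    {"apiVersion": "projectcalico.org/v3", "kind": "GlobalNetworkPolicy",
     "metadata": {"name": "security.quarantine", "namespace": "default", "labels": labels}},
    # Copy with the label carried over but no ownerReferences.
    {"apiVersion": "crd.projectcalico.org/v1", "kind": "GlobalNetworkPolicy",
     "metadata": {"name": "security.quarantine", "labels": labels}},
]
extras = extraneous("secops", git, live)
print(extras)
```

In this model the crd.projectcalico.org copy is the only object reported as extra, which would match the OutOfSync row marked as pruned in the table above.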

bikramgupta commented 4 years ago

Thanks! I tried, but it did not work. I verified by checking the annotations in the running system. Annotations did copy over. But same error.

I do have 2 questions. 1/ Is there any way for me to sync a set of manifests without specifying a destination namespace? Right now, I am putting the destination namespace as "default" which is not correct. I would like to try that if an option. 2/ Any option to start the argocd-application-controller container in debug mode, and where are those logs. I think if we can see a bit more on what's the controller doing after syncing with Git, it will give us some clue.

jannfis commented 4 years ago

1/ Is there any way for me to sync a set of manifests without specifying a destination namespace? Right now, I am putting the destination namespace as "default" which is not correct. I would like to try that if an option.

No, an ArgoCD application is always bound to a namespace. However, if your resources are non-namespaced, they usually will not end up in the defined namespace. ClusterRole resources are such an example, and also GlobalNetworkPolicy should be non-namespaced. From my cluster:

$ kubectl api-resources --api-group=crd.projectcalico.org
NAME                    SHORTNAMES   APIGROUP                NAMESPACED   KIND
bgpconfigurations                    crd.projectcalico.org   false        BGPConfiguration
bgppeers                             crd.projectcalico.org   false        BGPPeer
blockaffinities                      crd.projectcalico.org   false        BlockAffinity
clusterinformations                  crd.projectcalico.org   false        ClusterInformation
felixconfigurations                  crd.projectcalico.org   false        FelixConfiguration
globalnetworkpolicies                crd.projectcalico.org   false        GlobalNetworkPolicy
globalnetworksets                    crd.projectcalico.org   false        GlobalNetworkSet
hostendpoints                        crd.projectcalico.org   false        HostEndpoint
ipamblocks                           crd.projectcalico.org   false        IPAMBlock
ipamconfigs                          crd.projectcalico.org   false        IPAMConfig
ipamhandles                          crd.projectcalico.org   false        IPAMHandle
ippools                              crd.projectcalico.org   false        IPPool
networkpolicies                      crd.projectcalico.org   true         NetworkPolicy
networksets                          crd.projectcalico.org   true         NetworkSet

Is GlobalNetworkPolicy from the projectcalico.org API group (the one introduced with the enterprise version) a namespaced resource or not?
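If you want to check this from a script rather than by eye, a small Python helper can read the NAMESPACED column from output like the one above. This is only a sketch: the function name namespaced_by_kind is hypothetical, and it assumes the plain-text column layout shown here, with KIND as the last column and NAMESPACED right before it.

```python
from typing import Dict


def namespaced_by_kind(table: str) -> Dict[str, bool]:
    """Map KIND to its NAMESPACED flag from `kubectl api-resources` text output."""
    result = {}
    for line in table.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        # SHORTNAMES may be empty, so read the fixed columns from the right.
        kind, namespaced = fields[-1], fields[-2]
        result[kind] = namespaced == "true"
    return result


sample = """\
NAME                    SHORTNAMES   APIGROUP                NAMESPACED   KIND
globalnetworkpolicies                crd.projectcalico.org   false        GlobalNetworkPolicy
networkpolicies                      crd.projectcalico.org   true         NetworkPolicy
"""
flags = namespaced_by_kind(sample)
print(flags)
```

Running the same helper on the output of kubectl api-resources --api-group=projectcalico.org would answer the question for the other API group.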

2/ Any option to start the argocd-application-controller container in debug mode, and where are those logs. I think if we can see a bit more on what's the controller doing after syncing with Git, it will give us some clue.

Yes, start the controller with a --loglevel DEBUG argument in your Deployment. The logs can be fetched using kubectl logs <your application controller pod> or from whatever log collection tooling you are using.

ymmt2005 commented 4 years ago

@bikramgupta Long story short, apiVersion: projectcalico.org/v3 does not exist in the real world.

Take a look at libcalico-go/lib/api/v3, the package declares its apiVersion as follows:

    // API group details for the Calico v3 API.
    Group               = "projectcalico.org"
    VersionCurrent      = "v3"
    GroupVersionCurrent = Group + "/" + VersionCurrent

Unfortunately, these are not used. The same package also has another variable:

var SchemeGroupVersion = schema.GroupVersion{Group: "crd.projectcalico.org", Version: "v3"}

(note that it has crd.) but again, this is not used either.

The true apiVersion is deeply embedded in lib/backend/k8s as follows: https://github.com/projectcalico/libcalico-go/blob/3d38c58337f2fe135e0561e0279fecf87559c201/lib/backend/k8s/k8s.go#L377-L380

    cfg.GroupVersion = &schema.GroupVersion{
        Group:   "crd.projectcalico.org",
        Version: "v1",
    }

So, Calico only supports the crd.projectcalico.org/v1 API group and version.

bikramgupta commented 4 years ago

@ymmt2005 You are amazing. Sorry for not getting back earlier. I somehow forgot to see the notification. It worked like a charm. This issue can be closed now.

Thank you very much for spending time and getting to the root of it. Really appreciate your help.

fasaxc commented 1 year ago

Old thread but this isn't quite true (at least not any more!):

Long story short, apiVersion: projectcalico.org/v3 does not exist in the real world.

Calico has a Kubernetes Aggregated API server that exposes the projectcalico.org/v3 resources. They are "real" and they are editable with kubectl as long as you have the AAPI server installed. The backing store for those resources is the CRDs and the CRDs mostly look exactly the same as the "real" resources but the API server is in charge of correctly defaulting fields and validating the data before writing it as a custom resource. If you write directly to the CRDs you're bypassing that protection logic and it's easy to break things. Applying network policies is reasonably safe (but if you apply an invalid policy it will get dropped) but please don't edit IP pools and IPAM tracking data, those are very much internal.

The enterprise version of Calico has always had the API server but we upstreamed it to open source a while ago.

imranismail commented 5 months ago

TLDR: ArgoCD attempts to create a namespaced GlobalNetworkPolicy on the v3 endpoint, but Calico returns a non-namespaced GlobalNetworkPolicy, causing a continuous difference. A potential solution could be to use the v1 endpoint, but this would bypass any validation, defaults, or other logic implemented in the v3 endpoint.

The problem arises from a mismatch between the Custom Resource Definitions (CRDs) in Calico's v1 API and what Calico expects in its v3 API. Calico uses the v1 API for storage, with CRDs indicating whether a resource is namespaced. However, Calico expects resources to be created using the v3 API.

When ArgoCD renders a v3 resource, it defaults to treating it as namespaced if it can't determine the resource's nature. This leads to the application's namespace being added to all v3 resources, including GlobalNetworkPolicy.

During the diffing process, ArgoCD becomes confused and perceives the app as perpetually out of sync. This is because when the GlobalNetworkPolicy is submitted to the Calico API, Calico removes the added namespace and stores it in the v1 CRD. The v3 API then returns a non-namespaced resource when queried.
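The loop described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not ArgoCD or Calico code: render_desired and read_back are hypothetical names, the "unknown scope means namespaced" default is taken from the description in this comment, and read_back simply drops the namespace the way a cluster-scoped API would.

```python
import copy
from typing import Optional


def render_desired(manifest: dict, app_namespace: str, namespaced: Optional[bool]) -> dict:
    """Fill in the app namespace unless the kind is known to be cluster-scoped.

    namespaced=None models the case where the scope cannot be determined.
    """
    desired = copy.deepcopy(manifest)
    if namespaced is None or namespaced:
        desired["metadata"].setdefault("namespace", app_namespace)
    return desired


def read_back(applied: dict) -> dict:
    """Model a cluster-scoped API that stores and returns no namespace."""
    live = copy.deepcopy(applied)
    live["metadata"].pop("namespace", None)
    return live


policy = {
    "apiVersion": "projectcalico.org/v3",
    "kind": "GlobalNetworkPolicy",
    "metadata": {"name": "security.quarantine"},
}

desired = render_desired(policy, "default", namespaced=None)
live = read_back(desired)
out_of_sync = desired != live  # stays True no matter how often we re-apply

# If the scope is known to be cluster-wide, the two sides agree.
known = render_desired(policy, "default", namespaced=False)
in_sync_when_known = known == read_back(known)
print(out_of_sync, in_sync_when_known)
```

In this model the difference never goes away by re-syncing. It only disappears once the renderer knows the kind is cluster-scoped.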

toralf-hlag commented 2 months ago

This issue affected us today here too.

Is it a bug or a behaviour?

crenshaw-dev commented 2 months ago

When ArgoCD renders a v3 resource, it defaults to treating it as namespaced if it can't determine the resource's nature. This leads to the application's namespace being added to all v3 resources, including GlobalNetworkPolicy.

Possibly related: https://github.com/argoproj/gitops-engine/pull/597