fluxcd / flux

Successor: https://github.com/fluxcd/flux2
https://fluxcd.io
Apache License 2.0

Flux Fails to Sync while Listing Resources on Tanzu Cluster #3494

Closed: jonathan-innis closed this issue 3 years ago

jonathan-innis commented 3 years ago

Describe the bug

Flux fails to sync any resources on a TKG (Tanzu Kubernetes Grid) cluster with NetworkPolicyStats disabled. It seems that Flux performs a LIST on the cluster's resources to determine which ones it can apply here.

This fails for Tanzu clusters that have NetworkPolicyStats disabled, because a LIST on NetworkPolicyStats returns an error when the feature is disabled.

Because the error that's returned is not a forbidden error, the sync fails in flux at the line here

To Reproduce

Steps to reproduce the behaviour:

  1. Onboard flux to a TKG cluster
  2. Ensure that NetworkPolicyStats is disabled
  3. Ensure Flux service account has cluster-wide access
  4. Attempt to deploy resources from any git repository

Expected behavior

Flux should successfully sync with the repository.

Logs

caller=loop.go:134 component=sync-loop event=refreshed url=https://@github.com/Azure/arc-helm-demo.git branch=master HEAD=78444c402215ca59458b9bec3202b02df204ff4b 
ts=2021-06-23T14:43:48.850020785Z caller=sync.go:61 component=daemon info="trying to sync git changes to the cluster" old= new=78444c402215ca59458b9bec3202b02df204ff4b
ts=2021-06-23T14:43:49.475604067Z caller=loop.go:108 component=sync-loop err="collating resources in cluster for sync: feature NetworkPolicyStats disabled

Additional context

Tanzu cluster was created using this documentation

stealthybox commented 3 years ago

> Because the error that's returned is not a forbidden error, the sync fails in flux at the line here

We should consider what it would mean to relax these errors. We are already continuing if resources are forbidden.

https://github.com/antrea-io/antrea/issues/2214 shows that this error is being returned as a BadRequest. There's not much we can do when the apiserver tells us that our Kubernetes client sent a bad request. Perhaps we should just log a warning and move on? Unfortunately, the warning would be quite noisy. Maybe debug level is more appropriate?

It looks like Antrea does plan to fix this for the List verb at some point, but it may be valuable to implement and release a simple patch if Flux v1 is still widely deployed on TKG clusters.

jonathan-innis commented 3 years ago

This should be fixed by #2386 in the Antrea repo. We validated it on our end.