argoproj / argo-cd

Declarative Continuous Deployment for Kubernetes
https://argo-cd.readthedocs.io
Apache License 2.0
16.33k stars 4.93k forks source link

fix: status.sync.comparedTo should use replace patch strategy #18061

Closed alexmt closed 2 weeks ago

alexmt commented 2 weeks ago

Closes https://github.com/argoproj/argo-cd/issues/15126

PR adds patchStrategy:"replace" tag to the status.sync.comparedTo field to avoid unable to find api field in struct error during calculating patch.

jessesuen commented 2 weeks ago

Can we try the tests in the other PR? To see what differences there are?

https://github.com/argoproj/argo-cd/pull/16602/files#diff-4dec00983e470c56b48582b67b8ef5c1baf5216e202ec29fdff3233fa4aec501R1937-R2045

I understand this PR does a replace vs. the other does json merge, and so there may be expected differences, but I would like the data point.

The other one also has added e2e tests which may be worthwhile.

crenshaw-dev commented 2 weeks ago

One important question is: will Kustomize partial patching still work, since that's afaik the main (only?) use case for valuesObject. It looks like maybe not, because the openapi spec will be modified to include the replace patch strategy - I expect that Kustomize probably respects that.

crenshaw-dev commented 2 weeks ago

But maybe that doesn't matter, if the Kustomize user doesn't set openapi - maybe it defaults to a plain old JSON patch?

jessesuen commented 2 weeks ago

I expect that Kustomize probably respects that. if the Kustomize user doesn't set openapi - maybe it defaults to a plain old JSON patch?

Yes, I think you answered your own question. Unless a user references an Argo CD Application OpenAPI spec in their kustomization.yaml, then kustomize offline patching should default to json merge (since it wouldn't even know how to perform strategic merge on this non-native type).

Since I believe no one references an Argo CD application openapi spec in their kustomization.yaml (it doesn't appear to me that we even generate this file for this to be possible), users won't notice any difference.

crenshaw-dev commented 2 weeks ago

I think some folks might reference it, we do generate the schema: https://github.com/argoproj/argo-schema-generator/tree/main/schema

jessesuen commented 2 weeks ago

we do generate the schema

Ah, I forgot it lived in a different repo. Thanks. I didn't see it auto-generated in this PR so I incorrectly assumed it was not generated at all.

I can see in https://github.com/argoproj/argo-schema-generator/blob/main/schema/argo_cd_kustomize_schema.json#L1083-L1091, values has a replace strategy, and valuesObject does not.

                "values": {
                    "description": "Values specifies Helm values to be passed to helm template, typically defined as a block. ValuesObject takes precedence over Values, so use one or the other.",
                    "type": "string",
                    "x-kubernetes-patch-strategy": "replace"
                },
                "valuesObject": {
                    "description": "ValuesObject specifies Helm values to be passed to helm template, defined as a map. This takes precedence over Values.",
                    "$ref": "#/definitions/io.k8s.apimachinery.pkg.runtime.RawExtension"
                },

So after this PR, x-kubernetes-patch-strategy would be present in valuesObject.

@alexmt had pointed this out as an inconsistency between the two fields that would make sense to fix.

My thoughts on this:

jessesuen commented 2 weeks ago

Thinking about this some more, I think valuesObject behaving like SMP vs. json merge patch is completely moot.

SMP matters most when there are merge keys in lists. But since valuesObject is a rawextension, there's no way that users would be able to define merge keys without redefining argo_cd_kustomize_schema.json with some predefined structure for valuesObject. But it will never be predefined because it's meant to be used as a freeform values.yaml to helm. So I stand by my assumption that kustomize users will be just fine.

crenshaw-dev commented 2 weeks ago

@alexmt had pointed this out as an inconsistency between the two fields that would make sense to fix.

The problem is that the inconsistency is the point of the field. valuesObject was made an object instead of a string specifically so that folks could partially patch it using Kustomize. It shouldn't behave the same as the values string.

I could be wrong about this, but have a feeling, not many kustomize users reference argo_cd_kustomize_schema.json

I think this is probably true, at least partially because we don't have many (any?) merge keys in the spec. But there are issues requesting merge keys for things like names of sources in multi-source apps, so using the openapi schema in Kustomize may become more popular.

And if they really need the previous behavior, could always remove x-kubernetes-patch-strategy.

True, I think that's a viable workaround for Kustomize users.

I think valuesObject behaving like SMP vs. json merge patch is completely moot.

I don't think the target state of the other PR is SMP support, but rather just working JMP support. If I understand this PR correctly, it could take certain users back to "simple replace" instead of either SMP or JMP.

imo using replace is fine for the controller as long as we 1) inform users about the new API behavior, in case they're patching the CR on-cluster 2) document the difference between Kustomize behavior with openapi and without openapi 3) document how users can manually remove the x-kubernetes-patch-strategy from the openapi schema if they want the best of both worlds (Kustomize Application SMP support without replace behavior)

alexmt commented 2 weeks ago

Thank you for catching it @crenshaw-dev . I agree the Kustomize use case is important and we should support it. I've realized that controller never modifies application spec. It only patches status. The error happens only when controller patches status.sync.comparedTo field that embeds source structure. I've updated PR to use replace strategy for status.sync.comparedTo. This solves the problem and does not introduce any functional changes.

crenshaw-dev commented 2 weeks ago

Oh nice that's genius-level stuff, thanks! Will review.

alexmt commented 2 weeks ago

@jessesuen, missed your comments, sorry. Adding tests from another PR - great suggestion.

I think the replace strategy still affects kustomize because it prevent uses from merging even maps. It will be impossible to introduce one key in one patch and another key in other patch.

ro0NL commented 2 weeks ago

that's afaik the main (only?) use case for valuesObject

for me it's avoiding yaml string in yaml

It shouldn't behave the same as the values string.

if they dont behave equivalent, it should be documented? https://argo-cd.readthedocs.io/en/stable/user-guide/helm/#values

crenshaw-dev commented 2 weeks ago

@ro0NL behavior with respect to manifest hydration is identical. Patching behavior is different, but I think the difference is intuitive due to the field types.

ro0NL commented 2 weeks ago

@crenshaw-dev excuse my ignorance. You mean ArgoCD uses a different update method depending on values or valuesObject?

crenshaw-dev commented 2 weeks ago

@ro0NL I just mean that it's possible for tools like Kustomize to apply json patches to the object field, but not the string field. From a user's perspective, everything else should behave the same.

alexmt commented 2 weeks ago

/cherry-pick release-2.11

alexmt commented 2 weeks ago

/cherry-pick release-2.10

gcp-cherry-pick-bot[bot] commented 2 weeks ago

Cherry-pick failed with Merge error ec09937fe047058ba53d19aafc9110933e589bf6 into temp-cherry-pick-ce7a6e-release-2.10

alexmt commented 2 weeks ago

/cherry-pick release-2.9

gcp-cherry-pick-bot[bot] commented 2 weeks ago

Cherry-pick failed with Merge error ec09937fe047058ba53d19aafc9110933e589bf6 into temp-cherry-pick-ce7a6e-release-2.9

morfien101 commented 1 week ago

Is it possible to know what release versions will contain this patch? I see the cherry picks failed for 2.9 and 2.10. Does that mean we need to wait until 2.11 comes out?

diranged commented 1 week ago

Is it possible that this patch has suddenly changed the behavior of the Application Controller so that it isn't comparing things the same way? We are seeing a massive (relatively speaking) jump in the PATCH requests/second that the ArgoCD App controller is making now across ~10 clusters that we did this upgrade on:

image

I suspect the issue has to do with how the code is comparing the valueFiles lists from before and after.. we had some valueFiles lists that had duplicate values.. and when we run kubectl get applications -w -o json | kubectl grep -w, we see diff's like these:

        revision: "794fcbae2671366663e107cd50079c2e96894591"
        source:
          helm:
            ignoreMissingValueFiles: true
            valueFiles:
              - "values.yaml"
              - "values.staging-xx.yaml"
              - "values.staging-xx.yaml"
              - "values/staging/values.yaml"
              - "values/staging/staging-xx/values.yaml"
              - "values/staging/staging-xx/types/values.otherfile.yaml"
              - "values/staging/staging-xx/values.staging-xx.yaml"
-             - 
+             - "values.staging-xx.yaml"
            values: "
cluster:
  accountId: ".."