argoproj / argo-cd

Declarative Continuous Deployment for Kubernetes
https://argo-cd.readthedocs.io
Apache License 2.0
17.49k stars 5.32k forks source link

`argocd app sync/diff --local` doesn't account for sidecar CMPs #8145

Open crenshaw-dev opened 2 years ago

crenshaw-dev commented 2 years ago

Describe the bug

argocd app diff <appname> --local . fails when <appname> is handled by a sidecar CMP. Specifically, it fails with this error:

FATA[0000] config management plugin with name '' is not supported 

argocd app sync <appname> --local . fails with the same error.

To Reproduce

  1. Install a sidecar CMP. The one from the docs works fine.
  2. Add an app which uses that CMP.
  3. Run argocd app diff <appname> --local . from the directory in the git clone where the app is defined.

Expected behavior

I would expect a diff to be generated. Since the example CMP always produces the same manifest, I'd expect the diff to be empty.

Version

argocd: v2.1.6+a346cf9.dirty
  BuildDate: 2021-11-01T02:05:06Z
  GitCommit: a346cf933e10d872eae26bff8e58c5e7ac40db25
  GitTreeState: dirty
  GoVersion: go1.17.2
  Compiler: gc
  Platform: darwin/amd64
argocd-server: v2.1.6+a346cf9.dirty
  BuildDate: 2021-11-01T02:05:06Z
  GitCommit: a346cf933e10d872eae26bff8e58c5e7ac40db25
  GitTreeState: dirty
  GoVersion: go1.17.2
  Compiler: gc
  Platform: darwin/amd64
  Ksonnet Version: v0.13.1
  Kustomize Version: v4.4.0 2021-09-27T16:13:36Z
  Helm Version: v3.7.1+g1d11fcb
  Kubectl Version: v0.21.0
  Jsonnet Version: v0.17.0
crenshaw-dev commented 2 years ago

Proposal: server-side diffing for local code

Local diff/syncing is currently implemented by running repo-server code on the CLI's host machine. It's the user's responsibility to install the necessary CLIs (like Helm or Kustomize).

We shouldn't expect users to configure their local machine to run the same CMPs as those installed on the repo-server. Instead, we should have the CLI send the user's local files to the repo-server (via the Argo CD API), and have the repo-server send back the generated diff.

Steps

  1. Implement the necessary CLI/API/repo-server components to perform server-side diffing.

    1. CLI: Add a --server-side-generate flag (defaulted to false).
    2. CLI: Print a warning whenever --local is used without --server-side-generate.

      WARNING: Client-side local diffing/syncing does not support sidecar CMPs and is DEPRECATED in vX.Y.Z. Switch to using `--server-side-generate`. Server-side local diffing/syncing will be default behavior in vX.Y+1.Z.
      
    3. CLI: Add a --local-include argument accepting a glob pattern (or maybe regex?) to determine which files are sent to the repo-server. If using globs, accept multiple (and union whatever they produce). Use a reasonable default like [.yaml, .json, etc.] that will exclude likely-unnecessary files (like .go, .java, etc.).
    4. CLI: Add a warning before sending files for diffing.

      WARNING: The following files will be sent to the Argo CD server for manifest generation.
      <files>
      Continue (use --skip-files-warning to disable this check)? Y/n
    5. CLI: Add a --skip-files-warning flag in case the user needs a non-interactive diff/sync.
    6. CLI: Add the code to actually send the files. This will probably involve selecting some archive format and a golang library to perform the archiving.
    7. API: Add/update a diff/sync endpoint to accept user files (as an archive) and send that to repo-server for manifest generation.

      Be careful about what you log. Don't want to inadvertently log a secret that the user accidentally sent with their source files.

    8. API: Add a configurable max size limit for the user's request (if that feature isn't already present). Reject requests that are too large.
    9. repo-server: Add/update a manifest generation endpoint to accept the user's files as an archive. Implement logic to 1) copy the app's current source, 2) unpack the user-provided archive over the source, 3) generate the manifests, and 4) delete the source copy.

      Read the archive library docs carefully the ensure we're not introducing security vulnerabilities (like directory traversal) to the repo-server.

      Be careful about what you log. Don't want to inadvertently log a secret that the user accidentally sent with their source files.

  2. In version Y+1, delete the client-side local diff/sync code and default to server-side.

Security considerations

Noted above, but repeated here:

  1. Don't accept oversize requests.
  2. Empower the user to limit what files they send and inspect the list before actually sending them.
  3. As an extra precaution, be careful not to log the contents of the user's files, in case they accidentally sent something sensitive.
  4. Use the archive library carefully to avoid things like directory traversal.
crenshaw-dev commented 1 year ago

Not totally fixed. We should do this: https://github.com/argoproj/argo-cd/issues/10936

joshuasimon-taulia commented 1 year ago

i'm seeing this error

FATA[0002] rpc error: code = Unknown desc = error receiving manifest file stream: error receiving tgz file: file checksum validation error: calc e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 sent f1b4f206df2ea9a29fa819c5a2c6bdc883cd79433bbf9ad36dd11e600306d4e7

when running argocd app diff jx-preprod --server-side-generate --local "$PWD" from the directory that contains my helmfile.yaml. my app uses this helmfile CMP

this is the argocd application object

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  creationTimestamp: "2022-10-29T22:04:37Z"
  generation: 2847
  name: jx-preprod
  namespace: argocd
  ownerReferences:
  - apiVersion: argoproj.io/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: ApplicationSet
    name: dev
    uid: 8f4d596e-8ffd-452c-b2d0-f4a5dedbf2e1
  resourceVersion: "724952059"
  uid: 3ee646fa-67f2-4787-ab73-6e34bdbb1dbc
spec:
  destination:
    namespace: jx
    server: https://kubernetes.default.svc
  ignoreDifferences:
  - group: admissionregistration.k8s.io
    jqPathExpressions:
    - .webhooks[].namespaceSelector.matchExpressions[] | select(.key == "control-plane")
    kind: MutatingWebhookConfiguration
    name: istio-sidecar-injector
  project: dev
  source:
    path: helmfiles/jx
    plugin:
      env:
      - name: HELMFILE_USE_CONTEXT_NAMESPACE
        value: "true"
      - name: HELM_TEMPLATE_OPTIONS
        value: --skip-tests
      name: helmfile
    repoURL: https://github.com/my-org/jenkins-x-dev
    targetRevision: HEAD
  syncPolicy:
    syncOptions:
    - CreateNamespace=true
  sourceType: Plugin
  summary:
    externalURLs:
    - https://bucketrepo-jx.dev.my-org.com
    - https://dashboard-jx.dev.my-org.com
    - https://hook-jx.dev.my-org.com/zzhookzz
    - https://lighthouse-jx.dev.my-org.com
    images:
    - chartmuseum/chartmuseum:v0.12.0
    - docker.io/bitnami/external-dns:0.11.1-debian-10-r23
    - ghcr.io/jenkins-x-plugins/jx-preview:0.0.225
    - ghcr.io/jenkins-x/bucketrepo:0.1.67
    - ghcr.io/jenkins-x/jx-boot:3.7.8
    - ghcr.io/jenkins-x/jx-build-controller:0.4.7
    - ghcr.io/jenkins-x/jx-pipelines-visualizer:1.8.2
    - ghcr.io/jenkins-x/jx-slack:0.2.1
    - ghcr.io/jenkins-x/lighthouse-foghorn:1.10.6
    - ghcr.io/jenkins-x/lighthouse-gc-jobs:1.10.6
    - ghcr.io/jenkins-x/lighthouse-keeper:1.10.6
    - ghcr.io/jenkins-x/lighthouse-tekton-controller:1.10.6
    - ghcr.io/jenkins-x/lighthouse-webhooks:1.10.6
    - ghcr.io/jenkins-x/lighthouse-webui-plugin:0.1.7
  sync:
    comparedTo:
      destination:
        namespace: jx
        server: https://kubernetes.default.svc
      source:
        path: helmfiles/jx
        plugin:
          env:
          - name: HELMFILE_USE_CONTEXT_NAMESPACE
            value: "true"
          - name: HELM_TEMPLATE_OPTIONS
            value: --skip-tests
          name: helmfile
        repoURL: https://github.com/my-org/jenkins-x-dev
        targetRevision: HEAD
    revision: a8f2bfc348431499db186a2f1a3ba95d58926fce
    status: OutOfSync
 ✗ argocd version
argocd: v2.5.1+504da42
  BuildDate: 2022-11-01T21:30:52Z
  GitCommit: 504da424c2c9bb91d7fb2ebf3ae72162e7a5a5be
  GitTreeState: clean
  GoVersion: go1.18.7
  Compiler: gc
  Platform: darwin/amd64
argocd-server: v2.5.0+b895da4
  BuildDate: 2022-10-25T14:40:01Z
  GitCommit: b895da457791d56f01522796a8c3cd0f583d5d91
  GitTreeState: clean
  GoVersion: go1.18.7
  Compiler: gc
  Platform: linux/amd64
  Kustomize Version: v4.5.7 2022-08-02T16:35:54Z
  Helm Version: v3.10.1+g9f88ccb
  Kubectl Version: v0.24.2
  Jsonnet Version: v0.18.0
crenshaw-dev commented 1 year ago

You get that consistently? The checksum check is meant to prevent tampering or data corruption. I'd be really surprised to see it consistently.

Can you run a hard refresh to make sure you don't keep getting a cached error?

joshuasimon-taulia commented 1 year ago

i tried

is there an example of the directory context and files to include when using --server-side-generate?

arturhoo commented 1 year ago

Following the original motivation, what do you think of app manifests also accepting --server-side-generate?

cilindrox commented 1 year ago

Also seeing the "file checksum validation error" consistently when running app diff --local apps/foo --server-side-generate both locally and on the CI server (ephemeral containers, git clone -> helm dep up -> argocd app diff)

FATA[0004] rpc error: code = Unknown desc = error receiving manifest file stream: error receiving tgz file: file checksum validation error

Seems the crc is not being properly computed or there's a difference in the way it's being generated.

crenshaw-dev commented 1 year ago

@cilindrox can you check the server-side log and post the checksum values?

I did a deep dive on this error with @joshuasimon-taulia and found that when the API server receives the streamed file it receives metadata but no actual file contents. So its checksum is always the checksum of an empty array of bytes.

I still have no idea why that is happening.

cilindrox commented 1 year ago

Thanks for the quick reply @crenshaw-dev - I'm not seeing any relevant logs on app.kubernetes.io/name=argocd-server - (checked every other instance, just in case and still no dice).

Here's the crc client-side:

calc e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 sent cc547b59cb862b3f3b389d4cbbed234d27f0282e3de4eab31833fd2bdb7e1ce8

Should I be running this with a debug log level or similar?

Edit: lack of logs got me thinking - could this be getting filtered at the ingress level? ie: the payload's not reaching the server?

crenshaw-dev commented 1 year ago

Yep, e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 is the same "calculated" checksum as what @joshuasimon-taulia was getting - zero bytes.

Should I be running this with a debug log level or similar?

Wouldn't hurt, but I don't think there will be much more info. I think the issue is pretty low-level.

could this be getting filtered at the ingress level? ie: the payload's not reaching the server?

Quite possibly. I doubt that the server is receiving the bytes and then just ignoring them. I bet they're not making it over the network.

joshuasimon-taulia commented 1 year ago

could this be getting filtered at the ingress level? ie: the payload's not reaching the server?

we tried the same exercise using kubectl port-forward to bypass nginx ingress, and the results were still the same

james-trousdale-lyb commented 1 year ago

I hit the same issue in a scenario where we are using --port-forward and --plaintext on the argocd CLI - with the same exact checksum for the empty array of bytes.

For now, we'll skirt around this issue in our use case (which is to show the diffs on pull requests pre-merge) by using the --revision flag, but it'd be nice to be able to run it locally, obviously.

lblazewski commented 1 year ago

Hi!

I also got into the error of checksums not matching today when trying to figure out the diff from local path. I see that the issue referenced in this PR was removed from the v2.6 milestone but the warning about not using --server-side-generate is still present and it clearly says that it will become the default in 2.6.

Does it mean that this issue will be addressed before v2.6 is released? Since otherwise once v2.6 would be released argocd app diff --local would be broken.

rassie commented 1 year ago

Having encountered this checksum problem myself, I can contribute to the confusion a bit: in my environment I could narrow this particular problem down to incorrectly setup non-web gRPC ingress. After fixing my setup, I could verify that app diff --local . works (for various stages of "works", I'm having a different problem after that step) without --grpc-web and doesn't (EOF) with --grpc-web.

cha7ri commented 1 year ago

For my case it works if if I use --port-forward and skip the ingress. @rassie how did you fix your ingress? Are you using nginx ingress controller?

rassie commented 1 year ago

@cha7ri I'm on k3d/k3s, so it's a Traefik-based ingress controller. It seem that the Ingresses provided by the Helm chart have been incomplete, so I've bit the bullet and implemented Traefik Ingress as described in the docs.

thomassandslyst commented 1 year ago

I'm also getting this issue where the file streaming part of diff --server-side-generate through the grpc-web just doesn't work and the end result is a different checksum.

We cannot use pure grpc as our ArgoCD is behind Cloudflare Access which doesn't support any grpc.

crenshaw-dev commented 1 year ago

Spreading the knowledge: grpc-web doesn't support client streaming, except by websocket: https://github.com/argoproj/argo-cd/issues/12032#issuecomment-1608190509

jeremych1000 commented 2 months ago

Can I find out what the latest is on this?

From what I'm gathering (I'm using traefik as ingress):

I'm still getting both these errors and haven't been able to figure out a combination that works for server side diffs that includes Kubernetes schema validation!

FATA[0008] failed to send manifest stream file: error sending stream: EOF

or

FATA[0004] rpc error: code = Unknown desc = error receiving manifest file stream: error receiving tgz file: file checksum validation error