fluxcd / flux2

Open and extensible continuous delivery solution for Kubernetes. Powered by GitOps Toolkit.
https://fluxcd.io
Apache License 2.0

flux diff fails if the namespace does not exist #3270

Open · ilijamt opened this issue 1 year ago

ilijamt commented 1 year ago

Describe the bug

If a namespace does not exist, the diff fails: the namespaced resources error out complaining that the namespace does not exist.

I've tried this on Kubernetes 1.22 and 1.24 with the same result.

Steps to reproduce

Create the following files:

efs-csi-driver
├── helmrelease-efs-csi-driver.yaml
├── helmrepository-efs-csi-driver.yaml
├── kustomization.yaml
└── namespace.yaml
# helmrelease-efs-csi-driver.yaml
---
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: aws-efs-csi-driver
  namespace: efs-csi-driver
spec:
  releaseName: aws-efs-csi-driver
  interval: 1m
  chart:
    spec:
      chart: aws-efs-csi-driver
      version: 2.2.9
      sourceRef:
        kind: HelmRepository
        name: aws-efs-csi-driver
# helmrepository-efs-csi-driver.yaml
---
apiVersion: source.toolkit.fluxcd.io/v1beta1
kind: HelmRepository
metadata:
  name: aws-efs-csi-driver
  namespace: efs-csi-driver
spec:
  interval: 30m
  url: https://kubernetes-sigs.github.io/aws-efs-csi-driver/
# kustomization.yaml
---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - namespace.yaml
  - helmrepository-efs-csi-driver.yaml
  - helmrelease-efs-csi-driver.yaml
# namespace.yaml
---
apiVersion: v1
kind: Namespace
metadata:
  name: efs-csi-driver

Then running

$ flux diff kustomization flux-system --path efs-csi-driver
✓  Kustomization diffing...
► Namespace/efs-csi-driver created
✗ 2 errors occurred:
    * HelmRepository/efs-csi-driver/aws-efs-csi-driver namespace not specified, error: namespaces "efs-csi-driver" not found
    * HelmRelease/efs-csi-driver/aws-efs-csi-driver namespace not specified, error: namespaces "efs-csi-driver" not found

Expected behavior

I expected it to show the resources that would be created:

$ flux diff kustomization flux-system --path efs-csi-driver
✓  Kustomization diffing...
► Namespace/efs-csi-driver created
► HelmRepository/efs-csi-driver/aws-efs-csi-driver created
► HelmRelease/efs-csi-driver/aws-efs-csi-driver created

Screenshots and recordings

No response

OS / Distro

Linux x1c 5.19.0-2-amd64 #1 SMP PREEMPT_DYNAMIC Debian 5.19.11-1 (2022-09-24) x86_64 GNU/Linux

Flux version

0.35.0 and 0.36.0

Flux check

► checking prerequisites
✔ Kubernetes 1.24.2 >=1.20.6-0
► checking controllers
✔ helm-controller: deployment ready
► ghcr.io/fluxcd/helm-controller:v0.26.0
✔ kustomize-controller: deployment ready
► ghcr.io/fluxcd/kustomize-controller:v0.30.0
✔ notification-controller: deployment ready
► ghcr.io/fluxcd/notification-controller:v0.28.0
✔ source-controller: deployment ready
► ghcr.io/fluxcd/source-controller:v0.31.0
► checking crds
✔ alerts.notification.toolkit.fluxcd.io/v1beta1
✔ buckets.source.toolkit.fluxcd.io/v1beta2
✔ gitrepositories.source.toolkit.fluxcd.io/v1beta2
✔ helmcharts.source.toolkit.fluxcd.io/v1beta2
✔ helmreleases.helm.toolkit.fluxcd.io/v2beta1
✔ helmrepositories.source.toolkit.fluxcd.io/v1beta2
✔ kustomizations.kustomize.toolkit.fluxcd.io/v1beta2
✔ ocirepositories.source.toolkit.fluxcd.io/v1beta2
✔ providers.notification.toolkit.fluxcd.io/v1beta1
✔ receivers.notification.toolkit.fluxcd.io/v1beta1
✔ all checks passed

Git provider

No response

Container Registry provider

No response

Additional context

No response

souleb commented 1 year ago

flux diff kustomization only performs a dry run; it does not apply anything. Your namespace here is never actually applied, which is why you get the error.

The kustomize-controller does the apply in 3 steps (a rough shell approximation follows below):

  1. apply CRDs and namespaces
  2. wait for the applied CRDs and namespaces to become ready
  3. apply the remaining resources
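
A rough shell approximation of those three stages, using the manifests from the reproduction above. This is only a sketch (it assumes kustomize, yq v4 and kubectl >= 1.23 are available); the kustomize-controller does the equivalent internally with server-side apply.

# 1. apply CRDs and Namespaces first
kustomize build efs-csi-driver > all.yaml
yq 'select(.kind == "Namespace" or .kind == "CustomResourceDefinition")' all.yaml | kubectl apply -f -
# 2. wait for them to become ready (namespaces: phase Active; CRDs: condition Established)
kubectl wait --for=jsonpath='{.status.phase}'=Active namespace/efs-csi-driver --timeout=60s
# 3. apply the remaining resources
yq 'select(.kind != "Namespace" and .kind != "CustomResourceDefinition")' all.yaml | kubectl apply -f -
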
stefanprodan commented 1 year ago

Flux behaves like kubectl diff; I'm not sure if we should handle this differently.

GfxKai commented 1 year ago

+1 on this. Given that Flux (and GitOps in general) champions a fully declarative model (vs the imperative nature of kubectl), I'd expect flux diff to behave differently—this feels like an important piece of the GitOps deployment workflow.

We're running flux diff in CI to enable checks / visibility of what infrastructure changes will be made during the PR approval process. I'd expect the result of flux diff to be as close to a real deployment as possible: flux diff should only fail if the resultant deployment would be expected to fail.

Currently, our CI pipeline fails (with a false negative) and refuses to show us a useful diff unless we explicitly create the required namespace(s) first via kubectl, which feels at odds with the whole point of adopting a GitOps approach in the first place 🥲
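
A minimal sketch of that pre-create workaround, using the namespace name from the reproduction above (adjust for your own repository):

# pre-create any namespaces the change introduces, then run the diff
kubectl get namespace efs-csi-driver >/dev/null 2>&1 || kubectl create namespace efs-csi-driver
flux diff kustomization flux-system --path efs-csi-driver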

As a side note, more granular detail in the diff would be amazing if possible (especially being able to drill down into changes resulting from updating HelmRelease spec values). It'd open up useful possibilities like a flux bot to display infrastructure diffs as comments in PRs 🚀

souleb commented 1 year ago

Applying with the CLI could create other issues, like conflicts with other controllers that apply the same resources. Actually deploying and then cleaning up also means granting the CLI extended permissions. I don't think we want to do that.

Wouldn't adding a dry-run capability to the kustomize-controller be more effective? This was proposed by a user last year. We could pass a list of objects to only dry-run, and we wouldn't have any RBAC issues.

GfxKai commented 1 year ago

Something like

flux reconcile kustomization [name] --with-source --dry-run --export path/to/ci-artifact.yml

would be nice to allow downstream CI workflows to display/inspect infra changes 🔬

koba1t commented 1 year ago

I think kubectl diff has the same problem: https://github.com/kubernetes/kubectl/issues/1331
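
As a hedged illustration against the manifests from this issue: kubectl diff performs a server-side dry-run, so namespaced objects whose namespace exists only in the manifests fail before any diff is shown.

kubectl diff -k efs-csi-driver
# expect a failure along the lines of: namespaces "efs-csi-driver" not found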

matheuscscp commented 1 year ago

My goal with flux diff is to build a CI job for validating proposed changes in git, like a Terraform Plan workflow. Right now I'm handling this corner case with a long explanation like this:

The plan could not execute because this change creates resources in a namespace that
does not exist yet, probably because this change also creates that namespace. The job
will not fail, so as not to prevent both from being created in the same change, but if
there is any invalid configuration the next plan will show it and fail. If you want to
see the plan properly, split this change in two: the first one creating the namespace,
and the second one creating everything else.

I agree that behaving differently from kubectl diff sounds awkward, and in this particular case it might also be a tough challenge to execute the right way. In practice, though, I don't see a reason to fail the entire plan just because the namespace doesn't exist yet, so I would also love to see a proper solution for this problem. Being able to provision an entire stack in a single Git change, with a proper CI plan, should ideally be possible; it's a matter of how quickly infrastructure can be reproduced.
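
In the meantime, a hedged sketch of that CI guard (the error matching is a heuristic and the exact message may change between flux versions; the kustomization name and path are taken from this issue):

out=$(flux diff kustomization flux-system --path efs-csi-driver 2>&1) || status=$?
echo "$out"
if [ "${status:-0}" -ne 0 ] && echo "$out" | grep -q 'namespaces .* not found'; then
  # the namespace is created in the same change: explain instead of failing the job
  echo "Plan could not execute: this change creates resources in a namespace it also creates."
  exit 0
fi
exit "${status:-0}"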