fluxcd / notification-controller

The GitOps Toolkit event forwarder and notification dispatcher
https://fluxcd.io
Apache License 2.0

The git commit status is always marked as success before it is marked as failed #327

Open antoineozenne opened 2 years ago

antoineozenne commented 2 years ago

When I commit bad values to force a failed git commit status, the commit is first marked as success (before all resources are reconciled) and is only marked as failed after the kustomize-controller retries. The commit should not receive a status until all resources are reconciled. Here is my configuration for this use case.

My repo tree:

# tree
.
├── applications
│   ├── base
│   └── dev-cluster
├── clusters
│   └── dev-cluster
│       ├── flux-system
│       │   ├── gotk-components.yaml
│       │   ├── gotk-sync.yaml
│       │   └── kustomization.yaml
│       └── infrastructure-kustomization.yaml # My kustomization to reconcile /infrastructure/dev-cluster
├── infrastructure
│   ├── base
│   │   └── kyverno
│   │       ├── kustomization.yaml
│   │       ├── kyverno-alert.yaml
│   │       ├── kyverno-helmrelease.yaml # My test release with a failed value
│   │       ├── kyverno-helmrepository.yaml
│   │       └── kyverno-namespace.yaml
│   └── dev-cluster
│       ├── flux-notifications
│       │   ├── alerts
│       │   │   ├── infrastructure-alert.yaml # My GitLab alert
│       │   │   └── kustomization.yaml
│       │   └── providers
│       │       ├── alertmanager-provider.yaml
│       │       ├── gitlab-infrastructure-provider.yaml # My GitLab provider
│       │       └── kustomization.yaml
│       └── kyverno
│           ├── kustomization.yaml
│           └── values.yaml
└── README.md

My manifests:

# infrastructure-kustomization.yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
  name: infrastructure
  namespace: flux-system
spec:
  dependsOn:
    - name: flux-system
  interval: 5m
  sourceRef:
    kind: GitRepository
    name: flux-system
  path: ./infrastructure/dev-cluster
  prune: true
  wait: true
  timeout: 2m

According to the documentation (https://fluxcd.io/docs/components/kustomize/kustomization/#health-assessment), with wait and timeout set, the kustomize-controller waits until all resources are reconciled before marking the Kustomization as ready.
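The linked page also describes listing specific resources under spec.healthChecks instead of checking everything with wait. Below is a minimal sketch of that variant, assuming the kyverno HelmRelease from this repository is the resource to gate on. Per the same documentation, healthChecks is ignored when wait is true, so wait is omitted here. This is untested against the behaviour reported in this issue.

```yaml
# infrastructure-kustomization.yaml (variant, illustrative only)
apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
  name: infrastructure
  namespace: flux-system
spec:
  dependsOn:
    - name: flux-system
  interval: 5m
  sourceRef:
    kind: GitRepository
    name: flux-system
  path: ./infrastructure/dev-cluster
  prune: true
  timeout: 2m
  # wait is omitted: healthChecks is ignored when wait is true
  healthChecks:
    - apiVersion: helm.toolkit.fluxcd.io/v2beta1
      kind: HelmRelease
      name: kyverno
      namespace: kyverno
```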

# gitlab-infrastructure-provider.yaml
apiVersion: notification.toolkit.fluxcd.io/v1beta1
kind: Provider
metadata:
  name: gitlab-infrastructure
  namespace: flux-system
spec:
  type: gitlab
  address: https://myrepo
  secretRef:
    name: provider-gitlab-infrastructure-token

# infrastructure-alert.yaml
apiVersion: notification.toolkit.fluxcd.io/v1beta1
kind: Alert
metadata:
  name: infrastructure
  namespace: flux-system
spec:
  providerRef:
    name: gitlab-infrastructure
  eventSeverity: info
  eventSources:
    - kind: Kustomization
      name: infrastructure
      namespace: flux-system

And the GitLab jobs:

[image: GitLab jobs]

somtochiama commented 2 years ago

I am unable to replicate this on my end. I committed invalid YAML but got only one failed job. What kind of bad values are you committing to git?

[image: Screenshot 2022-02-14 at 11 01 34 PM]

antoineozenne commented 2 years ago

These values (with spec.values.resources.requests.cpu: 10mm):

apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: kyverno
  namespace: kyverno
spec:
  releaseName: kyverno
  chart:
    spec:
      chart: kyverno
      version: v2.1.10
      sourceRef:
        kind: HelmRepository
        name: kyverno
        namespace: flux-system
  interval: 5m
  install:
    remediation:
      retries: 3
  values:
    resources:
      requests:
        cpu: 10mm # deliberately invalid quantity (test value)
        memory: 128Mi
      limits:
        cpu: 500m
        memory: 1Gi
    initResources:
      requests:
        cpu: 10m
        memory: 64Mi
      limits:
        cpu: 100m
        memory: 256Mi
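
For comparison, here is a sketch of the same resources block with a valid CPU quantity. Kubernetes quantities use a single m suffix for milli-units, so 10m (as already used under initResources) parses while 10mm does not. Only the cpu request differs from the values above.

```yaml
# fragment of spec.values in the HelmRelease (valid variant, for comparison)
resources:
  requests:
    cpu: 10m       # 10 millicores; "10mm" is not a valid quantity
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 1Gi
```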