fluxcd / flux2

Open and extensible continuous delivery solution for Kubernetes. Powered by GitOps Toolkit.
https://fluxcd.io
Apache License 2.0

Improvement: dependency requeue delay #4808

Open juliusmh opened 1 month ago

juliusmh commented 1 month ago

Describe the bug

It would be useful to be able to set the requeue-dependency delay on a per-Kustomization basis, e.g. in the CRD. While 30s makes sense for "heavy" Kustomizations, a light apply, e.g. of CRDs, doesn't need to wait the full 30s before requeuing its dependants. The controller-wide setting could serve as the default.

Another problem with dependsOn is the general delay it adds, even if the Kustomizations being depended on are unchanged. Therefore, a possible improvement could be to skip the delay entirely if no changes were made to a dependency.

I'm not familiar with the codebase, so I can't comment on the feasibility of these ideas. If this is not desired, or is impractical, feel free to close. Otherwise, I'd be happy to discuss and get involved.

Related: https://github.com/fluxcd/flux2/issues/4739

Expected behavior

I could foresee extending the kustomize.toolkit.fluxcd.io CRD to allow for a specific timeout and skipping the dependency checks if they are "unchanged":

apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
spec:
  dependsOn:
    - name: crds
      requeueAfter: 2s
      skipUnchanged: true

stefanprodan commented 1 month ago

The --requeue-dependency flag can be set to 5s on most clusters with no side effects. Four years ago, when we set the 30s default, we didn't have benchmarks in place. Now I can say with confidence that the CPU load doesn't increase much even with 5000 Kustomizations.
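
For readers who want to try this, one possible way to lower the flag cluster-wide is a kustomize patch on the controller Deployments. This is only a sketch: it assumes a standard `flux bootstrap` layout where `flux-system/kustomization.yaml` lists `gotk-components.yaml` and `gotk-sync.yaml`, and the 5s value is the one suggested above, not a tested recommendation for every cluster.

```yaml
# flux-system/kustomization.yaml (assumed bootstrap layout)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - gotk-components.yaml
  - gotk-sync.yaml
patches:
  # Append the flag to the first container's args in each matching Deployment.
  - patch: |
      - op: add
        path: /spec/template/spec/containers/0/args/-
        value: --requeue-dependency=5s
    target:
      kind: Deployment
      # Regex match; drop helm-controller if only Kustomizations are affected.
      name: "(kustomize-controller|helm-controller)"
```

The setting stays controller-wide, so it does not address the per-Kustomization request, but it shortens the wait for every dependant.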

Another problem with dependsOn is the general delay it adds, even if the Kustomizations being depended on are unchanged. Therefore, a possible improvement could be to skip the delay entirely if no changes were made to a dependency.

We can't determine that some objects are unchanged, unless we reconcile, so it's not possible to have this skip behaviour.

wvh commented 2 weeks ago

I understand the problem with the skip behaviour. But would an at-least-once dependency make sense? Say an app depends on a database. The app might see daily updates, but the database might only be upgraded once a year. The dependency exists because a database obviously needs to be installed for the app to work, but once that dependency has been fulfilled at least once, the app could assume there is a working database and not care about the current state of the database's Kustomization.

In concrete terms, the controller would know whether a Kustomization has ever been in the ready state, and if so, ignore the dependency or assume it is fulfilled. Maybe this is more of a requires than a dependsOn relationship.