A race condition exists in the handling of webhooks in combination with kustomize builds. When a git webhook comes in, a kustomize build gets triggered. If a second git webhook arrives while the repository controller is still running that kustomize build, the second webhook is dropped (skipped).
Scenario:
Webhook for commit 1 comes in -> kustomize build starts
Webhook for commit 2 comes in -> skipped because a build is already in progress
Build for commit 1 completes -> changes are applied
ArgoCD marks the environment as "in-sync" although it is on commit 1, not on commit 2
As a result, the changes from commit 2 are only applied once the timeout expires. Unfortunately, because kustomize builds take a long time, users are inclined to extend that timeout to keep the repository server from being constantly under heavy load.
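The sequence above can be modelled with a small sketch. This is not ArgoCD code; `SkippingBuilder` and its methods are hypothetical names, used only to illustrate how a "skip while a build is in progress" rule loses the newer commit.

```python
from typing import List, Optional


class SkippingBuilder:
    """Hypothetical model: webhooks are dropped while a build is running."""

    def __init__(self) -> None:
        self.active: Optional[str] = None   # commit currently being built
        self.applied: List[str] = []        # commits whose builds were applied
        self.dropped: List[str] = []        # commits whose webhooks were skipped

    def on_webhook(self, commit: str) -> None:
        if self.active is not None:
            # A build is already in progress: the new webhook is skipped.
            self.dropped.append(commit)
            return
        self.active = commit

    def on_build_complete(self) -> None:
        if self.active is not None:
            self.applied.append(self.active)
            self.active = None


builder = SkippingBuilder()
builder.on_webhook("commit-1")   # build starts
builder.on_webhook("commit-2")   # skipped, build for commit-1 still running
builder.on_build_complete()      # commit-1 is applied
print(builder.applied)  # ['commit-1']
print(builder.dropped)  # ['commit-2']
```

In this model nothing ever picks up `commit-2` again, which corresponds to the environment staying on commit 1 until the timeout triggers a new build.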
Proposal
Multiple scenarios are possible:
Allow users to opt in to cancelling the build for commit 1 when commit 2 comes in. Downside: if commits arrive more frequently than builds complete, no changes ever end up on the environment.
Allow users to opt in to a "last pending build" mode. Active builds are not cancelled (so their changes are still applied), but the next pending build is registered. In this scenario, a build for commit 2 would be triggered after the build for commit 1 has completed. If a third commit comes in while the build for commit 1 is still running, commit 2 is skipped and commit 3 becomes the "pending" commit for the environment. This avoids the downside of the first option while still making sure that the most recent commit is never skipped.
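The "last pending build" option could look roughly like the following sketch. It is a simplified, single-threaded model under assumptions: `EnvironmentBuildQueue`, `on_webhook` and `on_build_complete` are hypothetical names, not existing ArgoCD APIs, and a real implementation would need locking around the shared state.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EnvironmentBuildQueue:
    """Hypothetical per-environment state: one active build, at most one pending commit."""

    active: Optional[str] = None
    pending: Optional[str] = None
    built: List[str] = field(default_factory=list)
    superseded: List[str] = field(default_factory=list)

    def on_webhook(self, commit: str) -> None:
        if self.active is None:
            # Nothing is running, so the build starts right away.
            self.active = commit
            return
        if self.pending is not None:
            # An older pending commit is replaced by the newer one.
            self.superseded.append(self.pending)
        self.pending = commit

    def on_build_complete(self) -> None:
        if self.active is None:
            return
        self.built.append(self.active)
        # Promote the pending commit (if any) to be the next active build.
        self.active, self.pending = self.pending, None


queue = EnvironmentBuildQueue()
queue.on_webhook("commit-1")   # build starts
queue.on_webhook("commit-2")   # registered as pending
queue.on_webhook("commit-3")   # replaces commit-2 as pending
queue.on_build_complete()      # commit-1 applied, build for commit-3 starts
queue.on_build_complete()      # commit-3 applied
print(queue.built)       # ['commit-1', 'commit-3']
print(queue.superseded)  # ['commit-2']
```

Because the active build is never cancelled, every completed build is applied, and because only the newest pending commit is kept, the environment always converges on the latest commit without queueing a build for every intermediate commit.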
Discussed in Slack