Open nazarewk opened 2 years ago
Base: 55.38% // Head: 55.59% // Increases project coverage by +0.21% :tada:
Coverage data is based on head (e4d98aa) compared to base (517c1ff). Patch coverage: 100.00% of modified lines in pull request are covered.
Seems like `syncContext.getOperationPhase()` is called only in a very specific case (not the one being tested). How should I proceed? It feels like I'm misunderstanding what happens step by step.
I want to confirm following scenario:
Previous sync has finished successfully
New (1st tick) Sync is started - Live hook was created before this
old hook got deleted and recreated - Live hook is still old
2nd tick of Sync gets stuck on PreSync because Live hook is still old
bump `CreationTimestamp` on Live object
3rd tick of Sync is waiting for hook to finish
Kudos, SonarCloud Quality Gate passed! 0 Bugs, 0 Vulnerabilities, 0 Security Hotspots, 1 Code Smell. No Coverage information, 0.0% Duplication.
managed to fully test expected flow, PR is ready for review
TLDR; `DELETE` event during handling of `BeforeHookCreation` deletion policy can be delayed while `etcd` is running garbage collection:
- `gitops-engine` does not verify creation timestamp of `BeforeHookCreation` objects against start time of sync operation,
- `gitops-engine` is unaware (old) object should be purged from cache and happily proceeds to next Sync Wave.

The PR fixes this specific failure mode with minimal changes to the code.
The issue is impossible to reproduce reliably, as `etcd`'s GC usually does not take long enough to trigger it. At the same time, it is very difficult to debug. The EKS cluster where I observed the issue had tens of thousands of objects, with >1k objects churned every few minutes, and GC still triggered the issue only a few times a week.
fixes https://github.com/argoproj/gitops-engine/issues/446
closes https://github.com/argoproj/argo-cd/pull/10579
original issue https://github.com/argoproj/argo-cd/issues/10077