Open VibhuSrivastava opened 1 month ago
In the end we ended up modifying our kubernetes apply script so that it does an ordered apply (analysis templates first, everything else in the second go) Looks like that has solved the problem, sharing here incase someone else runs into the same issue.
Checklist:
Describe the bug
We have observed that there is a race condition when you make changes to argo rollouts analysis templates, sometimes the rollout starts before the analysis template is updated, so an analysis run can begin already before the analysis template is updated.
To Reproduce Since this is a race condition, it’s hard to predict when this will happen
Previous analysis template:
Analysis template to be applied:
We noticed that the rollout was aborted because of the old analysis template failing (avg-success-rage-for-http-requests)
Expected behavior
When analysis templates are updated then that any changes to analysis templates should always happen first before an analysis run starts. In the new rollout, the analysis template being run was expected to be total-success-rate-for-http-requests when it actually was avg-success-rage-for-http-requests.
The same change was getting rolled out to multiple environments, and the created analysis template was with total as expected, except for 1 case when it was with avg, so the behaviour is unpredictable.
Screenshots
Version
1.7.2
Logs
Message from the maintainers:
Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.