We have a Rollout that is configured to stand up both a cold stable/control pod along with a cold canary pod as an Experiment that runs for a 90 second duration. During the experiment, we run an analysis template that gives the stable and canary pods some time to send metrics off to DataDog (initialDelay of 50 seconds), and then checks the difference of the errors between the two experiment pods on a 10 second interval. This AnalysisTemplate tolerates 2 failures of our query.
During a Rollout, we observed Argo-Rollouts correctly:
Created the experiment pods.
Ran the AnalysisRun and resulted in 2 successes and 3 failures. With a failureLimit of 2, the AnalysisRun was correctly labeled as "Failed".
However, the overall status of the Experiment was incorrectly labeled as Successful:
Based on the logs below, it looks like the Experiment was marked as Successful just before the third AnalysisRun failure occurred, marking the overall AnalysisRun as a failure and resulting in the contradiction above. I would expect the Experiment to allow an in-progress AnalysisRun to complete so the Experiment takes this into account.
Diagnostics
What version of Argo Rollouts are you running? v0.10.2
Logs
# Logs for the entire controller:
time="2021-01-12T21:05:43Z" level=info msg="Started syncing Analysis at (2021-01-12 21:05:43.000220253 +0000 UTC m=+1513.594407927)" analysisrun=my-app-7f7c78bd49-6-0-error-rate-analysis namespace=my-namespace
time="2021-01-12T21:05:43Z" level=info msg="running overdue measurement" analysisrun=my-app-7f7c78bd49-6-0-error-rate-analysis metric=error-rate namespace=my-namespace
time="2021-01-12T21:05:43Z" level=info msg="taking 1 measurements" analysisrun=my-app-7f7c78bd49-6-0-error-rate-analysis namespace=my-namespace
time="2021-01-12T21:05:43Z" level=info msg="measurement completed Failed" analysisrun=my-app-7f7c78bd49-6-0-error-rate-analysis metric=error-rate namespace=my-namespace
time="2021-01-12T21:05:43Z" level=info msg="enqueueing analysis after 9.999898914s" analysisrun=my-app-7f7c78bd49-6-0-error-rate-analysis namespace=my-namespace
time="2021-01-12T21:05:43Z" level=info msg="Patch status successfully" analysisrun=my-app-7f7c78bd49-6-0-error-rate-analysis namespace=my-namespace
time="2021-01-12T21:05:43Z" level=info msg="Reconciliation completed" analysisrun=my-app-7f7c78bd49-6-0-error-rate-analysis namespace=my-namespace time_ms=173.358142
time="2021-01-12T21:05:43Z" level=info msg="Started syncing Analysis at (2021-01-12 21:05:43.17360435 +0000 UTC m=+1513.767792029)" analysisrun=my-app-7f7c78bd49-6-0-error-rate-analysis namespace=my-namespace
time="2021-01-12T21:05:43Z" level=info msg="taking 0 measurements" analysisrun=my-app-7f7c78bd49-6-0-error-rate-analysis namespace=my-namespace
time="2021-01-12T21:05:43Z" level=info msg="enqueueing analysis after 9.826259202s" analysisrun=my-app-7f7c78bd49-6-0-error-rate-analysis namespace=my-namespace
time="2021-01-12T21:05:43Z" level=info msg="No status changes. Skipping patch" analysisrun=my-app-7f7c78bd49-6-0-error-rate-analysis namespace=my-namespace
time="2021-01-12T21:05:43Z" level=info msg="Reconciliation completed" analysisrun=my-app-7f7c78bd49-6-0-error-rate-analysis namespace=my-namespace time_ms=0.276776
time="2021-01-12T21:05:53Z" level=info msg="Started syncing Experiment at (2021-01-12 21:05:53.000199831 +0000 UTC m=+1523.594387627)" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace
time="2021-01-12T21:05:53Z" level=info msg="Claimed ReplicaSet 'my-app-7f7c78bd49-6-0-stable' for template 'stable'" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace
time="2021-01-12T21:05:53Z" level=info msg="Claimed ReplicaSet 'my-app-7f7c78bd49-6-0-canary' for template 'canary'" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace
time="2021-01-12T21:05:53Z" level=info msg="Reconciling template" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace template=stable
time="2021-01-12T21:05:53Z" level=info msg="Template 'stable' transitioned from Running -> Successful" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace
time="2021-01-12T21:05:53Z" level=info msg="Reconciling template" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace template=canary
time="2021-01-12T21:05:53Z" level=info msg="Template 'canary' transitioned from Running -> Successful" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace
time="2021-01-12T21:05:53Z" level=info msg="Reconciling analysis" analysis=error-rate-analysis experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace
time="2021-01-12T21:05:53Z" level=info msg="Experiment transitioned from Running -> Successful" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace
time="2021-01-12T21:05:53Z" level=info msg="Started syncing Analysis at (2021-01-12 21:05:53.001600105 +0000 UTC m=+1523.595787871)" analysisrun=my-app-7f7c78bd49-6-0-error-rate-analysis namespace=my-namespace
time="2021-01-12T21:05:53Z" level=info msg="running overdue measurement" analysisrun=my-app-7f7c78bd49-6-0-error-rate-analysis metric=error-rate namespace=my-namespace
time="2021-01-12T21:05:53Z" level=info msg="taking 1 measurements" analysisrun=my-app-7f7c78bd49-6-0-error-rate-analysis namespace=my-namespace
time="2021-01-12T21:05:53Z" level=info msg="Event(v1.ObjectReference{Kind:\"Experiment\", Namespace:\"my-namespace\", Name:\"my-app-7f7c78bd49-6-0\", UID:\"4f8e47fa-0942-44e6-a869-df721d6455dd\", APIVersion:\"argoproj.io/v1alpha1\", ResourceVersion:\"441228358\", FieldPath:\"\"}): type: 'Normal' reason: 'Successful' Template 'stable' transitioned from Running -> Successful"
time="2021-01-12T21:05:53Z" level=info msg="Event(v1.ObjectReference{Kind:\"Experiment\", Namespace:\"my-namespace\", Name:\"my-app-7f7c78bd49-6-0\", UID:\"4f8e47fa-0942-44e6-a869-df721d6455dd\", APIVersion:\"argoproj.io/v1alpha1\", ResourceVersion:\"441228358\", FieldPath:\"\"}): type: 'Normal' reason: 'Successful' Template 'canary' transitioned from Running -> Successful"
time="2021-01-12T21:05:53Z" level=info msg="Event(v1.ObjectReference{Kind:\"Experiment\", Namespace:\"my-namespace\", Name:\"my-app-7f7c78bd49-6-0\", UID:\"4f8e47fa-0942-44e6-a869-df721d6455dd\", APIVersion:\"argoproj.io/v1alpha1\", ResourceVersion:\"441228358\", FieldPath:\"\"}): type: 'Normal' reason: 'Successful' Experiment transitioned from Running -> Successful"
time="2021-01-12T21:05:53Z" level=info msg="Patch status successfully" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace
time="2021-01-12T21:05:53Z" level=info msg="Reconciliation completed" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace time_ms=30.785740999999998
time="2021-01-12T21:05:53Z" level=info msg="Enqueueing parent of my-namespace/my-app-7f7c78bd49-6-0: Rollout my-namespace/my-app"
time="2021-01-12T21:05:53Z" level=info msg="Started syncing rollout at (2021-01-12 21:05:53.032834185 +0000 UTC m=+1523.627021859)" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciling experiment step (stepIndex: 0)" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciling StableRS" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciling new ReplicaSet 'my-app-7f7c78bd49'" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciling old replica sets" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Incrementing the Current Step Index to 1" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Started syncing Experiment at (2021-01-12 21:05:53.039557395 +0000 UTC m=+1523.633745246)" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace
time="2021-01-12T21:05:53Z" level=info msg="Claimed ReplicaSet 'my-app-7f7c78bd49-6-0-stable' for template 'stable'" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace
time="2021-01-12T21:05:53Z" level=info msg="Claimed ReplicaSet 'my-app-7f7c78bd49-6-0-canary' for template 'canary'" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace
time="2021-01-12T21:05:53Z" level=info msg="Reconciling template" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace template=stable
time="2021-01-12T21:05:53Z" level=info msg="Event(v1.ObjectReference{Kind:\"Rollout\", Namespace:\"my-namespace\", Name:\"my-app\", UID:\"a98d9975-d8f3-4cb4-996e-3635c3946c46\", APIVersion:\"argoproj.io/v1alpha1\", ResourceVersion:\"441228021\", FieldPath:\"\"}): type: 'Normal' reason: 'SettingStableRS' Completed all steps"
time="2021-01-12T21:05:53Z" level=info msg="Event(v1.ObjectReference{Kind:\"Rollout\", Namespace:\"my-namespace\", Name:\"my-app\", UID:\"a98d9975-d8f3-4cb4-996e-3635c3946c46\", APIVersion:\"argoproj.io/v1alpha1\", ResourceVersion:\"441228021\", FieldPath:\"\"}): type: 'Normal' reason: 'SetStepIndex' Set Step Index to 1"
time="2021-01-12T21:05:53Z" level=info msg="Scaled down ReplicaSet my-app-7f7c78bd49-6-0-stable from 1 to 0" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace
time="2021-01-12T21:05:53Z" level=info msg="Reconciling template" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace template=canary
time="2021-01-12T21:05:53Z" level=info msg="Event(v1.ObjectReference{Kind:\"Experiment\", Namespace:\"my-namespace\", Name:\"my-app-7f7c78bd49-6-0\", UID:\"4f8e47fa-0942-44e6-a869-df721d6455dd\", APIVersion:\"argoproj.io/v1alpha1\", ResourceVersion:\"441229066\", FieldPath:\"\"}): type: 'Normal' reason: 'ScalingReplicaSet' Scaled down replica set my-app-7f7c78bd49-6-0-stable to 0"
time="2021-01-12T21:05:53Z" level=info msg="Enqueueing parent of my-namespace/my-app-7f7c78bd49-6-0-stable: Experiment my-namespace/my-app-7f7c78bd49-6-0"
time="2021-01-12T21:05:53Z" level=info msg="Scaled down ReplicaSet my-app-7f7c78bd49-6-0-canary from 1 to 0" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace
time="2021-01-12T21:05:53Z" level=info msg="Reconciling analysis" analysis=error-rate-analysis experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace
time="2021-01-12T21:05:53Z" level=warning msg="Terminating error-rate-analysis (my-app-7f7c78bd49-6-0-error-rate-analysis)" analysis=error-rate-analysis experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace
time="2021-01-12T21:05:53Z" level=info msg="Event(v1.ObjectReference{Kind:\"Experiment\", Namespace:\"my-namespace\", Name:\"my-app-7f7c78bd49-6-0\", UID:\"4f8e47fa-0942-44e6-a869-df721d6455dd\", APIVersion:\"argoproj.io/v1alpha1\", ResourceVersion:\"441229066\", FieldPath:\"\"}): type: 'Normal' reason: 'ScalingReplicaSet' Scaled down replica set my-app-7f7c78bd49-6-0-canary to 0"
time="2021-01-12T21:05:53Z" level=info msg="Event(v1.ObjectReference{Kind:\"Experiment\", Namespace:\"my-namespace\", Name:\"my-app-7f7c78bd49-6-0\", UID:\"4f8e47fa-0942-44e6-a869-df721d6455dd\", APIVersion:\"argoproj.io/v1alpha1\", ResourceVersion:\"441229066\", FieldPath:\"\"}): type: 'Normal' reason: 'Terminate' Terminating error-rate-analysis (my-app-7f7c78bd49-6-0-error-rate-analysis)"
time="2021-01-12T21:05:53Z" level=info msg="Enqueueing parent of my-namespace/my-app-7f7c78bd49-6-0-canary: Experiment my-namespace/my-app-7f7c78bd49-6-0"
time="2021-01-12T21:05:53Z" level=info msg="Patched: {\"status\":{\"canary\":{\"currentExperiment\":null},\"conditions\":[{\"lastTransitionTime\":\"2021-01-08T06:34:29Z\",\"lastUpdateTime\":\"2021-01-08T06:34:29Z\",\"message\":\"Rollout has minimum availability\",\"reason\":\"AvailableReason\",\"status\":\"True\",\"type\":\"Available\"},{\"lastTransitionTime\":\"2021-01-08T06:24:58Z\",\"lastUpdateTime\":\"2021-01-12T21:05:53Z\",\"message\":\"ReplicaSet \\\"my-app-7f7c78bd49\\\" is progressing.\",\"reason\":\"ReplicaSetUpdated\",\"status\":\"True\",\"type\":\"Progressing\"}],\"currentStepIndex\":1}}" generation=12 namespace=my-namespace resourceVersion=441228021 rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciliation completed" namespace=my-namespace rollout=my-app time_ms=59.115897000000004
time="2021-01-12T21:05:53Z" level=info msg="Enqueueing parent of my-namespace/my-app-7f7c78bd49-6-0-stable: Experiment my-namespace/my-app-7f7c78bd49-6-0"
time="2021-01-12T21:05:53Z" level=info msg="Enqueueing parent of my-namespace/my-app-7f7c78bd49-6-0-canary: Experiment my-namespace/my-app-7f7c78bd49-6-0"
time="2021-01-12T21:05:53Z" level=info msg="Started syncing rollout at (2021-01-12 21:05:53.100947956 +0000 UTC m=+1523.695135693)" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciling StableRS" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciling new ReplicaSet 'my-app-7f7c78bd49'" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Patch status successfully" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace
time="2021-01-12T21:05:53Z" level=info msg="Reconciliation completed" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace time_ms=68.44009299999999
time="2021-01-12T21:05:53Z" level=info msg="Started syncing Experiment at (2021-01-12 21:05:53.108019348 +0000 UTC m=+1523.702207024)" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace
time="2021-01-12T21:05:53Z" level=info msg="Claimed ReplicaSet 'my-app-7f7c78bd49-6-0-stable' for template 'stable'" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace
time="2021-01-12T21:05:53Z" level=info msg="Claimed ReplicaSet 'my-app-7f7c78bd49-6-0-canary' for template 'canary'" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace
time="2021-01-12T21:05:53Z" level=info msg="Reconciling template" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace template=stable
time="2021-01-12T21:05:53Z" level=info msg="Template progressed" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace template=stable
time="2021-01-12T21:05:53Z" level=info msg="Prev status: Current: 1, Updated: 1, Ready: 1, Available: 1" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace template=stable
time="2021-01-12T21:05:53Z" level=info msg="New status: Current: 0, Updated: 0, Ready: 0, Available: 0" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace template=stable
time="2021-01-12T21:05:53Z" level=info msg="Reconciling template" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace template=canary
time="2021-01-12T21:05:53Z" level=info msg="Template progressed" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace template=canary
time="2021-01-12T21:05:53Z" level=info msg="Prev status: Current: 1, Updated: 1, Ready: 1, Available: 1" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace template=canary
time="2021-01-12T21:05:53Z" level=info msg="New status: Current: 0, Updated: 0, Ready: 0, Available: 0" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace template=canary
time="2021-01-12T21:05:53Z" level=info msg="Reconciling analysis" analysis=error-rate-analysis experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace
time="2021-01-12T21:05:53Z" level=info msg="Enqueueing parent of my-namespace/my-app-7f7c78bd49-6-0: Rollout my-namespace/my-app"
time="2021-01-12T21:05:53Z" level=info msg="Not finished reconciling new ReplicaSet 'my-app-7f7c78bd49'" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Not finished reconciling ReplicaSets" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Rollout has executed every step" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Timed out (false) [last progress check: 2021-01-12 21:05:53 +0000 UTC - now: 2021-01-12 21:05:53.119314595 +0000 UTC m=+1523.713502275]" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="No status changes. Skipping patch" generation=12 namespace=my-namespace resourceVersion=441229093 rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Queueing up rollout for a progress after 599s" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciliation completed" namespace=my-namespace rollout=my-app time_ms=18.510751
time="2021-01-12T21:05:53Z" level=info msg="Started syncing rollout at (2021-01-12 21:05:53.119477767 +0000 UTC m=+1523.713665442)" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciling StableRS" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciling new ReplicaSet 'my-app-7f7c78bd49'" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Event(v1.ObjectReference{Kind:\"Rollout\", Namespace:\"my-namespace\", Name:\"my-app\", UID:\"a98d9975-d8f3-4cb4-996e-3635c3946c46\", APIVersion:\"argoproj.io/v1alpha1\", ResourceVersion:\"441229093\", FieldPath:\"\"}): type: 'Normal' reason: 'ScalingReplicaSet' Scaled up replica set my-app-7f7c78bd49 from 0 to 1"
time="2021-01-12T21:05:53Z" level=info msg="Enqueueing parent of my-namespace/my-app-7f7c78bd49: Rollout my-namespace/my-app"
time="2021-01-12T21:05:53Z" level=info msg="Reconciliation completed" namespace=my-namespace rollout=my-app time_ms=10.930302
time="2021-01-12T21:05:53Z" level=error msg="rollout syncHandler error: Operation cannot be fulfilled on replicasets.apps \"my-app-7f7c78bd49\": the object has been modified; please apply your changes to the latest version and try again" namespace=my-namespace rollout=my-app
E0112 21:05:53.130563 1 controller.go:172] Operation cannot be fulfilled on replicasets.apps "my-app-7f7c78bd49": the object has been modified; please apply your changes to the latest version and try again
time="2021-01-12T21:05:53Z" level=info msg="Started syncing rollout at (2021-01-12 21:05:53.130615777 +0000 UTC m=+1523.724803514)" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciling StableRS" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciling new ReplicaSet 'my-app-7f7c78bd49'" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciling old replica sets" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="No Steps remain in the canary steps" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Rollout has executed every step" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Timed out (false) [last progress check: 2021-01-12 21:05:53 +0000 UTC - now: 2021-01-12 21:05:53.143699854 +0000 UTC m=+1523.737887529]" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="No status changes. Skipping patch" generation=12 namespace=my-namespace resourceVersion=441229093 rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Queueing up rollout for a progress after 599s" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciliation completed" namespace=my-namespace rollout=my-app time_ms=13.441047
time="2021-01-12T21:05:53Z" level=info msg="measurement completed Failed" analysisrun=my-app-7f7c78bd49-6-0-error-rate-analysis metric=error-rate namespace=my-namespace
time="2021-01-12T21:05:53Z" level=info msg="metric assessed Failed: failed (3) > failureLimit (2)" metric=error-rate
time="2021-01-12T21:05:53Z" level=info msg="metric transitioned from Running -> Failed" analysisrun=my-app-7f7c78bd49-6-0-error-rate-analysis metric=error-rate namespace=my-namespace
time="2021-01-12T21:05:53Z" level=info msg="analysis transitioned from Running -> Failed" analysisrun=my-app-7f7c78bd49-6-0-error-rate-analysis namespace=my-namespace
time="2021-01-12T21:05:53Z" level=info msg="Patch status successfully" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace
time="2021-01-12T21:05:53Z" level=info msg="Reconciliation completed" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace time_ms=42.859904
time="2021-01-12T21:05:53Z" level=info msg="Started syncing Experiment at (2021-01-12 21:05:53.15089833 +0000 UTC m=+1523.745086005)" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace
time="2021-01-12T21:05:53Z" level=info msg="Claimed ReplicaSet 'my-app-7f7c78bd49-6-0-stable' for template 'stable'" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace
time="2021-01-12T21:05:53Z" level=info msg="Claimed ReplicaSet 'my-app-7f7c78bd49-6-0-canary' for template 'canary'" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace
time="2021-01-12T21:05:53Z" level=info msg="Reconciling template" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace template=stable
time="2021-01-12T21:05:53Z" level=info msg="Template progressed" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace template=stable
time="2021-01-12T21:05:53Z" level=info msg="Prev status: Current: 1, Updated: 1, Ready: 1, Available: 1" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace template=stable
time="2021-01-12T21:05:53Z" level=info msg="New status: Current: 0, Updated: 0, Ready: 0, Available: 0" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace template=stable
time="2021-01-12T21:05:53Z" level=info msg="Reconciling template" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace template=canary
time="2021-01-12T21:05:53Z" level=info msg="Template progressed" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace template=canary
time="2021-01-12T21:05:53Z" level=info msg="Prev status: Current: 1, Updated: 1, Ready: 1, Available: 1" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace template=canary
time="2021-01-12T21:05:53Z" level=info msg="New status: Current: 0, Updated: 0, Ready: 0, Available: 0" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace template=canary
time="2021-01-12T21:05:53Z" level=info msg="Reconciling analysis" analysis=error-rate-analysis experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace
time="2021-01-12T21:05:53Z" level=info msg="Enqueueing parent of my-namespace/my-app-7f7c78bd49-6-0: Rollout my-namespace/my-app"
time="2021-01-12T21:05:53Z" level=info msg="Started syncing rollout at (2021-01-12 21:05:53.157146341 +0000 UTC m=+1523.751334021)" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Enqueueing parent of my-namespace/my-app-7f7c78bd49: Rollout my-namespace/my-app"
time="2021-01-12T21:05:53Z" level=info msg="Event(v1.ObjectReference{Kind:\"AnalysisRun\", Namespace:\"my-namespace\", Name:\"my-app-7f7c78bd49-6-0-error-rate-analysis\", UID:\"7a7ad7e7-c73e-4616-b692-13360bc5c492\", APIVersion:\"argoproj.io/v1alpha1\", ResourceVersion:\"441228962\", FieldPath:\"\"}): type: 'Warning' reason: 'Failed' metric 'error-rate' completed Failed"
time="2021-01-12T21:05:53Z" level=info msg="Event(v1.ObjectReference{Kind:\"AnalysisRun\", Namespace:\"my-namespace\", Name:\"my-app-7f7c78bd49-6-0-error-rate-analysis\", UID:\"7a7ad7e7-c73e-4616-b692-13360bc5c492\", APIVersion:\"argoproj.io/v1alpha1\", ResourceVersion:\"441228962\", FieldPath:\"\"}): type: 'Warning' reason: 'Failed' analysis completed Failed"
time="2021-01-12T21:05:53Z" level=info msg="Reconciling StableRS" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciling new ReplicaSet 'my-app-7f7c78bd49'" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciling old replica sets" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="No Steps remain in the canary steps" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Rollout has executed every step" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Timed out (false) [last progress check: 2021-01-12 21:05:53 +0000 UTC - now: 2021-01-12 21:05:53.174274991 +0000 UTC m=+1523.768462665]" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="No status changes. Skipping patch" generation=12 namespace=my-namespace resourceVersion=441229093 rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Queueing up rollout for a progress after 599s" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciliation completed" namespace=my-namespace rollout=my-app time_ms=17.317052999999998
# Logs for a specific rollout:
time="2021-01-12T21:05:53Z" level=info msg="Started syncing rollout at (2021-01-12 21:05:53.032834185 +0000 UTC m=+1523.627021859)" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciling experiment step (stepIndex: 0)" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciling StableRS" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciling new ReplicaSet 'my-app-7f7c78bd49'" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciling old replica sets" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Incrementing the Current Step Index to 1" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Patched: {\"status\":{\"canary\":{\"currentExperiment\":null},\"conditions\":[{\"lastTransitionTime\":\"2021-01-08T06:34:29Z\",\"lastUpdateTime\":\"2021-01-08T06:34:29Z\",\"message\":\"Rollout has minimum availability\",\"reason\":\"AvailableReason\",\"status\":\"True\",\"type\":\"Available\"},{\"lastTransitionTime\":\"2021-01-08T06:24:58Z\",\"lastUpdateTime\":\"2021-01-12T21:05:53Z\",\"message\":\"ReplicaSet \\\"my-app-7f7c78bd49\\\" is progressing.\",\"reason\":\"ReplicaSetUpdated\",\"status\":\"True\",\"type\":\"Progressing\"}],\"currentStepIndex\":1}}" generation=12 namespace=my-namespace resourceVersion=441228021 rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciliation completed" namespace=my-namespace rollout=my-app time_ms=59.115897000000004
time="2021-01-12T21:05:53Z" level=info msg="Started syncing rollout at (2021-01-12 21:05:53.100947956 +0000 UTC m=+1523.695135693)" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciling StableRS" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciling new ReplicaSet 'my-app-7f7c78bd49'" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Not finished reconciling new ReplicaSet 'my-app-7f7c78bd49'" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Not finished reconciling ReplicaSets" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Rollout has executed every step" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Timed out (false) [last progress check: 2021-01-12 21:05:53 +0000 UTC - now: 2021-01-12 21:05:53.119314595 +0000 UTC m=+1523.713502275]" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="No status changes. Skipping patch" generation=12 namespace=my-namespace resourceVersion=441229093 rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Queueing up rollout for a progress after 599s" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciliation completed" namespace=my-namespace rollout=my-app time_ms=18.510751
time="2021-01-12T21:05:53Z" level=info msg="Started syncing rollout at (2021-01-12 21:05:53.119477767 +0000 UTC m=+1523.713665442)" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciling StableRS" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciling new ReplicaSet 'my-app-7f7c78bd49'" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciliation completed" namespace=my-namespace rollout=my-app time_ms=10.930302
time="2021-01-12T21:05:53Z" level=error msg="rollout syncHandler error: Operation cannot be fulfilled on replicasets.apps \"my-app-7f7c78bd49\": the object has been modified; please apply your changes to the latest version and try again" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Started syncing rollout at (2021-01-12 21:05:53.130615777 +0000 UTC m=+1523.724803514)" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciling StableRS" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciling new ReplicaSet 'my-app-7f7c78bd49'" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciling old replica sets" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="No Steps remain in the canary steps" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Rollout has executed every step" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Timed out (false) [last progress check: 2021-01-12 21:05:53 +0000 UTC - now: 2021-01-12 21:05:53.143699854 +0000 UTC m=+1523.737887529]" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="No status changes. Skipping patch" generation=12 namespace=my-namespace resourceVersion=441229093 rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Queueing up rollout for a progress after 599s" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciliation completed" namespace=my-namespace rollout=my-app time_ms=13.441047
time="2021-01-12T21:05:53Z" level=info msg="Started syncing rollout at (2021-01-12 21:05:53.157146341 +0000 UTC m=+1523.751334021)" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciling StableRS" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciling new ReplicaSet 'my-app-7f7c78bd49'" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciling old replica sets" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="No Steps remain in the canary steps" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Rollout has executed every step" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Timed out (false) [last progress check: 2021-01-12 21:05:53 +0000 UTC - now: 2021-01-12 21:05:53.174274991 +0000 UTC m=+1523.768462665]" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="No status changes. Skipping patch" generation=12 namespace=my-namespace resourceVersion=441229093 rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Queueing up rollout for a progress after 599s" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciliation completed" namespace=my-namespace rollout=my-app time_ms=17.317052999999998
Summary
We have a Rollout that is configured to stand up both a cold stable/control pod along with a cold canary pod as an Experiment that runs for a 90 second duration. During the experiment, we run an analysis template that gives the stable and canary pods some time to send metrics off to DataDog (initialDelay of 50 seconds), and then checks the difference of the errors between the two experiment pods on a 10 second interval. This AnalysisTemplate tolerates 2 failures of our query.
During a Rollout, we observed Argo-Rollouts correctly:
failureLimit
of 2, the AnalysisRun was correctly labeled as "Failed".However, the overall status of the Experiment was incorrectly labeled as Successful:
Based on the logs below, it looks like the Experiment was marked as Successful just before the third AnalysisRun failure occurred, marking the overall AnalysisRun as a failure and resulting in the contradiction above. I would expect the Experiment to allow an in-progress AnalysisRun to complete so the Experiment takes this into account.
Diagnostics
What version of Argo Rollouts are you running?
v0.10.2
Logs
Config
Our AnalysisTemplate looks like this:
And our Rollout steps look like the following:
Message from the maintainers:
Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.