argoproj / argo-rollouts

Progressive Delivery for Kubernetes
https://argo-rollouts.readthedocs.io/
Apache License 2.0
2.76k stars 866 forks source link

Experiment marked as Successful while an AnalysisRun that eventually fails is still running #943

Open kcboschert opened 3 years ago

kcboschert commented 3 years ago

Summary

We have a Rollout that is configured to stand up both a cold stable/control pod along with a cold canary pod as an Experiment that runs for a 90 second duration. During the experiment, we run an analysis template that gives the stable and canary pods some time to send metrics off to DataDog (initialDelay of 50 seconds), and then checks the difference of the errors between the two experiment pods on a 10 second interval. This AnalysisTemplate tolerates 2 failures of our query.

During a Rollout, we observed Argo-Rollouts correctly:

However, the overall status of the Experiment was incorrectly labeled as Successful:

├──# revision:6
│  ├──⧉ my-app-7f7c78bd49                                  ReplicaSet   • ScaledDown  21h
│  └──Σ my-app-7f7c78bd49-6-0                              Experiment   ✔ Successful  21h
│     ├──⧉ my-app-7f7c78bd49-6-0-canary                    ReplicaSet   • ScaledDown  21h
│     ├──⧉ my-app-7f7c78bd49-6-0-stable                    ReplicaSet   • ScaledDown  21h
│     └──α my-app-7f7c78bd49-6-0-error-rate-analysis       AnalysisRun  ✖ Failed      21h  ✔ 2,✖ 3

Based on the logs below, it looks like the Experiment was marked as Successful just before the third AnalysisRun failure occurred, marking the overall AnalysisRun as a failure and resulting in the contradiction above. I would expect the Experiment to allow an in-progress AnalysisRun to complete so the Experiment takes this into account.

Diagnostics

What version of Argo Rollouts are you running? v0.10.2

Logs

# Logs for the entire controller:
time="2021-01-12T21:05:43Z" level=info msg="Started syncing Analysis at (2021-01-12 21:05:43.000220253 +0000 UTC m=+1513.594407927)" analysisrun=my-app-7f7c78bd49-6-0-error-rate-analysis namespace=my-namespace
time="2021-01-12T21:05:43Z" level=info msg="running overdue measurement" analysisrun=my-app-7f7c78bd49-6-0-error-rate-analysis metric=error-rate namespace=my-namespace
time="2021-01-12T21:05:43Z" level=info msg="taking 1 measurements" analysisrun=my-app-7f7c78bd49-6-0-error-rate-analysis namespace=my-namespace
time="2021-01-12T21:05:43Z" level=info msg="measurement completed Failed" analysisrun=my-app-7f7c78bd49-6-0-error-rate-analysis metric=error-rate namespace=my-namespace
time="2021-01-12T21:05:43Z" level=info msg="enqueueing analysis after 9.999898914s" analysisrun=my-app-7f7c78bd49-6-0-error-rate-analysis namespace=my-namespace
time="2021-01-12T21:05:43Z" level=info msg="Patch status successfully" analysisrun=my-app-7f7c78bd49-6-0-error-rate-analysis namespace=my-namespace
time="2021-01-12T21:05:43Z" level=info msg="Reconciliation completed" analysisrun=my-app-7f7c78bd49-6-0-error-rate-analysis namespace=my-namespace time_ms=173.358142
time="2021-01-12T21:05:43Z" level=info msg="Started syncing Analysis at (2021-01-12 21:05:43.17360435 +0000 UTC m=+1513.767792029)" analysisrun=my-app-7f7c78bd49-6-0-error-rate-analysis namespace=my-namespace
time="2021-01-12T21:05:43Z" level=info msg="taking 0 measurements" analysisrun=my-app-7f7c78bd49-6-0-error-rate-analysis namespace=my-namespace
time="2021-01-12T21:05:43Z" level=info msg="enqueueing analysis after 9.826259202s" analysisrun=my-app-7f7c78bd49-6-0-error-rate-analysis namespace=my-namespace
time="2021-01-12T21:05:43Z" level=info msg="No status changes. Skipping patch" analysisrun=my-app-7f7c78bd49-6-0-error-rate-analysis namespace=my-namespace
time="2021-01-12T21:05:43Z" level=info msg="Reconciliation completed" analysisrun=my-app-7f7c78bd49-6-0-error-rate-analysis namespace=my-namespace time_ms=0.276776
time="2021-01-12T21:05:53Z" level=info msg="Started syncing Experiment at (2021-01-12 21:05:53.000199831 +0000 UTC m=+1523.594387627)" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace
time="2021-01-12T21:05:53Z" level=info msg="Claimed ReplicaSet 'my-app-7f7c78bd49-6-0-stable' for template 'stable'" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace
time="2021-01-12T21:05:53Z" level=info msg="Claimed ReplicaSet 'my-app-7f7c78bd49-6-0-canary' for template 'canary'" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace
time="2021-01-12T21:05:53Z" level=info msg="Reconciling template" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace template=stable
time="2021-01-12T21:05:53Z" level=info msg="Template 'stable' transitioned from Running -> Successful" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace
time="2021-01-12T21:05:53Z" level=info msg="Reconciling template" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace template=canary
time="2021-01-12T21:05:53Z" level=info msg="Template 'canary' transitioned from Running -> Successful" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace
time="2021-01-12T21:05:53Z" level=info msg="Reconciling analysis" analysis=error-rate-analysis experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace
time="2021-01-12T21:05:53Z" level=info msg="Experiment transitioned from Running -> Successful" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace
time="2021-01-12T21:05:53Z" level=info msg="Started syncing Analysis at (2021-01-12 21:05:53.001600105 +0000 UTC m=+1523.595787871)" analysisrun=my-app-7f7c78bd49-6-0-error-rate-analysis namespace=my-namespace
time="2021-01-12T21:05:53Z" level=info msg="running overdue measurement" analysisrun=my-app-7f7c78bd49-6-0-error-rate-analysis metric=error-rate namespace=my-namespace
time="2021-01-12T21:05:53Z" level=info msg="taking 1 measurements" analysisrun=my-app-7f7c78bd49-6-0-error-rate-analysis namespace=my-namespace
time="2021-01-12T21:05:53Z" level=info msg="Event(v1.ObjectReference{Kind:\"Experiment\", Namespace:\"my-namespace\", Name:\"my-app-7f7c78bd49-6-0\", UID:\"4f8e47fa-0942-44e6-a869-df721d6455dd\", APIVersion:\"argoproj.io/v1alpha1\", ResourceVersion:\"441228358\", FieldPath:\"\"}): type: 'Normal' reason: 'Successful' Template 'stable' transitioned from Running -> Successful"
time="2021-01-12T21:05:53Z" level=info msg="Event(v1.ObjectReference{Kind:\"Experiment\", Namespace:\"my-namespace\", Name:\"my-app-7f7c78bd49-6-0\", UID:\"4f8e47fa-0942-44e6-a869-df721d6455dd\", APIVersion:\"argoproj.io/v1alpha1\", ResourceVersion:\"441228358\", FieldPath:\"\"}): type: 'Normal' reason: 'Successful' Template 'canary' transitioned from Running -> Successful"
time="2021-01-12T21:05:53Z" level=info msg="Event(v1.ObjectReference{Kind:\"Experiment\", Namespace:\"my-namespace\", Name:\"my-app-7f7c78bd49-6-0\", UID:\"4f8e47fa-0942-44e6-a869-df721d6455dd\", APIVersion:\"argoproj.io/v1alpha1\", ResourceVersion:\"441228358\", FieldPath:\"\"}): type: 'Normal' reason: 'Successful' Experiment transitioned from Running -> Successful"
time="2021-01-12T21:05:53Z" level=info msg="Patch status successfully" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace
time="2021-01-12T21:05:53Z" level=info msg="Reconciliation completed" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace time_ms=30.785740999999998
time="2021-01-12T21:05:53Z" level=info msg="Enqueueing parent of my-namespace/my-app-7f7c78bd49-6-0: Rollout my-namespace/my-app"
time="2021-01-12T21:05:53Z" level=info msg="Started syncing rollout at (2021-01-12 21:05:53.032834185 +0000 UTC m=+1523.627021859)" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciling experiment step (stepIndex: 0)" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciling StableRS" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciling new ReplicaSet 'my-app-7f7c78bd49'" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciling old replica sets" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Incrementing the Current Step Index to 1" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Started syncing Experiment at (2021-01-12 21:05:53.039557395 +0000 UTC m=+1523.633745246)" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace
time="2021-01-12T21:05:53Z" level=info msg="Claimed ReplicaSet 'my-app-7f7c78bd49-6-0-stable' for template 'stable'" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace
time="2021-01-12T21:05:53Z" level=info msg="Claimed ReplicaSet 'my-app-7f7c78bd49-6-0-canary' for template 'canary'" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace
time="2021-01-12T21:05:53Z" level=info msg="Reconciling template" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace template=stable
time="2021-01-12T21:05:53Z" level=info msg="Event(v1.ObjectReference{Kind:\"Rollout\", Namespace:\"my-namespace\", Name:\"my-app\", UID:\"a98d9975-d8f3-4cb4-996e-3635c3946c46\", APIVersion:\"argoproj.io/v1alpha1\", ResourceVersion:\"441228021\", FieldPath:\"\"}): type: 'Normal' reason: 'SettingStableRS' Completed all steps"
time="2021-01-12T21:05:53Z" level=info msg="Event(v1.ObjectReference{Kind:\"Rollout\", Namespace:\"my-namespace\", Name:\"my-app\", UID:\"a98d9975-d8f3-4cb4-996e-3635c3946c46\", APIVersion:\"argoproj.io/v1alpha1\", ResourceVersion:\"441228021\", FieldPath:\"\"}): type: 'Normal' reason: 'SetStepIndex' Set Step Index to 1"
time="2021-01-12T21:05:53Z" level=info msg="Scaled down ReplicaSet my-app-7f7c78bd49-6-0-stable from 1 to 0" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace
time="2021-01-12T21:05:53Z" level=info msg="Reconciling template" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace template=canary
time="2021-01-12T21:05:53Z" level=info msg="Event(v1.ObjectReference{Kind:\"Experiment\", Namespace:\"my-namespace\", Name:\"my-app-7f7c78bd49-6-0\", UID:\"4f8e47fa-0942-44e6-a869-df721d6455dd\", APIVersion:\"argoproj.io/v1alpha1\", ResourceVersion:\"441229066\", FieldPath:\"\"}): type: 'Normal' reason: 'ScalingReplicaSet' Scaled down replica set my-app-7f7c78bd49-6-0-stable to 0"
time="2021-01-12T21:05:53Z" level=info msg="Enqueueing parent of my-namespace/my-app-7f7c78bd49-6-0-stable: Experiment my-namespace/my-app-7f7c78bd49-6-0"
time="2021-01-12T21:05:53Z" level=info msg="Scaled down ReplicaSet my-app-7f7c78bd49-6-0-canary from 1 to 0" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace
time="2021-01-12T21:05:53Z" level=info msg="Reconciling analysis" analysis=error-rate-analysis experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace
time="2021-01-12T21:05:53Z" level=warning msg="Terminating error-rate-analysis (my-app-7f7c78bd49-6-0-error-rate-analysis)" analysis=error-rate-analysis experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace
time="2021-01-12T21:05:53Z" level=info msg="Event(v1.ObjectReference{Kind:\"Experiment\", Namespace:\"my-namespace\", Name:\"my-app-7f7c78bd49-6-0\", UID:\"4f8e47fa-0942-44e6-a869-df721d6455dd\", APIVersion:\"argoproj.io/v1alpha1\", ResourceVersion:\"441229066\", FieldPath:\"\"}): type: 'Normal' reason: 'ScalingReplicaSet' Scaled down replica set my-app-7f7c78bd49-6-0-canary to 0"
time="2021-01-12T21:05:53Z" level=info msg="Event(v1.ObjectReference{Kind:\"Experiment\", Namespace:\"my-namespace\", Name:\"my-app-7f7c78bd49-6-0\", UID:\"4f8e47fa-0942-44e6-a869-df721d6455dd\", APIVersion:\"argoproj.io/v1alpha1\", ResourceVersion:\"441229066\", FieldPath:\"\"}): type: 'Normal' reason: 'Terminate' Terminating error-rate-analysis (my-app-7f7c78bd49-6-0-error-rate-analysis)"
time="2021-01-12T21:05:53Z" level=info msg="Enqueueing parent of my-namespace/my-app-7f7c78bd49-6-0-canary: Experiment my-namespace/my-app-7f7c78bd49-6-0"
time="2021-01-12T21:05:53Z" level=info msg="Patched: {\"status\":{\"canary\":{\"currentExperiment\":null},\"conditions\":[{\"lastTransitionTime\":\"2021-01-08T06:34:29Z\",\"lastUpdateTime\":\"2021-01-08T06:34:29Z\",\"message\":\"Rollout has minimum availability\",\"reason\":\"AvailableReason\",\"status\":\"True\",\"type\":\"Available\"},{\"lastTransitionTime\":\"2021-01-08T06:24:58Z\",\"lastUpdateTime\":\"2021-01-12T21:05:53Z\",\"message\":\"ReplicaSet \\\"my-app-7f7c78bd49\\\" is progressing.\",\"reason\":\"ReplicaSetUpdated\",\"status\":\"True\",\"type\":\"Progressing\"}],\"currentStepIndex\":1}}" generation=12 namespace=my-namespace resourceVersion=441228021 rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciliation completed" namespace=my-namespace rollout=my-app time_ms=59.115897000000004
time="2021-01-12T21:05:53Z" level=info msg="Enqueueing parent of my-namespace/my-app-7f7c78bd49-6-0-stable: Experiment my-namespace/my-app-7f7c78bd49-6-0"
time="2021-01-12T21:05:53Z" level=info msg="Enqueueing parent of my-namespace/my-app-7f7c78bd49-6-0-canary: Experiment my-namespace/my-app-7f7c78bd49-6-0"
time="2021-01-12T21:05:53Z" level=info msg="Started syncing rollout at (2021-01-12 21:05:53.100947956 +0000 UTC m=+1523.695135693)" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciling StableRS" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciling new ReplicaSet 'my-app-7f7c78bd49'" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Patch status successfully" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace
time="2021-01-12T21:05:53Z" level=info msg="Reconciliation completed" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace time_ms=68.44009299999999
time="2021-01-12T21:05:53Z" level=info msg="Started syncing Experiment at (2021-01-12 21:05:53.108019348 +0000 UTC m=+1523.702207024)" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace
time="2021-01-12T21:05:53Z" level=info msg="Claimed ReplicaSet 'my-app-7f7c78bd49-6-0-stable' for template 'stable'" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace
time="2021-01-12T21:05:53Z" level=info msg="Claimed ReplicaSet 'my-app-7f7c78bd49-6-0-canary' for template 'canary'" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace
time="2021-01-12T21:05:53Z" level=info msg="Reconciling template" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace template=stable
time="2021-01-12T21:05:53Z" level=info msg="Template progressed" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace template=stable
time="2021-01-12T21:05:53Z" level=info msg="Prev status: Current: 1, Updated: 1, Ready: 1, Available: 1" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace template=stable
time="2021-01-12T21:05:53Z" level=info msg="New status: Current: 0, Updated: 0, Ready: 0, Available: 0" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace template=stable
time="2021-01-12T21:05:53Z" level=info msg="Reconciling template" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace template=canary
time="2021-01-12T21:05:53Z" level=info msg="Template progressed" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace template=canary
time="2021-01-12T21:05:53Z" level=info msg="Prev status: Current: 1, Updated: 1, Ready: 1, Available: 1" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace template=canary
time="2021-01-12T21:05:53Z" level=info msg="New status: Current: 0, Updated: 0, Ready: 0, Available: 0" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace template=canary
time="2021-01-12T21:05:53Z" level=info msg="Reconciling analysis" analysis=error-rate-analysis experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace
time="2021-01-12T21:05:53Z" level=info msg="Enqueueing parent of my-namespace/my-app-7f7c78bd49-6-0: Rollout my-namespace/my-app"
time="2021-01-12T21:05:53Z" level=info msg="Not finished reconciling new ReplicaSet 'my-app-7f7c78bd49'" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Not finished reconciling ReplicaSets" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Rollout has executed every step" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Timed out (false) [last progress check: 2021-01-12 21:05:53 +0000 UTC - now: 2021-01-12 21:05:53.119314595 +0000 UTC m=+1523.713502275]" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="No status changes. Skipping patch" generation=12 namespace=my-namespace resourceVersion=441229093 rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Queueing up rollout for a progress after 599s" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciliation completed" namespace=my-namespace rollout=my-app time_ms=18.510751
time="2021-01-12T21:05:53Z" level=info msg="Started syncing rollout at (2021-01-12 21:05:53.119477767 +0000 UTC m=+1523.713665442)" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciling StableRS" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciling new ReplicaSet 'my-app-7f7c78bd49'" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Event(v1.ObjectReference{Kind:\"Rollout\", Namespace:\"my-namespace\", Name:\"my-app\", UID:\"a98d9975-d8f3-4cb4-996e-3635c3946c46\", APIVersion:\"argoproj.io/v1alpha1\", ResourceVersion:\"441229093\", FieldPath:\"\"}): type: 'Normal' reason: 'ScalingReplicaSet' Scaled up replica set my-app-7f7c78bd49 from 0 to 1"
time="2021-01-12T21:05:53Z" level=info msg="Enqueueing parent of my-namespace/my-app-7f7c78bd49: Rollout my-namespace/my-app"
time="2021-01-12T21:05:53Z" level=info msg="Reconciliation completed" namespace=my-namespace rollout=my-app time_ms=10.930302
time="2021-01-12T21:05:53Z" level=error msg="rollout syncHandler error: Operation cannot be fulfilled on replicasets.apps \"my-app-7f7c78bd49\": the object has been modified; please apply your changes to the latest version and try again" namespace=my-namespace rollout=my-app
E0112 21:05:53.130563       1 controller.go:172] Operation cannot be fulfilled on replicasets.apps "my-app-7f7c78bd49": the object has been modified; please apply your changes to the latest version and try again
time="2021-01-12T21:05:53Z" level=info msg="Started syncing rollout at (2021-01-12 21:05:53.130615777 +0000 UTC m=+1523.724803514)" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciling StableRS" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciling new ReplicaSet 'my-app-7f7c78bd49'" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciling old replica sets" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="No Steps remain in the canary steps" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Rollout has executed every step" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Timed out (false) [last progress check: 2021-01-12 21:05:53 +0000 UTC - now: 2021-01-12 21:05:53.143699854 +0000 UTC m=+1523.737887529]" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="No status changes. Skipping patch" generation=12 namespace=my-namespace resourceVersion=441229093 rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Queueing up rollout for a progress after 599s" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciliation completed" namespace=my-namespace rollout=my-app time_ms=13.441047
time="2021-01-12T21:05:53Z" level=info msg="measurement completed Failed" analysisrun=my-app-7f7c78bd49-6-0-error-rate-analysis metric=error-rate namespace=my-namespace
time="2021-01-12T21:05:53Z" level=info msg="metric assessed Failed: failed (3) > failureLimit (2)" metric=error-rate
time="2021-01-12T21:05:53Z" level=info msg="metric transitioned from Running -> Failed" analysisrun=my-app-7f7c78bd49-6-0-error-rate-analysis metric=error-rate namespace=my-namespace
time="2021-01-12T21:05:53Z" level=info msg="analysis transitioned from Running -> Failed" analysisrun=my-app-7f7c78bd49-6-0-error-rate-analysis namespace=my-namespace
time="2021-01-12T21:05:53Z" level=info msg="Patch status successfully" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace
time="2021-01-12T21:05:53Z" level=info msg="Reconciliation completed" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace time_ms=42.859904
time="2021-01-12T21:05:53Z" level=info msg="Started syncing Experiment at (2021-01-12 21:05:53.15089833 +0000 UTC m=+1523.745086005)" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace
time="2021-01-12T21:05:53Z" level=info msg="Claimed ReplicaSet 'my-app-7f7c78bd49-6-0-stable' for template 'stable'" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace
time="2021-01-12T21:05:53Z" level=info msg="Claimed ReplicaSet 'my-app-7f7c78bd49-6-0-canary' for template 'canary'" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace
time="2021-01-12T21:05:53Z" level=info msg="Reconciling template" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace template=stable
time="2021-01-12T21:05:53Z" level=info msg="Template progressed" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace template=stable
time="2021-01-12T21:05:53Z" level=info msg="Prev status: Current: 1, Updated: 1, Ready: 1, Available: 1" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace template=stable
time="2021-01-12T21:05:53Z" level=info msg="New status: Current: 0, Updated: 0, Ready: 0, Available: 0" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace template=stable
time="2021-01-12T21:05:53Z" level=info msg="Reconciling template" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace template=canary
time="2021-01-12T21:05:53Z" level=info msg="Template progressed" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace template=canary
time="2021-01-12T21:05:53Z" level=info msg="Prev status: Current: 1, Updated: 1, Ready: 1, Available: 1" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace template=canary
time="2021-01-12T21:05:53Z" level=info msg="New status: Current: 0, Updated: 0, Ready: 0, Available: 0" experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace template=canary
time="2021-01-12T21:05:53Z" level=info msg="Reconciling analysis" analysis=error-rate-analysis experiment=my-app-7f7c78bd49-6-0 namespace=my-namespace
time="2021-01-12T21:05:53Z" level=info msg="Enqueueing parent of my-namespace/my-app-7f7c78bd49-6-0: Rollout my-namespace/my-app"
time="2021-01-12T21:05:53Z" level=info msg="Started syncing rollout at (2021-01-12 21:05:53.157146341 +0000 UTC m=+1523.751334021)" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Enqueueing parent of my-namespace/my-app-7f7c78bd49: Rollout my-namespace/my-app"
time="2021-01-12T21:05:53Z" level=info msg="Event(v1.ObjectReference{Kind:\"AnalysisRun\", Namespace:\"my-namespace\", Name:\"my-app-7f7c78bd49-6-0-error-rate-analysis\", UID:\"7a7ad7e7-c73e-4616-b692-13360bc5c492\", APIVersion:\"argoproj.io/v1alpha1\", ResourceVersion:\"441228962\", FieldPath:\"\"}): type: 'Warning' reason: 'Failed' metric 'error-rate' completed Failed"
time="2021-01-12T21:05:53Z" level=info msg="Event(v1.ObjectReference{Kind:\"AnalysisRun\", Namespace:\"my-namespace\", Name:\"my-app-7f7c78bd49-6-0-error-rate-analysis\", UID:\"7a7ad7e7-c73e-4616-b692-13360bc5c492\", APIVersion:\"argoproj.io/v1alpha1\", ResourceVersion:\"441228962\", FieldPath:\"\"}): type: 'Warning' reason: 'Failed' analysis completed Failed"
time="2021-01-12T21:05:53Z" level=info msg="Reconciling StableRS" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciling new ReplicaSet 'my-app-7f7c78bd49'" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciling old replica sets" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="No Steps remain in the canary steps" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Rollout has executed every step" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Timed out (false) [last progress check: 2021-01-12 21:05:53 +0000 UTC - now: 2021-01-12 21:05:53.174274991 +0000 UTC m=+1523.768462665]" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="No status changes. Skipping patch" generation=12 namespace=my-namespace resourceVersion=441229093 rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Queueing up rollout for a progress after 599s" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciliation completed" namespace=my-namespace rollout=my-app time_ms=17.317052999999998

# Logs for a specific rollout:
time="2021-01-12T21:05:53Z" level=info msg="Started syncing rollout at (2021-01-12 21:05:53.032834185 +0000 UTC m=+1523.627021859)" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciling experiment step (stepIndex: 0)" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciling StableRS" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciling new ReplicaSet 'my-app-7f7c78bd49'" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciling old replica sets" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Incrementing the Current Step Index to 1" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Patched: {\"status\":{\"canary\":{\"currentExperiment\":null},\"conditions\":[{\"lastTransitionTime\":\"2021-01-08T06:34:29Z\",\"lastUpdateTime\":\"2021-01-08T06:34:29Z\",\"message\":\"Rollout has minimum availability\",\"reason\":\"AvailableReason\",\"status\":\"True\",\"type\":\"Available\"},{\"lastTransitionTime\":\"2021-01-08T06:24:58Z\",\"lastUpdateTime\":\"2021-01-12T21:05:53Z\",\"message\":\"ReplicaSet \\\"my-app-7f7c78bd49\\\" is progressing.\",\"reason\":\"ReplicaSetUpdated\",\"status\":\"True\",\"type\":\"Progressing\"}],\"currentStepIndex\":1}}" generation=12 namespace=my-namespace resourceVersion=441228021 rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciliation completed" namespace=my-namespace rollout=my-app time_ms=59.115897000000004
time="2021-01-12T21:05:53Z" level=info msg="Started syncing rollout at (2021-01-12 21:05:53.100947956 +0000 UTC m=+1523.695135693)" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciling StableRS" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciling new ReplicaSet 'my-app-7f7c78bd49'" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Not finished reconciling new ReplicaSet 'my-app-7f7c78bd49'" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Not finished reconciling ReplicaSets" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Rollout has executed every step" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Timed out (false) [last progress check: 2021-01-12 21:05:53 +0000 UTC - now: 2021-01-12 21:05:53.119314595 +0000 UTC m=+1523.713502275]" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="No status changes. Skipping patch" generation=12 namespace=my-namespace resourceVersion=441229093 rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Queueing up rollout for a progress after 599s" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciliation completed" namespace=my-namespace rollout=my-app time_ms=18.510751
time="2021-01-12T21:05:53Z" level=info msg="Started syncing rollout at (2021-01-12 21:05:53.119477767 +0000 UTC m=+1523.713665442)" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciling StableRS" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciling new ReplicaSet 'my-app-7f7c78bd49'" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciliation completed" namespace=my-namespace rollout=my-app time_ms=10.930302
time="2021-01-12T21:05:53Z" level=error msg="rollout syncHandler error: Operation cannot be fulfilled on replicasets.apps \"my-app-7f7c78bd49\": the object has been modified; please apply your changes to the latest version and try again" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Started syncing rollout at (2021-01-12 21:05:53.130615777 +0000 UTC m=+1523.724803514)" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciling StableRS" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciling new ReplicaSet 'my-app-7f7c78bd49'" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciling old replica sets" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="No Steps remain in the canary steps" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Rollout has executed every step" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Timed out (false) [last progress check: 2021-01-12 21:05:53 +0000 UTC - now: 2021-01-12 21:05:53.143699854 +0000 UTC m=+1523.737887529]" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="No status changes. Skipping patch" generation=12 namespace=my-namespace resourceVersion=441229093 rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Queueing up rollout for a progress after 599s" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciliation completed" namespace=my-namespace rollout=my-app time_ms=13.441047
time="2021-01-12T21:05:53Z" level=info msg="Started syncing rollout at (2021-01-12 21:05:53.157146341 +0000 UTC m=+1523.751334021)" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciling StableRS" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciling new ReplicaSet 'my-app-7f7c78bd49'" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciling old replica sets" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="No Steps remain in the canary steps" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Rollout has executed every step" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Timed out (false) [last progress check: 2021-01-12 21:05:53 +0000 UTC - now: 2021-01-12 21:05:53.174274991 +0000 UTC m=+1523.768462665]" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="No status changes. Skipping patch" generation=12 namespace=my-namespace resourceVersion=441229093 rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Queueing up rollout for a progress after 599s" namespace=my-namespace rollout=my-app
time="2021-01-12T21:05:53Z" level=info msg="Reconciliation completed" namespace=my-namespace rollout=my-app time_ms=17.317052999999998

Config

Our AnalysisTemplate looks like this:

apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: error-rate-analysis
  namespace: my-namespace
spec:
  args:
  - name: service
    value: my-app
  - name: stable-version
  - name: canary-version
  metrics:
  - name: error-rate
    initialDelay: 50s
    interval: 10s
    successCondition: result >= 0
    failureCondition: result < 0
    failureLimit: 2
    consecutiveErrorLimit: 3
    provider:
      datadog:
        interval: 60s
        query: |
          sum:requests.error.count{service:{{args.service}},version:{{args.stable-version}}} - 
          sum:requests.error.count{service:{{args.service}},version:{{args.canary-version}}}

And our Rollout steps look like the following:

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app
  namespace: my-namespace
spec:
  canary:
    steps:
      - experiment:
          duration: 110s
          templates:
          - name: stable
            specRef: stable
            metadata:
              labels:
                app: my-app
                role: experiment
                tags.datadoghq.com/version: <STABLE_VERSION_SENT_TO_DATADOG>
            selector:
              matchLabels:
                app: my-app
                role: experiment
          - name: canary
            specRef: canary
            metadata:
              labels:
                app: my-app
                role: experiment
                tags.datadoghq.com/version: <CANARY_VERSION_SENT_TO_DATADOG>
            selector:
              matchLabels:
                app: my-app
                role: experiment
          analyses:
          - templateName: error-rate-analysis
            name: error-rate-analysis
            args:
              - name: stable-version
                value: <STABLE_VERSION_SENT_TO_DATADOG>
              - name: canary-version
                value: <CANARY_VERSION_SENT_TO_DATADOG>
...

Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 60 days with no activity.