What does this PR do?
This PR checks the conditions in the EDS status field for a "Canary-Failed" condition and compares its timestamp with the creation timestamp of the active replicaset to determine whether the EDS canary failed.
Motivation
An agent version deployed to staging caused the canary to fail automatically because of a high number of restarts. The check did not look at the canary status, so once desired replicas matched running replicas it succeeded, and we were none the wiser.
Additional Notes
While testing the behavior, I noticed that the EDS object does not indicate a failure anywhere other than the conditions field. Here is the status field of an EDS object that had just failed:
Based on this example, I built the check to look in the conditions slice for the Canary-Failed type.
Describe your test plan
I have already deployed this change to a staging cluster. I manually failed a deployment and verified that the check failed with:
Error: active canary has a creation timestamp before the last CanaryFailed condition, meaning the deployment failed
Before change
After change
The deployment values can be seen here