Update container start error handling so it respects `MaxSlowStartDuration`

DataDog / extendeddaemonset

Kubernetes Extended Daemonset controller

Apache License 2.0


What does this PR do?

During a canary, if a pod reports an error during creation within `MaxSlowStartDuration`, the canary is no longer automatically paused. If `MaxSlowStartDuration` is not set, the behavior is unchanged.

Motivation

Many recent deployments have been paused by a `CreateContainerConfigError` (`Error: failed to sync secret cache: timed out waiting for the condition`). This error sometimes resolves itself seconds or minutes after it is first raised, but the deployment stays paused, which slows down releases. This change aims to avoid pausing the canary for an issue that no longer exists.

Additional Notes

Should a default value for `maxSlowStartDuration` be set for our clusters?

Describe your test plan

Unit tests and an E2E test are included with the change. I also deployed the change to a few staging clusters and did not observe any paused canaries there.

Codecov Report

Merging #169 (f616e2b) into main (8a74725) will increase coverage by 0.04%. The diff coverage is 100.00%.

```diff
@@            Coverage Diff             @@
##             main     #169      +/-   ##
==========================================
+ Coverage   63.05%   63.10%   +0.04%     
==========================================
  Files          41       41              
  Lines        3094     3098       +4     
==========================================
+ Hits         1951     1955       +4     
  Misses       1023     1023              
  Partials      120      120
```
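As a check on the figures above: Codecov computes coverage as hits over all tracked lines (hits, misses and partials). Working through the numbers shows the reported values are consistent with rounding down to two decimals rather than rounding to nearest.

$$
\begin{aligned}
\text{coverage} &= \frac{\text{hits}}{\text{hits} + \text{misses} + \text{partials}} \\
\text{main: } \frac{1951}{1951 + 1023 + 120} &= \frac{1951}{3094} \approx 0.63058 \;\Rightarrow\; 63.05\% \\
\text{\#169: } \frac{1955}{1955 + 1023 + 120} &= \frac{1955}{3098} \approx 0.63105 \;\Rightarrow\; 63.10\% \\
\Delta &\approx 63.1052\% - 63.0575\% \approx 0.0477 \text{ pp} \;\Rightarrow\; +0.04\%
\end{aligned}
$$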

| Flag | Coverage Δ |
|---|---|
| unittests | `63.10% <100.00%> (+0.04%)` :arrow_up: |

Flags with carried forward coverage won't be shown.

| Files | Coverage Δ |
|---|---|
| ...ers/extendeddaemonsetreplicaset/strategy/canary.go | `92.64% <100.00%> (+0.10%)` :arrow_up: |


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 8a74725...f616e2b.
