admiraltyio / admiralty

A system of Kubernetes controllers that intelligently schedules workloads across clusters.
https://admiralty.io
Apache License 2.0
683 stars 86 forks source link

fix: reflect pod failed creates on the chaperon status #227

Open marwanad opened 1 month ago

marwanad commented 1 month ago

This fixes a bug where pod chaperons in a target cluster can delay the scheduling loop if the pods fail to create and no status is set on the chaperon.

This changes the behavior to set PodScheduled condition with reason PodFailedCreate and checking for that in the proxy filter step. The pod creation gets requeued and retried. Upon success, the chaperon status will inherit the pod one as before.

adrienjt commented 2 weeks ago
  1. The e2e test tests the implementation, not that the delay issue is fixed.
  2. This breaks the invariant that so far the pod chaperon status was simply the candidate pod status. Using the phase and condition to store a candidate pod creation error feels hacky. Could you store the pod creation error as a chaperon annotation instead? Sorry to mention that only now.