This fixes a bug where pod chaperons in a target cluster can delay the scheduling loop if the pods fail to create and no status is set on the chaperon.
This changes the behavior to set the PodScheduled condition with reason PodFailedCreate and to check for that in the proxy filter step. The pod creation gets requeued and retried. Upon success, the chaperon status inherits the pod status as before.
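For context, here is a minimal sketch of that mechanism, assuming the chaperon status embeds a `corev1.PodStatus` as in upstream Kubernetes; the constant and helper names (`ReasonPodFailedCreate`, `markPodFailedCreate`, `failedCreate`) are illustrative, not the actual identifiers in this PR:

```go
package chaperon

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// ReasonPodFailedCreate is the condition reason described above; the exact
// constant name in the codebase is an assumption.
const ReasonPodFailedCreate = "PodFailedCreate"

// markPodFailedCreate records a candidate pod creation error on the chaperon
// status by setting the PodScheduled condition to False with reason
// PodFailedCreate. The real controller works on the PodChaperon CRD; using a
// bare corev1.PodStatus here is a simplification.
func markPodFailedCreate(status *corev1.PodStatus, createErr error) {
	cond := corev1.PodCondition{
		Type:               corev1.PodScheduled,
		Status:             corev1.ConditionFalse,
		Reason:             ReasonPodFailedCreate,
		Message:            createErr.Error(),
		LastTransitionTime: metav1.Now(),
	}
	for i := range status.Conditions {
		if status.Conditions[i].Type == corev1.PodScheduled {
			status.Conditions[i] = cond
			return
		}
	}
	status.Conditions = append(status.Conditions, cond)
}

// failedCreate is the check the proxy filter step performs: skip candidates
// whose chaperon reports a pod creation failure instead of waiting on them.
func failedCreate(status *corev1.PodStatus) bool {
	for _, c := range status.Conditions {
		if c.Type == corev1.PodScheduled &&
			c.Status == corev1.ConditionFalse &&
			c.Reason == ReasonPodFailedCreate {
			return true
		}
	}
	return false
}
```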
The e2e test exercises the implementation; it does not verify that the delay issue itself is fixed.
This breaks the invariant that, until now, the pod chaperon status was simply the candidate pod status. Using the phase and a condition to store a candidate pod creation error feels hacky. Could you store the pod creation error as a chaperon annotation instead? Sorry to mention that only now.
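Something along these lines, as a sketch only; the annotation key and helper names are placeholders, not code that exists in the repo:

```go
package chaperon

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// podFailedCreateAnnotation is a hypothetical key; the actual annotation name
// would be up to the maintainers.
const podFailedCreateAnnotation = "multicluster.admiralty.io/pod-failed-create"

// setFailedCreateAnnotation records a candidate pod creation error on the
// chaperon's metadata instead of its status, preserving the invariant that
// the chaperon status mirrors the candidate pod status.
func setFailedCreateAnnotation(meta *metav1.ObjectMeta, createErr error) {
	if meta.Annotations == nil {
		meta.Annotations = map[string]string{}
	}
	meta.Annotations[podFailedCreateAnnotation] = createErr.Error()
}

// clearFailedCreateAnnotation removes the marker once the retried pod
// creation succeeds, so the proxy filter stops skipping the candidate.
func clearFailedCreateAnnotation(meta *metav1.ObjectMeta) {
	delete(meta.Annotations, podFailedCreateAnnotation)
}
```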