buildkite / feedback


RFC: Timeout for requeuing a job which has been assigned to an agent, but not picked up #396

Open amitsaha opened 6 years ago

amitsaha commented 6 years ago

Suppose an agent has gotten into a weird local state. Let's ignore the reasons for that for now (an example is below). Since the weirdness is local to the agent, the master still sees it as free and assigns it a job. However, the agent never actually accepts the job, and the job stays stuck in the assigned state.

It would be nice to have a timeout (set globally for the pipeline) after which the master reassigns the job to a different agent if the original agent has not accepted it. A maximum number of reassignments would be another useful setting. Alternatively, a single global timeout could cover the whole process and cancel the build entirely once it expires.
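To make the proposal concrete, here is a minimal sketch of the decision the master could make for a job that has been assigned but not yet accepted. It is only an illustration of the idea: `RequeuePolicy`, `accept_timeout_s`, `max_reassignments`, and `next_action` are hypothetical names, not existing Buildkite settings or APIs, and the sketch covers the "maximum number of reassignments" variant rather than the single global timeout.

```python
from dataclasses import dataclass


@dataclass
class RequeuePolicy:
    # Hypothetical settings; neither is an existing Buildkite option.
    accept_timeout_s: float  # how long an assigned job may wait to be accepted
    max_reassignments: int   # how many times a job may be handed to another agent


def next_action(policy: RequeuePolicy, waited_s: float, reassignments_so_far: int) -> str:
    """Decide what to do with a job that is assigned but not yet accepted."""
    if waited_s < policy.accept_timeout_s:
        # The agent still has time to pick the job up.
        return "wait"
    if reassignments_so_far < policy.max_reassignments:
        # Timed out: hand the job to a different agent.
        return "reassign"
    # Timed out and out of reassignment attempts: give up on the build.
    return "cancel"


policy = RequeuePolicy(accept_timeout_s=60, max_reassignments=2)
print(next_action(policy, waited_s=10, reassignments_so_far=0))  # wait
print(next_action(policy, waited_s=75, reassignments_so_far=0))  # reassign
print(next_action(policy, waited_s=75, reassignments_so_far=2))  # cancel
```

Keeping the timeout and the reassignment cap as two separate settings would let a pipeline retry quickly on other agents while still guaranteeing that a stuck job eventually fails instead of waiting forever.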

Example of weirdness

Thoughts?