Suppose an agent has gotten into a weird local state. Let's ignore the reasons for that for now (an example is below). Because the problem is local to the agent, the master still sees it as free and assigns it a job. However, the agent never actually accepts the job, and the job stays stuck in that state.
It would be nice to have a (pipeline-global) timeout so that, once it expires, the master reassigns the job to a different agent. A maximum number of reassignments would be another useful setting. Alternatively, the global timeout could apply here, and once it expires the build would be cancelled entirely.
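To make the proposal concrete, here is a minimal sketch of how a master-side scheduler could apply it, assuming an injected clock and a simple pool of free agents. All names here (`Scheduler`, `accept_timeout`, `max_reassignments`, `overall_timeout`) are hypothetical and are not part of any existing master/agent API.

```python
from dataclasses import dataclass


@dataclass
class JobAssignment:
    job_id: str
    agent_id: str
    first_assigned_at: float  # when the job was first handed to any agent
    assigned_at: float        # when the job was handed to the current agent
    reassignments: int = 0
    accepted: bool = False
    cancelled: bool = False


class Scheduler:
    """Hypothetical scheduler: reassigns jobs that an agent never accepts."""

    def __init__(self, agents, accept_timeout, max_reassignments, overall_timeout):
        self.free_agents = list(agents)
        self.accept_timeout = accept_timeout        # per-agent wait for acceptance
        self.max_reassignments = max_reassignments  # cap on hand-offs
        self.overall_timeout = overall_timeout      # hard limit, then cancel the build
        self.jobs = {}

    def assign(self, job_id, now):
        if not self.free_agents:
            raise RuntimeError("no free agents")
        agent = self.free_agents.pop(0)
        self.jobs[job_id] = JobAssignment(job_id, agent, now, now)
        return agent

    def accept(self, job_id, agent_id) -> bool:
        job = self.jobs.get(job_id)
        if job is None or job.cancelled or job.agent_id != agent_id:
            return False  # unknown job or stale assignment
        job.accepted = True
        return True

    def tick(self, now):
        """Called periodically by the master to enforce the timeouts."""
        for job in self.jobs.values():
            if job.accepted or job.cancelled:
                continue
            # Global timeout: cancel the build entirely once it expires.
            if now - job.first_assigned_at >= self.overall_timeout:
                job.cancelled = True
                continue
            # Still within the window for the current agent to accept.
            if now - job.assigned_at < self.accept_timeout:
                continue
            if job.reassignments >= self.max_reassignments:
                job.cancelled = True
                continue
            if not self.free_agents:
                continue  # wait for a free agent; the global timeout still applies
            # The unresponsive agent is deliberately NOT returned to the free pool.
            job.agent_id = self.free_agents.pop(0)
            job.assigned_at = now
            job.reassignments += 1
```

For example, with `accept_timeout=30` and `max_reassignments=1`, a job assigned at `t=0` to an agent that never accepts it is handed to the next free agent at `t=31`, and cancelled at `t=62` if that agent does not accept it either. Not returning the stuck agent to the pool is one design choice; another would be to mark it unhealthy until it is restarted.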
Example of weirdness
1. Cancel a stuck running job from the Web UI.
2. The agent's logs show that it is cancelling the job.
3. It just stays in that state until the agent is stopped from the Web UI and then restarted.
Thoughts?