buildkite / feedback

Got feedback? Please let us know!
https://buildkite.com

Facility to extend job timeout at runtime #481

Open DazWorrall opened 5 years ago

DazWorrall commented 5 years ago

We have a pipeline with an intermittent problem where a process gets 'stuck', and the build is delayed until the timeout kicks in. The timeout is doing its job here, but we're trying to debug the cause of the freezing. Occasionally we're able to catch a job while it's stuck, but we only have a couple of minutes to debug before the timeout hits us. A `buildkite-agent timeout extend 5m` command or similar would be very helpful in these circumstances.

keithpitt commented 5 years ago

@DazWorrall 👋 this is good timing! I've actually been pushing a few sneaky features to the Agent API that'd allow for this sort of thing.

Are the jobs you want to extend part of a parallel group, or standalone?

DazWorrall commented 5 years ago

A parallel group 🙂

djrodgerspryor commented 2 years ago

@keithpitt did something like this ever make its way in?

Our use case is that we automatically set timeouts on all build jobs based on the historical time taken by successful runs. This is great for killing frozen builds, but when a job begins legitimately taking longer due to a code change, it will consistently hit the timeout and fail until the timeout is manually raised or reset.

I'd love to have the configured timeouts adapt automatically in this case, by implementing something like exponential backoff. Basically, if a job times out, I want to run the next one with double (or 10x) the timeout: if it succeeds, it will generate successful-job data and change the default timeout; if it fails, it's probably actually broken and needs to be manually fixed. Having a way to extend the timeout at runtime would let us implement that backoff ourselves at the start of each job.
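
For illustration, the backoff calculation described above might look roughly like the sketch below. It is not an existing Buildkite feature: the function names, the 1.5x safety margin, the 30-minute fallback, and the 240-minute cap are all assumptions made up for the example, and it only computes the timeout value rather than applying it to a job.

```python
from typing import Sequence


def base_timeout_minutes(successful_durations: Sequence[float],
                         margin: float = 1.5,
                         fallback: float = 30.0) -> float:
    """Derive a default timeout from the durations of past successful runs.

    `margin` and `fallback` are illustrative values, not Buildkite defaults.
    """
    if not successful_durations:
        return fallback  # no history yet, so use a fixed default
    return max(successful_durations) * margin


def next_timeout_minutes(successful_durations: Sequence[float],
                         consecutive_timeouts: int,
                         factor: float = 2.0,
                         cap: float = 240.0) -> float:
    """Multiply the base timeout by `factor` once per consecutive timed-out run.

    The result is capped so that a genuinely broken job cannot grow its
    timeout without bound.
    """
    base = base_timeout_minutes(successful_durations)
    return min(base * factor ** consecutive_timeouts, cap)


if __name__ == "__main__":
    history = [10.0, 12.0, 11.0]  # minutes taken by recent successful runs
    for timeouts in range(3):
        print(timeouts, next_timeout_minutes(history, timeouts))
```

With the sample history, the base timeout is 18 minutes, and it doubles to 36 and then 72 minutes after one and two consecutive timeouts. Once a run succeeds, its duration joins the history and the consecutive-timeout count would reset to zero.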