buildkite / feedback

Got feedback? Please let us know!
https://buildkite.com

Retry initial read-steps-from-repository jobs #486

Closed djrodgerspryor closed 5 years ago

djrodgerspryor commented 5 years ago

Our pipelines mostly begin with a job that runs buildkite-agent pipeline upload and dynamically sets up the rest of the pipeline. Sometimes that job fails (because of a network error, an unhealthy agent, etc.) and then the whole pipeline fails. For steps within the pipeline we have configured automatic retry to fix these kinds of problems, but this first job is configured via the GUI, and there doesn't seem to be any way to configure retry there.

I'd love the option to add automatic retry to pipeline steps in the GUI so that we can be more certain that any pipeline failures are caused by bad code, rather than transient errors on a particular agent.

I'm sure it seems like a bit of an edge case to retry simple upload-pipeline jobs, but we've run enough jobs now that this is a small but noticeable source of our false-negative pipeline failures.

See an example job (and the lack of retry options):

[Screenshot: screen shot 2018-11-15 at 2 25 56 pm]

keithpitt commented 5 years ago

@djrodgerspryor 👋 we've just released a new feature in Beta that should help you out here:

https://forum.buildkite.community/t/defining-pipeline-build-steps-with-yaml/79

You should be able to switch to YAML Steps and define retry rules on your initial buildkite-agent pipeline upload command.
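
For readers landing here, a minimal sketch of what such a YAML step might look like. It is not taken from this thread: the label, the wildcard exit status, and the limit of 2 are illustrative assumptions, and it relies on the `retry.automatic` step attribute as it exists for steps defined in pipeline YAML.

```yaml
steps:
  # Initial step that reads the rest of the pipeline from the repository
  - label: ":pipeline: Upload"            # illustrative label
    command: "buildkite-agent pipeline upload"
    retry:
      automatic:
        - exit_status: "*"                # assumed: retry on any non-zero exit status
          limit: 2                        # assumed: at most two automatic retries
```

Narrowing `exit_status` to specific codes, if the transient failures have recognisable ones, would avoid retrying a genuinely broken upload.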

I'd love to hear what you think! Feel free to provide some feedback over in our community forum.

djrodgerspryor commented 5 years ago

@keithpitt I've just converted all of our pipelines to use YAML definitions and it worked perfectly 😄

Thanks for the quick response!

keithpitt commented 5 years ago

Yay! I’m glad to hear it’s all working 👌🏻