buildkite / feedback

Got feedback? Please let us know!
https://buildkite.com

Managing a large agent estate #447

Open petemounce opened 5 years ago

petemounce commented 5 years ago

Currently, https://buildkite.com/organizations/improbable/agents is what is available to manage agents.

I currently have agents in the low hundreds, 8 per node, on a single platform. I'm soon going to introduce Windows agents at 1 per node, then macOS.

I tag agents with:

I would find this page more valuable if

lox commented 5 years ago

This is really interesting context @petemounce, thanks for this.

I wonder if this might be a good use-case for the new GraphQL console and saved queries? We expose a lot of this information programmatically and GraphQL would be a good way to query it. Would be keen to talk through how to make that more usable for you.

At present the Agent UI starts to decline in usefulness at around the 100+ agent point. @ticky made some solid progress in the last round of UX improvements on that page, so she might have some ideas about next steps too.

petemounce commented 5 years ago

That sounds possible but also quite low-level. I think I'm after something that allows me to visualise the characteristics of my agent fleet.
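As a rough illustration of the kind of fleet summary meant here, the sketch below counts agents per tag value. It assumes the agent records have already been exported as dicts carrying a `meta_data` list of `key=value` tag strings; the record shape, the sample agents, and the function name `summarise_fleet` are all hypothetical and not part of any Buildkite API.

```python
from collections import Counter, defaultdict


def summarise_fleet(agents):
    """Count agents per value for every tag key (e.g. queue, os)."""
    summary = defaultdict(Counter)
    for agent in agents:
        for tag in agent.get("meta_data", []):
            # Tags are assumed to look like "key=value"; a bare tag
            # with no "=" is counted under an empty value.
            key, _, value = tag.partition("=")
            summary[key][value] += 1
    return {key: dict(counts) for key, counts in summary.items()}


# Hypothetical sample data, for illustration only.
agents = [
    {"name": "linux-1", "meta_data": ["queue=linux", "os=linux"]},
    {"name": "linux-2", "meta_data": ["queue=linux", "os=linux"]},
    {"name": "win-1", "meta_data": ["queue=windows", "os=windows"]},
]

fleet_summary = summarise_fleet(agents)
for key, counts in sorted(fleet_summary.items()):
    print(key, counts)
```

A per-tag count like this is only the raw material; the visualisation on top of it is the part that is missing today.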

petemounce commented 5 years ago

One thing that just occurred to me: just as I want to find and stomp (ok, "improve") flaky tests, I want to find and stomp flaky agents. Hopefully for me, with isolated queues per agent image, that means flaky queues rather than individual agents.

Making it easier to see the failure modes (which steps of which pipelines) grouped by agent and agent-tags would make that significantly easier.
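A minimal sketch of that grouping follows. It assumes finished jobs are available as dicts with `pipeline`, `step`, `agent_tags` (a list of `key=value` strings) and `passed` fields; that record shape, the sample jobs, and the helper names `tag_value` and `failures_by_tag` are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict


def tag_value(tags, key):
    """Return the value of the first "key=value" tag matching key, else None."""
    for tag in tags:
        k, _, v = tag.partition("=")
        if k == key:
            return v
    return None


def failures_by_tag(jobs, key="queue"):
    """Group job outcomes by one agent tag and list the steps that failed."""
    totals = Counter()
    failures = defaultdict(Counter)
    for job in jobs:
        value = tag_value(job["agent_tags"], key)
        totals[value] += 1
        if not job["passed"]:
            failures[value][(job["pipeline"], job["step"])] += 1
    return {
        value: {
            "jobs": total,
            "failure_rate": sum(failures[value].values()) / total,
            "failing_steps": dict(failures[value]),
        }
        for value, total in totals.items()
    }


# Hypothetical sample data, for illustration only.
jobs = [
    {"pipeline": "app", "step": "test", "agent_tags": ["queue=linux"], "passed": True},
    {"pipeline": "app", "step": "test", "agent_tags": ["queue=linux"], "passed": False},
    {"pipeline": "app", "step": "build", "agent_tags": ["queue=windows"], "passed": False},
    {"pipeline": "app", "step": "build", "agent_tags": ["queue=windows"], "passed": False},
]

report = failures_by_tag(jobs)
for queue, stats in sorted(report.items()):
    print(queue, stats)
```

Grouping by queue rather than by agent name matches the isolated-queue-per-image setup described above: a queue whose failure rate stands out points at the image, not at one machine.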