Open petemounce opened 5 years ago
This is really interesting context @petemounce, thanks for this.
I wonder if this might be a good use-case for the new GraphQL console and saved queries? We expose a lot of this information programmatically and GraphQL would be a good way to query it. Would be keen to talk through how to make that more usable for you.
At present the Agent UI starts to decline in usefulness at around the 100+ agent point. @ticky made some solid progress on the last round of UX improvements on that page, so she might have some ideas about next steps too.
That sounds possible but also quite low-level. I think I'm after something that allows me to visualise the characteristics of my agent fleet.
One thing that just occurred to me is that just as I want to find and stomp (ok, "improve") flaky tests, I want to find and stomp flaky agents (which, hopefully for me with isolated queues per agent image, means queue, not individual agents).
Making it easier to see the failure modes (which steps of which pipelines) grouped by agent and agent-tags would make that significantly easier.
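To make the grouping idea concrete, here is a minimal sketch of the kind of aggregation I mean. It assumes job results have already been exported into plain records; the field names (`pipeline`, `step`, `passed`, `agent_tags`) and the sample data are hypothetical and are not Buildkite's actual API shape.

```python
from collections import defaultdict

# Hypothetical job records. The field names and values are assumptions
# made for illustration; they are not taken from any real API response.
JOBS = [
    {"pipeline": "build", "step": "unit-tests", "passed": False,
     "agent_tags": ["queue=linux-v1", "docker=18.06"]},
    {"pipeline": "build", "step": "unit-tests", "passed": True,
     "agent_tags": ["queue=linux-v1", "docker=18.06"]},
    {"pipeline": "build", "step": "lint", "passed": True,
     "agent_tags": ["queue=linux-v2", "docker=18.09"]},
    {"pipeline": "deploy", "step": "push", "passed": False,
     "agent_tags": ["queue=linux-v1", "docker=18.06"]},
]


def failure_rates_by_tag(jobs):
    """Return {tag: fraction of jobs that failed on agents carrying that tag}."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for job in jobs:
        for tag in job["agent_tags"]:
            totals[tag] += 1
            if not job["passed"]:
                failures[tag] += 1
    return {tag: failures[tag] / totals[tag] for tag in totals}


def failing_steps_by_tag(jobs):
    """Return {tag: {(pipeline, step): failure count}} for failed jobs only."""
    grouped = defaultdict(lambda: defaultdict(int))
    for job in jobs:
        if job["passed"]:
            continue
        for tag in job["agent_tags"]:
            grouped[tag][(job["pipeline"], job["step"])] += 1
    return {tag: dict(steps) for tag, steps in grouped.items()}


if __name__ == "__main__":
    # Print tags from most to least failure-prone.
    rates = failure_rates_by_tag(JOBS)
    for tag, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
        print(f"{tag}: {rate:.0%} failed")
```

Sorting tags by failure rate would point at a suspect queue or image revision, and the per-tag step breakdown shows which steps of which pipelines fail there.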
Currently, https://buildkite.com/organizations/improbable/agents is what is available to manage agents.
I currently have low hundreds of agents, 8x to a node, single platform. I'm soon going to introduce Windows, 1x to a node, then macOS.
I tag agents with:
- v-0bd35eaddf77f8e2-------1534253424 (that is a watermark of the revision from source control the agent node image was built from - will have tens to hundreds of these)
- arbitrary software=version - things like docker, node, etc

I would find this page more valuable if