buildkite / feedback

Got feedback? Please let us know!
https://buildkite.com

Expose all agents' uptime on the Agents UI #233

Open jamesottaway opened 7 years ago

jamesottaway commented 7 years ago

From a technical perspective, querying each running agent when loading the list is a little odd, but maybe with some caching this could still be feasible.

The problem I'm trying to solve is to easily identify agents in our elastic cluster which I would've expected to shut down due to lack of jobs overnight, but which are still running the next morning.

Most of the time it's due to a build step hanging, but not always, hence the idea of narrowing our investigation down to the oldest agents.

keithpitt commented 7 years ago

@jamesottaway oh hai! Just to confirm, you'd like to see how long an agent has been connected for on the Agents List page itself?

jamesottaway commented 7 years ago

I was thinking the host's uptime, but knowing how long the agent has been connected would serve a similar purpose.
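To make the idea concrete, here is a minimal sketch of how either number could be used to surface the oldest agents. It assumes each agent is a plain record with a `name` and a `connected_at` timestamp; those field names and the `oldest_agents` helper are hypothetical, not part of any existing Buildkite API.

```python
from datetime import datetime, timedelta, timezone


def oldest_agents(agents, now, min_age):
    """Return (name, age) pairs for agents connected at least min_age, oldest first.

    `agents` is a list of dicts with hypothetical keys "name" and
    "connected_at" (a timezone-aware datetime).
    """
    aged = [(agent["name"], now - agent["connected_at"]) for agent in agents]
    stale = [pair for pair in aged if pair[1] >= min_age]
    # Longest-connected agents first, so the likeliest stragglers top the list.
    return sorted(stale, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    now = datetime(2024, 1, 2, 9, 0, tzinfo=timezone.utc)
    agents = [
        {"name": "agent-a", "connected_at": now - timedelta(hours=14)},
        {"name": "agent-b", "connected_at": now - timedelta(minutes=20)},
        {"name": "agent-c", "connected_at": now - timedelta(hours=30)},
    ]
    for name, age in oldest_agents(agents, now, timedelta(hours=12)):
        print(name, age)
```

Anything that shows up in that list the morning after a quiet night would be the first place to look for a hung build step.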

keithpitt commented 7 years ago

Ohhh...uptime, right right. I was thinking agent uptime instead of the actual server's uptime! It's an interesting idea! It probably starts creeping into territory that the agent should stay out of (server monitoring).

> The problem I'm trying to solve is to easily identify agents in our elastic cluster which I would've expected to shut down due to lack of jobs overnight, but which are still running the next morning.

Perhaps there's something in CloudWatch we could offer instead? /cc @lox @toolmantim

petemounce commented 6 years ago

I'd also be interested to see how many seconds an agent has been active vs. idle. This would allow me to make finer-grained capacity decisions.
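A rough sketch of the calculation I have in mind, assuming we know when the agent connected and the start and end time of each job it ran. The `active_idle_seconds` helper and its inputs are hypothetical, and it assumes an agent runs one job at a time, so job intervals never overlap.

```python
from datetime import datetime, timezone


def active_idle_seconds(connected_at, now, job_intervals):
    """Split an agent's connected time into (active, idle) seconds.

    `job_intervals` is a list of (start, end) datetimes for jobs the agent
    ran. Intervals are assumed not to overlap (one job at a time).
    """
    total = (now - connected_at).total_seconds()
    active = 0.0
    for start, end in job_intervals:
        # Clamp each job to the window between connection time and now.
        clamped_start = max(start, connected_at)
        clamped_end = min(end, now)
        if clamped_end > clamped_start:
            active += (clamped_end - clamped_start).total_seconds()
    return active, total - active


if __name__ == "__main__":
    def at(hour, minute=0):
        return datetime(2024, 1, 2, hour, minute, tzinfo=timezone.utc)

    jobs = [(at(1), at(2)), (at(9, 30), at(11))]
    active, idle = active_idle_seconds(at(0), at(10), jobs)
    print(f"active={active:.0f}s idle={idle:.0f}s")
```

A high idle share across a whole cluster would suggest it can be scaled down; a low one suggests it needs more agents.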