Expose 2 new metrics about worker health

cirruslabs / orchard

Orchestrator for running Tart Virtual Machines on a cluster of Apple Silicon devices

Expose 2 new metrics about worker health #203

Closed: mcmarkj closed this 1 month ago

mcmarkj commented 1 month ago

This exposes 2 new metrics around worker health:

    # HELP orchard_worker_last_seen
    # TYPE orchard_worker_last_seen gauge
    orchard_worker_last_seen{worker_name="Marks-Worker"} 1.725530969e+09

and

    # HELP orchard_worker_status
    # TYPE orchard_worker_status gauge
    orchard_worker_status{status="online",worker_name="Marks-Worker"} 1

That way we can explicitly alert if a node goes offline.

CLAassistant commented 1 month ago

All committers have signed the CLA.

mcmarkj commented 1 month ago

While I'm here: should orchard_vms keep incrementing on every scrape? Would it make sense to change the loop below so that it sets the gauge to the current count per status instead?

    for _, vm := range vms {
        vmsStat.With(map[string]string{"status": string(vm.Status)}).Inc()
    }
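
One possible shape for the "set it to a count" variant is sketched below. It is not the project's code: the VM struct and countByStatus are hypothetical stand-ins, and the sketch leaves the metrics library out so that it runs with the standard library alone.

```go
package main

import "fmt"

// VM is a hypothetical stand-in for Orchard's VM type; only Status matters here.
type VM struct {
	Name   string
	Status string
}

// countByStatus tallies the VMs per status for a single scrape. The result
// depends only on the current VM list, not on how many scrapes came before.
func countByStatus(vms []VM) map[string]float64 {
	counts := make(map[string]float64)
	for _, vm := range vms {
		counts[vm.Status]++
	}
	return counts
}

func main() {
	vms := []VM{
		{Name: "vm-1", Status: "running"},
		{Name: "vm-2", Status: "running"},
		{Name: "vm-3", Status: "pending"},
	}

	// Two simulated scrapes over the same VMs report the same values,
	// whereas calling Inc() per VM would double them on the second scrape.
	for scrape := 1; scrape <= 2; scrape++ {
		counts := countByStatus(vms)
		fmt.Println(scrape, counts["running"], counts["pending"])
	}
}
```

Assuming vmsStat is a Prometheus client_golang GaugeVec, each count could then be written with `vmsStat.With(map[string]string{"status": status}).Set(count)`. A status that drops to zero VMs would keep its previous value unless the vector is reset or that label is explicitly set to 0 first.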