coreos / zincati

Agent for Fedora CoreOS auto-updates
https://coreos.github.io/zincati
Apache License 2.0

Expose metrics over tcp #677

Open pratikmallya opened 2 years ago

pratikmallya commented 2 years ago

Feature Request

Currently, metrics are exposed over Unix sockets, and it's recommended to use local_exporter. However, there are 2 main issues:

Desired Feature

It should be possible to read the metrics over a configurable TCP endpoint (host:port).

Example Usage

Other Information

lucab commented 2 years ago

Thanks for the report. The metrics are deliberately not exposed over a TCP endpoint, as Zincati runs in the host network namespace and, at this time, should not contain a full-blown HTTP server. The intended usage is to bridge the metrics content from the local endpoint to whatever monitoring solution you use. local_exporter tries to tackle the usual case of "Prometheus through an overlay network", but there could be others. Notably, you still need some components handling network-policing/authentication/authorization in any case.
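To make the bridging idea concrete, here is a minimal sketch in Python. It is not how local_exporter is implemented. It assumes the local endpoint is a stream-oriented Unix socket that writes the Prometheus text payload and then closes the connection, and that the socket lives at `/run/zincati/public/metrics.promsock` (an assumption to verify on your host). The names `read_metrics`, `make_server`, and `MetricsHandler` are hypothetical, and the sketch does no authentication or network policing, which would still have to be handled elsewhere.

```python
import socket
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumed location of the Zincati metrics socket; check your host before relying on it.
DEFAULT_SOCKET = "/run/zincati/public/metrics.promsock"


def read_metrics(socket_path: str = DEFAULT_SOCKET, timeout: float = 5.0) -> bytes:
    """Connect to the Unix socket and read until the peer closes the stream."""
    chunks = []
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as conn:
        conn.settimeout(timeout)
        conn.connect(socket_path)
        while True:
            chunk = conn.recv(65536)
            if not chunk:  # empty read: the peer closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)


def make_server(socket_path: str, host: str, port: int) -> HTTPServer:
    """Build an HTTP server whose /metrics handler relays the socket content."""

    class MetricsHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/metrics":
                self.send_error(404)
                return
            try:
                # Re-read the socket on every scrape so values are never stale.
                body = read_metrics(socket_path)
            except OSError as err:
                self.send_error(502, f"cannot read metrics socket: {err}")
                return
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return HTTPServer((host, port), MetricsHandler)
```

Calling `make_server(DEFAULT_SOCKET, "127.0.0.1", 9100).serve_forever()` would start the relay; the bind address and port here are placeholders, and binding to anything other than loopback should go together with whatever access control your environment requires.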

You are absolutely right that local_exporter repo is missing licensing details. If that works for you, I'd attach an Apache-2.0 notice to it. Sounds good?

As to operational overhead, it gets lumped into the rest of maintenance costs of the observability stack. There are many prior examples here, like postgres_exporter and similar, so this isn't inventing a new category for a single component. If local_exporter instability/failures are a concern, I do recommend designing a more reliable bridging solution tailored to your environment.

If it makes things easier, we can consider exposing the same textual metrics over another existing local transport, i.e. through a D-Bus method.

pratikmallya commented 2 years ago

Hey @lucab, thanks for the quick response!

> You are absolutely right that local_exporter repo is missing licensing details. If that works for you, I'd attach an Apache-2.0 notice to it. Sounds good?

Yes, that would certainly help, thank you!

> As to operational overhead, it gets lumped into the rest of maintenance costs of the observability stack. There are many prior examples here, like postgres_exporter and similar, so this isn't inventing a new category for a single component.

I understand that this may already have been decided, but I do want to push back on this framing a little. I would argue that zincati is more like kubelet than postgres: it's a daemon that is primarily concerned with the node itself rather than with a specific application. In terms of scale, zincati needs to be deployed on every CoreOS VM/node that an infra team manages, which means we're easily looking at 100s to 1000s of instances of zincati that need to be managed (by the median infra team; this number is a rough guess from what I've seen). The overhead of yet another service just to collect metrics from zincati would be on the same order.

I understand that this is not a perfect comparison: kubelets are already HTTP servers, so adding a metrics endpoint there is just another handler, rather than adding an HTTP server just for serving metrics. But it also seems that Rust offers libraries that enable this without too much overhead.

In any case, my vote is to enable metrics over HTTP to make life easier for infra teams that have to manage zincati. I would be happy to contribute this feature if the project owners are not against it.