Arksine / moonraker

Web API Server for Klipper
https://moonraker.readthedocs.io
GNU General Public License v3.0
1.02k stars 392 forks source link

POC: Prometheus metrics module #816

Open kdomanski opened 4 months ago

kdomanski commented 4 months ago

This is a POC to address #504 . It is meant to spawn feedback, but is still neither complete nor satisfactory to me.

kdomanski commented 4 months ago

Screenshot 2024-02-19 095435 Screenshot 2024-02-19 095452

Arksine commented 4 months ago

Thanks, I have a few questions/comments based on your initial feedback.

No apikey/auth support

Is this a limitation in Prometheus? Does it support any type of authentication?

Experience in instrumenting software tells me, that it would be better to instrument printer objects directly in Klippy and then only add Moonraker-specific metrics from here. Need to think about this.

I'm not sure Kevin would be enthusiastic about merging instrumentation directly. Its effectively repackaging the data, sending instrumented objects to Moonraker would duplicate the serialization/deserialization effort, presumably with additional overhead. Part of the motivation for Klipper's API server is to farm this kind of work to applications outside of Klippy.

I'm not sure whether this is race-free. Could a status update callback land after klippy shutdown/disconnect callback?

Status updates can still be received after shutdown, but not after a disconnect.

kdomanski commented 4 months ago

No apikey/auth support

Is this a limitation in Prometheus? Does it support any type of authentication?

No, this is my mistake. I forgot that Moonraker api keys work simply as bearer tokens. That works just fine with Prom.

Experience in instrumenting software tells me, that it would be better to instrument printer objects directly in Klippy and then only add Moonraker-specific metrics from here. Need to think about this.

I'm not sure Kevin would be enthusiastic about merging instrumentation directly. Its effectively repackaging the data, sending instrumented objects to Moonraker would duplicate the serialization/deserialization effort, presumably with additional overhead. Part of the motivation for Klipper's API server is to farm this kind of work to applications outside of Klippy.

And this is exactly why I think we'll be pondering this for a while. I understand and share the desire to minimize any extraneous stuff in Klippy itself. But I just don't think this particular thing can be farmed out effectively.

In Klippy, there's access to the actual objects in question, type information, introspection, and the like. It should be possible to do most of the instrumentation from the Prometheus module itself (with a little bit of introspection) and not litter every other module manually with instrumentation code.

But doing it in Moonraker means inferring all that information which Klippy already has, based on an implicit API (status updates), which doesn't have a clear schema. Any change in Klippy must be followed by a change in Moonraker and a bespoke converter from status JSON to metrics for each component type. That's overhead too. And unlike status updates, which are strictly scheduled, the Prometheus metrics are only serialized during scrape. In the end, I'd rather not argue about performance and/or overhead without having actual measurements thereof.

I'm not sure whether this is race-free. Could a status update callback land after klippy shutdown/disconnect callback?

Status updates can still be received after shutdown, but not after a disconnect.

I failed to articulate my question clearly. What I mean is: if a status updates comes right before a disconnect, will their handlers still execute in order every time, or is there a risk that after clearing the metrics (on disconnect, on shutdown, etc.) a wayward update handler still runs and writes a value from the old session?