eclipse / kuksa.val

Apache License 2.0

Databroker Prometheus endpoint #473

Open mikehaller opened 1 year ago

mikehaller commented 1 year ago

To visualize the VSS data, it would be nice to have an integration with Prometheus.

Since databroker already is a gRPC server, it may be easy to add this feature, as a Prometheus exporter is just an HTTP endpoint ("/metrics") returning a flat multi-line text response with metrics.

The VSS tree could be flattened and provided as Prometheus Metrics, together with the current vehicle signal value and its last-updated timestamp:

# Format: <metric as vss path in prometheus notation>[labels as {key="value"}] <current value> <timestamp in milliseconds>
vehicle_cabin_comfort_desiredtemperature 21 1675933158000
vehicle_cabin_hvac_station_row2_right_fanspeed 60 1675933158000
vehicle_powertrain_tractionbattery_charging_chargevoltage_phase1 400 1675933158000
vehicle_cabin_hvac_station_row2_left_fanspeed{unit="percent", min="0", max="100", type="actuator"} 35 1675933158000
vehicle_chassis_axle_row1_wheel_left_speed{unit="km/h", uuid="47897f20b2745b6aa2d0f76f1ecf824a", datatype="float"} 80 1675933158000
...
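The flattening described above can be sketched as a small helper. This is only an illustration of the naming convention, not part of any KUKSA API; the function name and signature are made up for this example:

```python
def to_prometheus_line(vss_path, value, timestamp_ms, labels=None):
    """Convert a VSS path plus current value into one Prometheus exposition line.

    Prometheus metric names may only contain [a-zA-Z0-9_:], so the dots in a
    VSS path are replaced by underscores and the name is lowercased. The
    optional labels dict becomes the {key="value"} label set.
    """
    name = vss_path.lower().replace(".", "_")
    label_str = ""
    if labels:
        pairs = ", ".join(f'{k}="{v}"' for k, v in labels.items())
        label_str = "{" + pairs + "}"
    return f"{name}{label_str} {value} {timestamp_ms}"

# to_prometheus_line("Vehicle.Speed", 80, 1675933158000, {"unit": "km/h"})
# -> 'vehicle_speed{unit="km/h"} 80 1675933158000'
```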

Notes / Open Questions

SebastianSchildt commented 1 year ago

Hi @mikehaller , sounds quite "doable", I'd assume it "should be" like a 10 min job using the Python lib.

(not sure about getting all the metadata details, but again, that could be hacked via config first, and would just indicate a point where we need to extend/broaden the gRPC/Py-lib API)

@eriksven @sophokles73 @LuLeRoemer I have it in the back of my mind that you have played with this? Or am I mixing it up with OpenTelemetry, is that something different? Please help me out here, cloud natives :)

Another option might be using the existing hono-influxdb connector and then trying to feed Prometheus from Influx, but I am not sure that is easier: you suddenly need Hono in the backend, and you would still need to write a "pyapi-to-hono-mqtt" adapter, which is likely not any easier than just serving /metrics yourself. But maybe @eriksven has an opinion here

sophokles73 commented 1 year ago

@SebastianSchildt You might have misheard or misinterpreted something here. There is no such thing as a hono-influxdb connector. Eclipse Hono in fact reports its metrics to an OpenTelemetry collector, which can then be used to forward the metrics to other components. It also provides an endpoint for scraping by Prometheus. That said, we have been experimenting with doing something similar for values in the Databroker. For that purpose we created a client that periodically polls the Databroker for a set of (current) values and then uses the OpenTelemetry API within the client to report them. However, by the time the metrics get exported to the OpenTelemetry Collector, they have already been aggregated, e.g. turned into average values over the last polling interval. So that didn't work out for us. We now explicitly POST the polled values to an InfluxDB as a set of measurements.

IMHO we should be careful not to confuse data export with reporting metrics here. For the former, exposing an HTTP endpoint that dumps all data might be a valid option, although I guess this will become less efficient as the number of VSS paths increases. For the latter, I would suggest using the OpenTelemetry SDK to instrument the Databroker code with some meaningful metrics (e.g. number of requests per second, etc.) and then also using the flexibility of the OpenTelemetry Collector to make these metrics available to whoever is interested.
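To make the distinction concrete, here is a minimal sketch of what "real" operational metrics could look like: a hand-rolled, thread-safe counter registry rendered in the Prometheus text format. In practice you would use the OpenTelemetry or prometheus_client SDKs instead; the class and metric names below are purely illustrative:

```python
import threading

class RequestMetrics:
    """Tiny counter registry, a stand-in for an OpenTelemetry meter."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counters = {}

    def inc(self, name, amount=1):
        """Increment a named counter (creating it on first use)."""
        with self._lock:
            self._counters[name] = self._counters.get(name, 0) + amount

    def render(self):
        """Render all counters in Prometheus exposition format."""
        lines = []
        with self._lock:
            for name, value in sorted(self._counters.items()):
                lines.append(f"# TYPE {name} counter")
                lines.append(f"{name} {value}")
        return "\n".join(lines) + "\n"

# Hypothetical instrumentation points inside the Databroker:
metrics = RequestMetrics()
metrics.inc("databroker_requests_total")
metrics.inc("databroker_subscriptions_active")  # would be a gauge in practice
```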

SebastianSchildt commented 1 year ago

Thank you for the clarification, understood (I hope). So I think what Mike is talking about is data export, which is completely unrelated to any metrics/telemetry from the databroker.

Wrt the Hono-Influx connector, I was referring to https://github.com/eclipse/kuksa.cloud/tree/master/utils/hono-influxdb-connector , which could be used to get VSS data out of Hono, but as I said above, the effort is likely similar to just implementing a Prometheus-compatible /metrics HTTP endpoint.

SebastianSchildt commented 1 year ago

Some example code to get you started @mikehaller. No metadata yet, and since I do not know how to deal, on the Prometheus side, with values that don't exist (currently I use "nan"), you might need to change that to leave those metrics out. Anyway, it is only two files:

requirements.txt

web.py
kuksa-client

kuksa-prometheus.py

import web
from kuksa_client.grpc import VSSClient
from datetime import datetime

# VSS paths to expose as Prometheus metrics
vsspaths = [ "Vehicle.Speed",
             "Vehicle.Body.Trunk.Rear.IsOpen",
             "Vehicle.Cabin.HVAC.Station.Row2.Left.FanSpeed",
             "Vehicle.Cabin.HVAC.Station.Row2.Right.FanSpeed",
             "Vehicle.Chassis.Axle.Row1.Wheel.Left.Speed",
             "Vehicle.Powertrain.TractionBattery.Charging.ChargeVoltage.Phase1",
           ]

urls = (
    '/', 'mainpage',
    '/metrics', 'metrics'
)
app = web.application(urls, globals())

class mainpage:
    def GET(self):
        return """<html>
                    <head><title>KUKSA metrics</title></head>
                    <body><h1>KUKSA metrics</h1><p>Find awesome metrics <a href="./metrics">here</a>.</p></body>
                </html>"""

class metrics:
    def GET(self):
        web.header('Content-Type', 'text/plain')
        prometheus_out = ""
        with VSSClient('127.0.0.1', 55556) as client:
            current_values = client.get_current_values(vsspaths)
            for path in vsspaths:
                # Prometheus metric names may not contain dots
                convertedpath = str(path).lower().replace(".", "_")
                value = "nan"
                timestamp = 0
                if current_values[path] is not None:
                    value = current_values[path].value
                    # Prometheus exposition format expects milliseconds
                    timestamp = int(datetime.timestamp(current_values[path].timestamp) * 1000)
                prometheus_out += f"{convertedpath} {value} {timestamp}\n"
        return prometheus_out

if __name__ == "__main__":
    app.run()

Example output (I only provided Speed):

vehicle_speed 0.0 1676037042136
vehicle_body_trunk_rear_isopen nan 0
vehicle_cabin_hvac_station_row2_left_fanspeed nan 0
vehicle_cabin_hvac_station_row2_right_fanspeed nan 0
vehicle_chassis_axle_row1_wheel_left_speed nan 0
vehicle_powertrain_tractionbattery_charging_chargevoltage_phase1 nan 0

Feel free to play. If you intend to make it a somewhat more useful example, it might fit well in https://github.com/eclipse/kuksa.val/tree/master/kuksa_apps 🔧PRs always welcome 😁

mikehaller commented 1 year ago

Indeed, I was misusing the term metrics here. What @sophokles73 is saying is that Databroker might expose its own operational metrics, like how many datapoints are registered, how many clients are connected, how many subscriptions are active, etc.

I was more looking for a data export and "accidentally" misused the Prometheus format here, so I can scrape the values easily into Grafana.

Two separate things, I guess.

I will have a look at implementing it as a separate client, although that means I need to subscribe to all datapoints and keep them in memory. I was hoping for an implementation inside of Databroker to minimize that overhead.
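A subscription-based exporter doesn't need much machinery, though: the "memory" is just a dict that a background subscription loop keeps up to date, and /metrics renders it on demand. A sketch, where the cache is self-contained and the kuksa-client wiring is shown only in comments (treat the exact subscribe method name as an assumption, and note it needs a running databroker):

```python
import threading

class DatapointCache:
    """In-memory cache of the latest value per VSS path."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}  # path -> (value, timestamp_ms)

    def update(self, path, value, timestamp_ms):
        with self._lock:
            self._data[path] = (value, timestamp_ms)

    def render(self):
        """Render the cached values as Prometheus exposition lines."""
        with self._lock:
            return "".join(
                f'{path.lower().replace(".", "_")} {value} {ts}\n'
                for path, (value, ts) in sorted(self._data.items())
            )

def subscription_loop(cache, paths):
    # Hypothetical wiring against the kuksa-client Python API:
    # from kuksa_client.grpc import VSSClient
    # with VSSClient('127.0.0.1', 55556) as client:
    #     for updates in client.subscribe_current_values(paths):
    #         for path, dp in updates.items():
    #             if dp is not None:
    #                 ts = int(dp.timestamp.timestamp() * 1000)
    #                 cache.update(path, dp.value, ts)
    pass
```

The /metrics handler then just returns `cache.render()`, so a scrape never touches the databroker directly.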

argerus commented 1 year ago

separate client, although that means i need to subscribe to all datapoints and keep them in memory.

I don't think that's necessary.

If I understand correctly, Prometheus would periodically ask the endpoint: "give me all your metrics". Or does it only want changes?

If Prometheus doesn't ask for this too frequently (i.e. multiple times per second) I think it would be perfectly reasonable for the client to fetch everything from databroker on every request. This isn't a lot of data, and I don't think the latency would be much of an issue either.

If implementing an endpoint for "actual" metrics in databroker is in the cards (i.e. exporting requests / second counters or number of subscribers etc), it would probably make sense to also add the values of datapoints as another possible metric to export. I don't know if this is considered a misuse or not..

sophokles73 commented 1 year ago

If I understand correctly, Prometheus would periodically ask the endpoint: "give me all your metrics". Or does it only want changes?

It takes whatever it gets :-) If the service that is being scraped by Prometheus does return values for only certain metrics, then that is what Prometheus will record in its time series DB (and interpolate missing values).

If Prometheus doesn't ask for this too frequently (i.e. multiple times per second) I think it would be perfectly reasonable for the client to fetch everything from databroker on every request. This isn't a lot of data, and I don't think the latency would be much of an issue either.

That is configurable, but most Prometheus setups will only scrape the targets every 10-15 seconds. While I tend to agree that (currently) we are not talking about much data, that might change with the number of data entries. In general, though, having a way to export all current data seems feasible; I would simply try not to confuse it with the Databroker's (real) operational metrics. That said, you might also wonder whether the Prometheus metrics data format is appropriate for this data export functionality or whether a more general format wouldn't be a better fit. You will probably not tie this functionality to the assumption that the client is always a Prometheus server, right?