Improve documentation for retrieving metrics from components with multiple connections

stefan-brus-frequenz commented 2 months ago

What's needed?

We need to refine our API's Protobuf message definitions to allow clients to request specific metrics from microgrid components, optionally filtering the data by multiple connections (such as different DC strings in an inverter). This involves updating field names from "sources" to "connections" for better clarity, ensuring that the relationships between metrics, connections, and components are accurately represented in the message structures and documentation.

Proposed solution

Example structure:

message MetricConnections {
       frequenz.api.common.v1.metrics.Metric metric = 1; //required
       repeated string connections = 2; //optional
}

Use cases

Allow limiting to request metrics by component connection e.g. by dedicated inverter DC PV string.

Alternatives and workarounds

No response

Additional context

This issue precedes the changes in this issue.

thomas-nicolai-frequenz commented 2 months ago

Here is my suggestion but it goes beyond just enhancing the documentation also renaming a few things but its should be none-breaking:

-    // Filter for sources for metrics, specified for a single metric at a time.
-    // Specifying a metric here will only stream data for that metric where data
-    // can be found tagged with the given source names.
+    // Filter criteria for selecting specific sources of metrics, specified for a single metric at a time.
+    //
+    // This message allows clients to request data from particular sources for a given metric.
+    // It's especially useful when a metric can originate from various connections of a component, such as 
+    // different dc strings for e.g. battery or PV connected to a hybrid inverter.
+    //
+    // !!! note
+    //     The `MetricSourceFilter` message enables fine-grained control over which sources of a metric
+    //     are included in the data stream. If multiple `MetricSourceFilter` are specified for different metrics,
+    //     the filters apply separately to each metric.
+    //     The `metric_sources` specified here correspond to the `source` field in the `MetricSample` message
+    //     defined in the common API. By filtering on `metric_sources`, you effectively select which `MetricSample`
+    //     messages to receive based on their `source` value.
+    //
+    // !!! example
+    //      Suppose you have a hybrid inverter that provides the `VOLTAGE` metric from both its battery and 
+    //      PV array. If you're only interested in the voltage measurements from the battery, you can use 
+    //      `MetricSourceOptions` to specify the `VOLTAGE` metric and set `sources` to
+    //      `["dc_battery_1"]` for the dc string connected to a battery.
+    //
+    //      ```
+    //       source_filter: {
+    //              "metric": "VOLTAGE",
+    //              "sources": ["dc_battery_1"]
+    //       }
+    //       ```
+    //
-     message MetricSourceOptions {
+     message MetricSourceFilter {
       // The metric for which sources will be filtered.
+      //
+      // Only data for this metric will be filtered based on the provided `sources`.
+      // Metrics not specified here will not be affected by this filter, and all of their
+      // sources will be present in the received data samples. This can also be used to
+      // identify which sources are available for a given metric.
       frequenz.api.common.v1.metrics.Metric metric = 1;

       // A list of sources to allow for the specified metric.
-      // If this list is empty, then no data for this metric is streamed.
+      //
+      // Each source corresponds to the `source` field in `MetricSample` messages (see the common API definition).
+      // If this list is empty, then no data for this metric will be streamed.
+      //
+      // !!! important
+      //     The source identifiers must match exactly those used in the `MetricSample` messages.
+      //     Ensure that the sources specified here are valid for the given metric and component.
+      //
+      // !!! example
+      //      For a metric that can come from multiple DC strings in a PV array, you might specify:
+      //
+      //      ```
+      //          sources: ["dc_string_1", "dc_string_2"]
+      //      ```
+      //
-      repeated string metric_sources = 2;
+      repeated string sources = 2;
     }

thomas-nicolai-frequenz commented 2 months ago

Here is an updated version of the diff above reflecting on the changes introduced in this issue:

-    // Filter for sources for metrics, specified for a single metric at a time.
-    // Specifying a metric here will only stream data for that metric where data
-    // can be found tagged with the given source names.
+    // Message defining a metric to receive data for, optionally filtering by connections.
+    //
+    // This message allows clients to request data for a specific metric, and optionally
+    // filter the data by connection identifiers. This is particularly useful when a metric can
+    // originate from multiple connections within a component, such as different DC strings for
+    // batteries or PV arrays connected to a hybrid inverter.
+    //
+    // !!! note
+    //     The `MetricConnections` message enables fine-grained control over which are included in the data stream. 
+    //     If multiple `MetricConnections` messages are specified for different metrics, the filters apply 
+    //     separately to each metric.
+    //     The `connections` specified here correspond to the `connections` field in the `MetricSample` message
+    //     defined in the common API. By filtering on `connections`, you effectively select which `MetricSample`
+    //     messages to receive based on their `connections` value.
+    //
+    // !!! example
+    //     Suppose you have a hybrid inverter that provides the `VOLTAGE` metric from both its battery and 
+    //     PV array. If you're only interested in the voltage measurements from the battery, you can use 
+    //     `Metrics` message to specify the `VOLTAGE` metric and set `connections` to `["dc_battery_1"]` 
+    //     for the DC string connected to a battery.
+    //
+    //     ```json
+    //     {
+    //       "metric": "VOLTAGE",
+    //       "connections": ["dc_battery_1"]
+    //     }
+    //     ```
+    //
-     message MetricSourceOptions {
+    message MetricConnections {
       // The metric for which data is requested.
+      //
+      // If `connections` are specified, only data for this metric from those connections will be returned.
+      // If no `connections` are specified, data from all connections for this metric will be returned.
       frequenz.api.common.v1.metrics.Metric metric = 1;

       // An optional list of connections to filter the data for the specified metric.
-      // If this list is empty, then no data for this metric is streamed.
+      //
+      // Each connection corresponds to the `connections` field in `MetricSample` messages 
+      // (see the common API definition).
+      // If this list is empty or not specified, data from all sources for this metric will be returned.
+      //
+      // !!! important
+      //     The connections identifiers must match exactly those used in the `MetricSample` messages.
+      //     Ensure that the connections specified here are valid for the given metric and component.
+      //
+      // !!! example
+      //     For a metric that can come from multiple DC strings in a PV array, you might specify:
+      //
+      //     ```json
+      //     {
+      //       "connections": ["dc_string_1", "dc_string_2"]
+      //     }
+      //     ```
+      //
       repeated string connections = 2;
     }

stefan-brus-frequenz commented 2 months ago

I would propose to encapsulate the sources inside its own message, so that we can treat No metric sources given (message is not present) and Don't return any metric sources (message is present, but source list is empty) as separate use cases, as we discussed and in #55. Other than that the changes look good to me!

thomas-nicolai-frequenz commented 2 months ago

OK, so the structure would look like this?

message Metrics {
  message MetricSources {
     repeated string sources = 1;
  }
  Metric metric = 1;
  MetricSources sources = 2;
}

repeated Metrics metrics = N; // N is the appropriate field number in context

thomas-nicolai-frequenz commented 1 month ago

Alright, as discussed we will go with repeated string sources as laid out above and do not wrap it in an additional message as we do not need to distinguish between:

Non-set message (all sources)
~~Set message with empty list (no sources)~~
Set message with non-empty list (only the given sources)

The second option is not needed.

thomas-nicolai-frequenz commented 1 month ago

I've updated the diff above with all the changes discussed.

frequenz-floss / frequenz-api-reporting