frequenz-floss / frequenz-client-common-python

Python bindings for the shared Frequenz API messages
MIT License

Decide on data structure for Reporting client #29

Open flora-hofmann-frequenz opened 2 months ago

flora-hofmann-frequenz commented 2 months ago

What's needed?

A data structure that can expose the metrics as well as the states per microgrid and component.

Proposed solution

Options to be discussed: 1) a full (artificial) mapping from the protobuf messages to Python data types, or 2) lightweight helpers (such as namedtuples) to work with the data.

Use cases

Not only metrics, but also states, warnings and errors should be exposed.

Alternatives and workarounds

Any alternative ideas are welcome.

Additional context

As discussed in the API user group meeting on 6th May 2024: currently, the ReportingApiClient only exposes MetricSamples using a namedtuple structure.

To additionally expose States:

The example output structure from the protobuf definition is shown below; the gRPC entry point returns these messages.

> // Response containing historical microgrid component metrics in one or multiple microgrids
> //
> // Each microgrid's components are provided as timeseries data structures that encapsulate
> // metrics, bounds, errors and operational state and their associated timestamps for each component
> // within the specified time range.
> //
> // !!! example
> //     Example output structure:
> //     ```
> //     microgrids: [
> //       {
> //         microgrid_id: 1,
> //         components: [
> //           {
> //             component_id: 13,
> //             metric_samples: [
> //               /* list of metrics for multiple timestamps */
> //               { sampled_at: "2023-10-01T00:00:00Z", metric: "DC_VOLTAGE_V", sample: {...}, bounds: {...} },
> //               { sampled_at: "2023-10-01T00:00:00Z", metric: "DC_CURRENT_A", sample: {...}, bounds: {...} }
> //               { sampled_at: "2023-10-01T00:05:00Z", metric: "DC_VOLTAGE_V", sample: {...}, bounds: {...} },
> //               { sampled_at: "2023-10-01T00:05:00Z", metric: "DC_CURRENT_A", sample: {...}, bounds: {...} }
> //             ],
> //             states: [
> //               /* list of states for multiple timestamps */
> //               { sampled_at: "2023-10-01T00:00:13.12Z", states: [...], errors: [...], warnings: [...] },
> //               { sampled_at: "2023-10-01T00:02:22.01Z", states: [...], errors: [...], warnings: [...] },
> //               { sampled_at: "2023-10-01T00:05:02.32Z", states: [...], errors: [...], warnings: [...] },
> //             ]
> //           },
> //           {
> //             component_id: 243,
> //             metric_samples: [ ... ],
> //             states: [ ... ]
> //           },
> //         ]
> //       },
> //       {
> //         microgrid_id: 2,
> //         components: [ ... ]
> //       }
> //     ]
> //     ```
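
For illustration, a minimal sketch of how such a response could be traversed with the raw bindings. The field names (`microgrids`, `components`, `metric_samples`, `states`) are taken from the example output above and may differ in the actual generated code:

```python
def iter_component_data(response):
    """Yield per-component data from a raw response (illustrative only).

    Assumes the nested structure shown in the example output above.
    """
    for microgrid in response.microgrids:
        for component in microgrid.components:
            yield (
                microgrid.microgrid_id,
                component.component_id,
                list(component.metric_samples),
                list(component.states),
            )
```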

To retrieve MetricSample information, we use the following helper in the ReportingApiClient:

from collections import namedtuple

MetricSample = namedtuple(
    "MetricSample", ["timestamp", "microgrid_id", "component_id", "metric", "value"]
)
"""Type for a sample of a time series incl. metric type, microgrid and component ID.

A named tuple was chosen to allow safe access to the fields while keeping the
simplicity of a tuple. This data type can be easily used to create a numpy array
or a pandas DataFrame.
"""
llucax commented 2 months ago

One unrelated comment: I recommend using the typing.NamedTuple class, so you can add type hints to it too.
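
For reference, a minimal sketch of what the suggested typing.NamedTuple variant of MetricSample could look like (the field types are assumptions, not taken from the client):

```python
from datetime import datetime
from typing import NamedTuple


class MetricSample(NamedTuple):
    """Sample of a time series incl. metric type, microgrid and component ID."""

    timestamp: datetime
    microgrid_id: int
    component_id: int
    metric: str
    value: float
```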

llucax commented 2 months ago

For me this is very hard to think about without knowing how we plan to use this data. As long as we don't have a clear picture, I think the safest bet is to make the client retrieve the data as it comes from the API and then adapt it at another level. Without changing the protocol we'll still have to do this transformation anyway, and artificially limiting what the user can do with the client doesn't sound like a great idea.

cwasicki commented 1 month ago

> I think the safest bet is to make the client retrieve the data as it comes in the API and then adapt it at another level.

Agree that there should be a way to retrieve the data as close as possible to the message definition. But I suggest also keeping the current approach as an alternative, since this format is already being used by multiple users and seems to be very useful. @llucax What is your idea of "another level"? A restricted client on top of the slim client? In the same repo?

llucax commented 1 month ago

What I mean is that I wouldn't make the adapted retrieval the only way to retrieve the data; maybe it is clearer in pseudo-code form:

adapted_data = await client.retrieve()  # Not for now, as we don't know if this will fit all use cases

adapted_data = adapt_data(await client.retrieve())  # Yes!

So if somebody needs to retrieve the data from the server in a non-adapted way (or adapt it differently), they can still use the client and don't have to get into the mess of using the raw bindings.

adapt_data() can live in the same repo for now, and I don't think a whole new client for the adapted data is needed if a simple function suffices. We can see if it makes sense to split it out later, like we did with dispatch.
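
A minimal sketch of what such an adapt_data() function could look like, assuming the response shape from the example output above and the existing MetricSample helper (field names and the value unwrapping are assumptions):

```python
def adapt_data(response):
    """Flatten a raw response into a list of MetricSample tuples (sketch only)."""
    samples = []
    for microgrid in response.microgrids:
        for component in microgrid.components:
            for sample in component.metric_samples:
                samples.append(
                    MetricSample(
                        timestamp=sample.sampled_at,
                        microgrid_id=microgrid.microgrid_id,
                        component_id=component.component_id,
                        metric=sample.metric,
                        # The actual message nests the value inside `sample`;
                        # unwrapping will depend on the generated bindings.
                        value=sample.sample,
                    )
                )
    return samples
```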

flora-hofmann-frequenz commented 1 month ago

@llucax: We were thinking of reusing some of the dataclasses in here. What is your take on that? We could add a _types.py module, then add/update the ListMicrogridComponentsDataResponse and add the adapter (as mentioned above).
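
A rough sketch of what a dataclass in such a _types.py module could look like (purely illustrative; the name `ComponentStateSample` and its fields are not settled):

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class ComponentStateSample:
    """Illustrative per-component state information (hypothetical shape)."""

    sampled_at: datetime
    states: list[str] = field(default_factory=list)
    errors: list[str] = field(default_factory=list)
    warnings: list[str] = field(default_factory=list)
```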

llucax commented 1 month ago

Sounds good in principle. I'm still not very involved in all the protobuf stuff in common that is shared between microgrid and reporting, so I don't have a full enough picture to hold a strong opinion; please don't hold me accountable if I change my mind in the future when I get more involved :laughing: But again, I think we are still at an exploration stage, so I guess this is all fine, right?

flora-hofmann-frequenz commented 1 month ago

Sounds good to me! :+1: Let me keep looking into it and share as I go along for further discussions.