frequenz-floss / frequenz-sdk-python

Frequenz Python Software Development Kit (SDK)
https://frequenz-floss.github.io/frequenz-sdk-python/
MIT License
13 stars 17 forks source link

Error reporting when microgrid API is misconfigured #443

Open stefan-brus-frequenz opened 1 year ago

stefan-brus-frequenz commented 1 year ago

What happened?

We wanted to run some FCR prequalification tests on a set of batteries. These batteries had been moved to a separate instance of the microgrid API. When running the actor, we were unable to start, because we got the error "Component tree is not a graph!". It appears to be thrown from here: https://github.com/frequenz-floss/frequenz-sdk-python/blob/3565d731d4b64581ff95e91a40a5bc57e7ad2905/src/frequenz/sdk/microgrid/_graph.py#L705

Initially, we could not find anything wrong with the component graph itself, since we verified it using the connections endpoint of the microgrid API. It was then pointed out to us that there existed a component, which was not connected to any other components. After removing this component, the error went away.

What did you expect instead?

Ideally, the error reporting here should be more detailed, to avoid debugging sessions that lead to false conclusions. Something like component ID #X has no connections and is a dangling component would have saved us a lot of effort here.

Affected version(s)

v0.20.0

Affected part(s)

Core components (data structures, etc.) (part:core), Microgrid (API, component graph, etc.) (part:microgrid)

Extra information

No response

thomas-nicolai-frequenz commented 1 year ago

I suggest to not throw an error and have the entire SDK fail but rather throw a warning and ignore unconnected component(s). It can always happen that someone misconfigures the tree. However, there is a lot of problems with this because it might also be that we end up with two or more isolated trees. Hence, we have to handle a lot more potential error cases I'm afraid. This is tricky. Any thoughts anyone?

shsms commented 1 year ago

We use a 3rd-party library for building and verifying the graph, that doesn't seem to provide details of the sort, just a boolean is_tree method. We might have to do some custom graph traversal in the validation stage to give better error messages.

@thomas-nicolai-frequenz especially in cases where we have deployments that are completely isolated from the grid, and we can't identify a main tree through a grid component. But in location with a grid connection, maybe we can just keep the tree with the grid connection and print warnings about the rest and ignore them?

@stefan-brus-frequenz what kind of component was it that was dangling? was it a typo or was there a legitimate reason for it to be there and disconnected? could you give us some more context?

thomas-nicolai-frequenz commented 1 year ago

But in location with a grid connection, maybe we can just keep the tree with the grid connection and print warnings about the rest and ignore them?

yeah that to me would be step one. I would not yet worry about the scenario of not having a grid connection even it will be a known limitation for now.