Configuration management in the SDK

ela-kotulska-frequenz commented 1 year ago

What's needed?

This is still under discussion!

The SDK and each actor have some config variables. This config should control how the application behaves. All config should be stored in a single file, which should be loaded at startup. If the file changes, the SDK should read the updates, validate them, and send them to the subscribed actors. We should assume that all actors use marshmallow to create configs.

Proposed solution

Use marshmallow (+ apispec + marshmallow_dataclass). It is very convenient for validation, which we will need anyway, and also for documentation. We can even export the config specification automatically to OpenAPI and render it nicely with Swagger.

OpenAPI very likely has good support on the UI side, so the UI can directly use the same config specification we produce.

Quick example of how an actor or user should declare the config (producing OpenAPI docs too):

import json
from typing import Optional
from dataclasses import field

import marshmallow_dataclass
import marshmallow.validate
from apispec import APISpec
from apispec.ext.marshmallow import MarshmallowPlugin

# Create an APISpec
spec = APISpec(
    title="Swagger Example",
    version="1.0.0",
    openapi_version="3.0.2",
    plugins=[MarshmallowPlugin()],
)

@marshmallow_dataclass.dataclass
class PeakShavingConfig:
    target_kw: float = field(default=5.0, metadata={  # First metadata is for marshmallow
        "validate": marshmallow.validate.Range(min=0.0, max=10.0),
        "metadata": {"description": "Peak shaving target (in kW)"},  # Second metadata is for apispec/OpenAPI
        })

@marshmallow_dataclass.dataclass
class EvChargingConfig:
    max_power_w: float = field(metadata={  # No default -> required
        "validate": marshmallow.validate.Range(min=0.0, max=10.0),
        "metadata": {"description": "Maximum allowed power for the site at the grid connection (in Watt)"},
        })
    min_power_w: Optional[float] = field(metadata={  # No default but optional -> NOT required
        "validate": marshmallow.validate.Range(min=0.0, max=10.0),
        "metadata": {"description": "Minimum allowed power for the site at the grid connection (in Watt)"},
        })
    data_gathering_duration_seconds: float = field(default=5.0, metadata={
        "validate": marshmallow.validate.Range(min=0.0, max=10.0),
        "metadata": {"description": "How long to gather data for forecast"},
        })
    update_interval_seconds: float = field(default=5.0, metadata={
        "validate": marshmallow.validate.Range(min=0.0, max=10.0),
        "metadata": {"description": "How often update the charge bounds based on the forecast"},
        })

spec.components.schema(PeakShavingConfig.__name__, schema=PeakShavingConfig.Schema)
spec.components.schema(EvChargingConfig.__name__, schema=EvChargingConfig.Schema)

print(json.dumps(spec.to_dict(), indent=4))

# Loading a (json) config file (this will throw an exception if any validation fails)
# ev_charging_config = EvChargingConfig.Schema().load(json.load(open('some_config.json')))
# print(ev_charging_config.max_power_w)

Output:

{
    "paths": {},
    "info": {
        "title": "Swagger Example",
        "version": "1.0.0"
    },
    "openapi": "3.0.2",
    "components": {
        "schemas": {
            "PeakShavingConfig": {
                "type": "object",
                "properties": {
                    "target_kw": {
                        "type": "number",
                        "default": 5.0,
                        "minimum": 0.0,
                        "maximum": 10.0,
                        "description": "Peak shaving target (in kW)"
                    }
                }
            },
            "EvChargingConfig": {
                "type": "object",
                "properties": {
                    "max_power_w": {
                        "type": "number",
                        "minimum": 0.0,
                        "maximum": 10.0,
                        "description": "Maximum allowed power for the site at the grid connection (in Watt)"
                    },
                    "data_gathering_duration_seconds": {
                        "type": "number",
                        "default": 5.0,
                        "minimum": 0.0,
                        "maximum": 10.0,
                        "description": "How long to gather data for forecast"
                    },
                    "update_interval_seconds": {
                        "type": "number",
                        "default": 5.0,
                        "minimum": 0.0,
                        "maximum": 10.0,
                        "description": "How often update the charge bounds based on the forecast"
                    },
                    "min_power_w": {
                        "type": "number",
                        "default": null,
                        "nullable": true,
                        "minimum": 0.0,
                        "maximum": 10.0,
                        "description": "Minimum allowed power for the site at the grid connection (in Watt)"
                    }
                },
                "required": [
                    "max_power_w"
                ]
            }
        }
    }
}

To run it locally: pip install -U marshmallow apispec[marshmallow] marshmallow-dataclass.

To see the swagger UI just copy & paste the output here: https://editor.swagger.io/

Screenshot of the UI for convenience:

We need

ConfigValidator, which should:
- take all schemas as constructor arguments
- have a function validate(config_file: Dict[str, Any]) to validate the given config file. If the config is OK, it should create config objects for each given schema.
ConfigManager: an actor that watches the configuration file and reacts to updates. We already have one, but it needs improvement: https://github.com/frequenz-floss/frequenz-sdk-python/blob/v0.x.x/src/frequenz/sdk/actor/_config_managing.py
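
A minimal sketch of what such a ConfigValidator could look like, assuming each schema is registered under the name of its section in the config file and exposes a marshmallow-style `load()` method. `SchemaLike`, `ConfigValidationError` and `RangeSchema` are hypothetical names used only for illustration; `RangeSchema` is a tiny stand-in so the sketch runs without marshmallow installed, and a real marshmallow schema instance could be passed instead.

```python
from typing import Any, Dict, Mapping, Protocol


class SchemaLike(Protocol):
    """Anything with a marshmallow-style ``load`` method (assumed interface)."""

    def load(self, data: Mapping[str, Any]) -> Any: ...


class ConfigValidationError(Exception):
    """Hypothetical error type collecting the failures per config section."""

    def __init__(self, errors: Dict[str, str]) -> None:
        super().__init__(f"invalid config sections: {sorted(errors)}")
        self.errors = errors


class ConfigValidator:
    """Validate a parsed config file against one schema per section."""

    def __init__(self, schemas: Mapping[str, SchemaLike]) -> None:
        # Assumption: each schema is keyed by the name of its section in the file.
        self._schemas = dict(schemas)

    def validate(self, config_file: Dict[str, Any]) -> Dict[str, Any]:
        """Return one config object per schema, or raise if any section fails."""
        configs: Dict[str, Any] = {}
        errors: Dict[str, str] = {}
        for name, schema in self._schemas.items():
            try:
                configs[name] = schema.load(config_file.get(name, {}))
            except Exception as exc:  # marshmallow raises ValidationError here
                errors[name] = str(exc)
        if errors:
            raise ConfigValidationError(errors)
        return configs


class RangeSchema:
    """Stand-in for a marshmallow schema: one float field with a valid range."""

    def __init__(self, key: str, minimum: float, maximum: float) -> None:
        self.key, self.minimum, self.maximum = key, minimum, maximum

    def load(self, data: Mapping[str, Any]) -> Dict[str, float]:
        value = float(data[self.key])
        if not self.minimum <= value <= self.maximum:
            raise ValueError(f"{self.key}={value} is out of range")
        return {self.key: value}


validator = ConfigValidator({"peak_shaving": RangeSchema("target_kw", 0.0, 10.0)})
configs = validator.validate({"peak_shaving": {"target_kw": 5.0}})
```

Collecting errors for all sections before raising is a design choice of this sketch, so that one bad section does not hide problems in the others.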

How it should work

ConfigManager should be an actor that:

1. Takes Schemas as constructor arguments. These are the Schemas that should be in the config file.
2. Creates a channel for each Schema using ChannelRegistry. These channels will be used to send validated config objects.
3. Creates a ConfigValidator with the given Schemas.
4. Reads the config file and parses it using ConfigValidator.
   - If the file is correct, ConfigValidator will return config objects (one for each Schema). We should send them using the channels created in point 2.
   - If the file is not correct: (to discuss).
5. Watches for changes in the config file; if the file changes, go to point 4.
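
The flow above could be sketched roughly as follows, using only the standard library. This is not the SDK's actor API: `asyncio.Queue` stands in for the channels from ChannelRegistry, the file is watched by naive polling of its contents, and `validate_demo` is a hypothetical validator. Keeping the last valid config when the file is invalid is also just an assumption, since that point is still open.

```python
import asyncio
import json
import os
import tempfile
from typing import Any, Callable, Dict, Iterable, List


class ConfigManager:
    """Watch a JSON config file and publish validated configs per schema name."""

    def __init__(
        self,
        path: str,
        schema_names: Iterable[str],
        validate: Callable[[Dict[str, Any]], Dict[str, Any]],
        poll_interval: float = 0.05,
    ) -> None:
        self._path = path
        self._validate = validate
        self._poll_interval = poll_interval
        # One "channel" per schema (asyncio.Queue is only a stand-in).
        self.channels = {name: asyncio.Queue() for name in schema_names}

    def _publish(self, raw: bytes) -> None:
        # Parse and validate; send only if the whole file is valid.
        try:
            configs = self._validate(json.loads(raw))
        except (ValueError, KeyError) as exc:
            # Open point in the issue; this sketch keeps the last valid config.
            print(f"ignoring invalid config: {exc!r}")
            return
        for name, config in configs.items():
            self.channels[name].put_nowait(config)

    async def run(self) -> None:
        last_raw = None
        while True:
            # Naive polling; a real actor would likely use a file watcher.
            try:
                with open(self._path, "rb") as config_file:
                    raw = config_file.read()
            except OSError:
                raw = None
            if raw is not None and raw != last_raw:
                last_raw = raw
                self._publish(raw)
            await asyncio.sleep(self._poll_interval)


def validate_demo(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical validator: needs peak_shaving.target_kw within [0, 10]."""
    target = float(raw["peak_shaving"]["target_kw"])
    if not 0.0 <= target <= 10.0:
        raise ValueError(f"target_kw={target} is out of range")
    return {"peak_shaving": {"target_kw": target}}


def write_json(path: str, data: Dict[str, Any]) -> None:
    with open(path, "w", encoding="utf-8") as config_file:
        json.dump(data, config_file)


async def demo() -> List[Any]:
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "config.json")
        write_json(path, {"peak_shaving": {"target_kw": 5.0}})
        manager = ConfigManager(path, ["peak_shaving"], validate_demo)
        task = asyncio.create_task(manager.run())
        channel = manager.channels["peak_shaving"]
        first = await asyncio.wait_for(channel.get(), timeout=2)
        # Changing the file makes the manager validate and publish again.
        write_json(path, {"peak_shaving": {"target_kw": 7.5}})
        second = await asyncio.wait_for(channel.get(), timeout=2)
        task.cancel()
        return [first, second]


received = asyncio.run(demo())
```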

Points to discuss

We need to have SharedConfigs (multi-level configs). How do we do that?
https://github.com/frequenz-floss/frequenz-sdk-python/issues/65

Use cases

No response

Alternatives and workarounds

No response

Additional context

No response

ela-kotulska-frequenz commented 1 year ago

Next approach:

We should send all configs to each actor. All configs will be shared, and a config can be 1) global, 2) actor-level, or 3) local.

leandro-lucarella-frequenz commented 1 year ago

I edited this issue to remove links to internal repos and chats, as they are not publicly visible. I also copied most of the proposed solution from the internal repo issue that was linked.

thomas-nicolai-frequenz commented 1 year ago

Something really important to keep in mind: all config variables coming from the UI will live in the same scope across all actors. Sometimes config variables might need to be shared across different actors. That also means there is no higher-level order. Right now, a higher-level order can only be achieved by using a prefix like prefix_. The UI currently supports key => value or key => list(values).

leandro-lucarella-frequenz commented 1 year ago

Yeah, we can still build some sort of hierarchy if we need to by using prefixes. So for now the input won't be JSON, but we can still use marshmallow to parse values and validate them; we'll just have to build the var_name -> value dictionary ourselves to feed it to marshmallow's load() function.
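
As a rough illustration of building those dictionaries from flat, prefixed variables: the function name `group_by_prefix`, the prefixes and the `"shared"` bucket below are all made up for this sketch, not something the UI or the SDK defines.

```python
from typing import Any, Dict, Mapping


def group_by_prefix(
    flat: Mapping[str, Any], prefixes: Mapping[str, str]
) -> Dict[str, Dict[str, Any]]:
    """Split flat ``prefix_name`` variables into one dict per config section.

    ``prefixes`` maps a section name to its (hypothetical) key prefix. Keys
    that match no prefix are collected under ``"shared"``.
    """
    grouped: Dict[str, Dict[str, Any]] = {name: {} for name in prefixes}
    grouped["shared"] = {}
    for key, value in flat.items():
        for name, prefix in prefixes.items():
            if key.startswith(prefix):
                # Strip the prefix so the key matches the schema field name.
                grouped[name][key[len(prefix):]] = value
                break
        else:
            grouped["shared"][key] = value
    return grouped


ui_variables = {
    "ev_charging_max_power_w": 7.0,
    "ev_charging_update_interval_seconds": 5.0,
    "peak_shaving_target_kw": 5.0,
    "site_name": "demo",
}
sections = group_by_prefix(
    ui_variables,
    {"ev_charging": "ev_charging_", "peak_shaving": "peak_shaving_"},
)
```

Each resulting section dictionary could then be passed to the matching schema's `load()`, for example `EvChargingConfig.Schema().load(sections["ev_charging"])`.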

thomas-nicolai-frequenz commented 1 year ago

> So for now the input won't be JSON

well the config will be written by cloud-sync to the local file system and it can be in whatever form but I guess YAML would be more convenient? We should also separate between the config that's coming from the UI vs. the config variable configuration as part of an actor that the UI could pull in what config variables can be set. These are two different things to me. Does that make sense?

leandro-lucarella-frequenz commented 1 year ago

> well the config will be written by cloud-sync to the local file system and it can be in whatever form but I guess YAML would be more convenient?

YAML is actually very complicated, as it can include anchors, messages, etc. Especially for something that will be read and written mainly by machines, I would go with JSON or TOML, which is the new hyped format :)

If you want more details about why YAML is probably not the best choice, you can have a look at https://noyaml.com/ :laughing:

> We should also separate between the config that's coming from the UI vs. the config variable configuration as part of an actor that the UI could pull in what config variables can be set. These are two different things to me. Does that make sense?

Not sure if I'm following you completely, but when using marshmallow it is very easy to produce a JSON schema that the UI could eventually use to automatically build forms to set configuration for actors by just rendering these JSON schemas. Is this what you mean by "the config variable configuration as part of an actor that the UI could pull in what config variables can be set"?

leandro-lucarella-frequenz commented 1 year ago

This is actually a good argument for JSON (with TOML it is still very likely that a config file could be valid if truncated too).

> Finally, there's a hidden trap which has caused terrible issues before. If your YAML config file gets truncated, because of an error during write or transmission, it's very likely that the resulting broken file is perfectly readable by Yaml. This is never true of JSON, for example.

https://tomswirly.medium.com/yaml-is-an-extremely-bad-choice-for-any-configuration-file-because-its-wildly-unpredictable-d37969d20fef

Of course we should always write to a temporary file and move only when the file is ready, but still, there could be corner cases where a file might end up being truncated if something is overlooked.
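
A small sketch of both points, using only the standard library: writing to a temporary file in the same directory and moving it into place with `os.replace()` (atomic on POSIX), and checking that a truncated JSON document is rejected by the parser. `write_config_atomically` is a hypothetical helper name.

```python
import json
import os
import tempfile
from typing import Any, Dict


def write_config_atomically(path: str, config: Dict[str, Any]) -> None:
    """Write ``config`` as JSON to a temp file, then move it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory means same filesystem, so the final rename can be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp_file:
            json.dump(config, tmp_file)
            tmp_file.flush()
            os.fsync(tmp_file.fileno())  # make sure the data reached the disk
        os.replace(tmp_path, path)
    except BaseException:
        # Do not leave a half-written temp file behind.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = os.path.join(tmp_dir, "config.json")
    write_config_atomically(config_path, {"peak_shaving": {"target_kw": 5.0}})
    with open(config_path, encoding="utf-8") as config_file:
        text = config_file.read()

loaded = json.loads(text)

# A truncated copy of the same document is rejected by the JSON parser.
try:
    json.loads(text[: len(text) // 2])
    truncated_is_valid = True
except json.JSONDecodeError:
    truncated_is_valid = False
```

Readers of the file then see either the old complete file or the new complete file, and a truncated JSON file that slips through anyway fails loudly at parse time.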

thomas-nicolai-frequenz commented 1 year ago

> but when using marshmallow it is very easy to produce a JSON schema that the UI could eventually use to automatically build forms

What the UI will need is the name of the variable and maybe some description of the config variable and examples of what the value(s) might look like. How it works in the UI is a different matter.

thomas-nicolai-frequenz commented 1 year ago

> This is actually a good argument for JSON (with TOML)

I don't mind if it's JSON, TOML or YAML tbh.

leandro-lucarella-frequenz commented 1 year ago

> What the UI will need is the name of the variable and maybe some description of the config variable and examples of what the value(s) might look like. How it works in the UI is a different matter.

That's all included in the JSON schema.
