ArroyoSystems / arroyo

Distributed stream processing engine in Rust
https://arroyo.dev
Apache License 2.0

Add a new config system #638

Closed mwylde closed 1 month ago

mwylde commented 1 month ago

This PR introduces a new, hierarchical configuration system for Arroyo, replacing the ad-hoc environment variable system currently in place. This will make it easier to configure the system, understand the available configuration options, and add new ones.

An example config file looks like this:

checkpoint-url = "s3://my-bucket/checkpoints"

[controller]
scheduler = "node"
rpc-port = 9292

[pipeline]
source-batch-linger = "1s"

The configuration file can be specified by passing --config to the binary, or by placing it at $(user-config-dir)/arroyo/config.toml (for example, on Linux this is ~/.config/arroyo/config.toml; on macOS, ~/Library/Application\ Support/arroyo).

Configuration values can be overridden with environment variables that use the prefix ARROYO__. To convert a TOML config key to an env var, path separators (.) are replaced with double underscores (__), while hyphens are replaced with single underscores (_). For example, the key controller.rpc-port is expressed as ARROYO__CONTROLLER__RPC_PORT.
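The key-to-env-var mapping described above can be sketched as a small Rust function. This is only an illustration of the naming rule, not the code in this PR: the function name toml_key_to_env_var is hypothetical, and uppercasing the key is inferred from the ARROYO__CONTROLLER__RPC_PORT example.

```rust
/// Hypothetical helper (not part of Arroyo): converts a dotted TOML config
/// key into the environment variable name that would override it.
fn toml_key_to_env_var(key: &str) -> String {
    let body = key
        // Each `.` separates one level of the config hierarchy.
        .split('.')
        // Within a segment, hyphens become single underscores; uppercasing
        // is an assumption based on the example in the PR description.
        .map(|segment| segment.replace('-', "_").to_uppercase())
        .collect::<Vec<_>>()
        // Hierarchy levels are joined with double underscores.
        .join("__");
    // Every override carries the ARROYO__ prefix.
    format!("ARROYO__{body}")
}

fn main() {
    for key in ["checkpoint-url", "controller.rpc-port", "pipeline.source-batch-linger"] {
        println!("{key} -> {}", toml_key_to_env_var(key));
    }
}
```

Under this rule, the keys from the example config file map to ARROYO__CHECKPOINT_URL, ARROYO__CONTROLLER__RPC_PORT, and ARROYO__PIPELINE__SOURCE_BATCH_LINGER.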

Most existing env var configs (like CHECKPOINT_URL or DATABASE_NAME) are still supported, but using them produces a warning, and support will be removed in 0.12. The Helm config format is unchanged.