athena-framework / athena

An ecosystem of reusable, independent components
https://athenaframework.org
MIT License

Configuring Athena Framework #332

Closed Blacksmoke16 closed 7 months ago

Blacksmoke16 commented 10 months ago

Configuring Athena Framework

Athena has a somewhat long history when it comes to configuration. First there was athena.yaml, which was replaced shortly after by ACF::Base and ACF::Parameters. Each approach learned from the shortcomings of the previous one, working toward the overall goal of making configuration in Athena simple and powerful.

Current State

The current state of configuring Athena Framework looks like:

# Define a record to contain the values
record AppParams,
  enable_v2_protocol : Bool = false

# Make the config component aware of them
class Athena::Config::Parameters
  getter app = AppParams.new
end

# Enable/configure CORS listener
def ATH::Config::CORS.configure : ATH::Config::CORS?
  new(
    allow_credentials: true,
    allow_origin: %w(https://app.example.com),
    expose_headers: %w(X-Transaction-ID X-Some-Custom-Header),
  )
end

However, this latest attempt is not perfect, even though the current approach does have some nice features.

Proposed State

The primary goal/feature of the new setup is to make the DI component aware of the configuration/parameter values. This also makes the concept of https://athenaframework.org/DependencyInjection/Register/#Athena::DependencyInjection::Register--configuration obsolete, since instead of needing to pass some configuration object to the service, the container would be able to set/replace the constructor arguments directly.

It would also enable determining which services the container has. E.g. if the user never configures CORS or content negotiation, we can remove those related listeners entirely, instead of just no-oping them. This most obviously comes with a performance/efficiency benefit due to there being fewer services in the container.
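To make that concrete, a hypothetical sketch of direct parameter injection could look like the following (the service and argument names here are made up for illustration; the `%...%` binding mirrors the parameter syntax used elsewhere in this proposal):

```crystal
# Hypothetical sketch: with the DI container aware of parameters, a service
# could bind a constructor argument directly to `app.enable_v2_protocol`
# via the `%...%` parameter syntax, instead of injecting a whole config object.
@[ADI::Register(_enable_v2_protocol: "%app.enable_v2_protocol%")]
class ProtocolService
  def initialize(@enable_v2_protocol : Bool); end
end
```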

The new proposed way would be something like:

ADI.configure({
  framework: {
    cors: {
      defaults: {
        allow_credentials: true,
        allow_origin:      ["https://app.example.com"] of String,
        expose_headers:    ["X-Transaction-ID", "X-Some-Custom-Header"] of String,
      },
    },
  },
  parameters: {
    "app.enable_v2_protocol": false,
  },
})

The custom types, monkey patching, and such are all replaced by a NamedTuple passed to a dedicated ADI.configure macro. Top level keys are used to differentiate parameters from configuration, while the configuration is able to reference parameters via the same %app.enable_v2_protocol% syntax. Additional top level "schemas" may also be defined, such as for integrating your own, or third party, types/features.
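For example, a parameter could be defined once and then referenced from a bundle's configuration (a hypothetical fragment; the `some_feature` key is made up for illustration):

```crystal
# Hypothetical fragment: define a parameter once, then reference it from
# configuration via the `%...%` syntax. The `some_feature` key is invented
# purely for illustration.
ADI.configure({
  parameters: {
    "app.enable_v2_protocol": true,
  },
  framework: {
    some_feature: {
      enabled: "%app.enable_v2_protocol%",
    },
  },
})
```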

Each top level key has a "schema" that describes the structure (expected type, default value) of the configuration in such a way it will be viewable within the API docs.

Blacksmoke16 commented 9 months ago

Okay, getting close to having an initial implementation of this, so wanted to give an update on the current state of things, as well as document for posterity my thinking on how all this will work.

Bundles

The change in how configuration is handled will coincide with the introduction of a new concept: bundles. The exact implementation of how bundles will work is still somewhat up in the air. But what is known is what bundles are, and how they will fit into the ecosystem.

It should be well known by now that the components that make up Athena's ecosystem are independent and usable outside of the Athena Framework itself. However, because they are made with the assumption that the entire framework will not be available, there has to be something that provides the tighter integration into the rest of the framework that makes it all work together so nicely. Examples of this include enabling dependency injection, registering services based on the various annotations/interfaces defined by each component, and, what we'll be focusing on today, how all these services should behave.

Longer term, the bundle concept could extend to third party shards. E.g. a shard that requires some third-party shard, wires some DI things up, defines some services of its own, or some event listeners. This shard can then be required into an Athena Framework project and easily integrated/configured without having to handle the integration yourself.

Schemas

As mentioned in the OP, the new implementation of configuring Athena will expose those values at compile time, which makes the whole process a whole lot more powerful. But while there is now a singular way to provide your configuration values, there still needs to be a way to know what values are possible. This is where a bundle comes into play. Each bundle is responsible for defining a "schema" that represents the possible configuration properties relating to the services provided by that bundle, while still being type safe and self-documenting.

The bundle will also provide an "extension" type that handles configuring its services based on the provided configuration values. This side of things is not going to be part of the public API to start, but as things get more refined it could be opened up to those wanting to integrate a set of services more deeply into the framework, or advanced users wanting to be able to extend the framework to exactly fit their needs.

[!IMPORTANT] Bundles themselves are not something the average end user is going to need to define/manage themselves outside of registering the ones they need and configuring them as they wish. What is listed below is included for transparency and posterity.

For the purposes of this RFC, only the required bits are mentioned; this is not representative of what the actual bundle implementation will look like. With that said, a portion of the framework bundle that was configured above could look something like:

@[ATH::Bundle("framework")]
# Provides a tight integration between the various Athena components and the Athena framework.
struct Athena::Framework::Bundle < ATH::AbstractBundle
  # Represents the possible configuration properties, including their name, type, default, and documentation.
  module Schema
    include ATH::Bundle::Configuration

    # Configures how `ATH::Listeners::CORS` functions.
    # If no configuration is provided, that listener is disabled and will not be invoked at all.
    module Cors
      include ATH::Bundle::Configuration

      # CORS defaults that affect all routes globally.
      module Defaults
        include ATH::Bundle::Configuration

        # Indicates whether the request can be made using credentials.
        #
        # Maps to the access-control-allow-credentials header.
        property? allow_credentials : Bool = false

        # A white-listed array of valid origins. Each origin may be a static String, or a Regex.
        #
        # Can be set to ["*"] to allow any origin.
        property allow_origin : Array(String) = [] of String

        # Array of headers that the browser is allowed to read from the response.
        #
        # Maps to the access-control-expose-headers header.
        property expose_headers : Array(String) = [] of String
      end
    end
  end
end

# At some point register this extension with the DI component.
# This is manual as to give control to which extensions are enabled.
# E.g. may only want to have a bundle active in dev, but not release builds.
ATH.register_bundle ATH::Bundle

As mentioned, the bundle type defines the schema within a nested Schema module. Each module includes ATH::Bundle::Configuration, which exposes custom property and property? macros to make it feel more natural.

The bundle is overall self-documenting, documented as you would any other Crystal type. Once https://github.com/crystal-lang/crystal/issues/14039 is resolved, each configuration property will include the doc comments defined on it, as well as a line denoting its default value, if any. These will be made available within the API docs as abstract methods on the module.

Lastly, the bundle needs to be registered. The framework bundle as pictured here will be registered automatically, but third-party/optional bundles may need to be handled manually.

Schema Validation

Having type safe configuration is an important goal of mine, and I think I did a pretty good job in this regard. The schema module not only acts as a source of documentation, but the user-provided configuration is also validated against it. For example, if you were to provide allow_credentials a non-boolean value, you'd get:

 246 | allow_credentials: 10,
                          ^
Error: Expected configuration value 'framework.cors.defaults.allow_credentials' to be a 'Bool', but got 'Int32'.

The error includes helpful information about the related configuration property, while pointing at the invalid value in the trace. This also works for nested values:

 247 | allow_origin:      [10, "https://app.example.com"] of String,
                           ^
Error: Expected configuration value 'framework.cors.defaults.allow_origin[0]' to be a 'String', but got 'Int32'.

Or if the schema defines a value that is not nilable nor has a default:

 227 | property default_locale : String
                ^-------------
Error: Required configuration property 'framework.default_locale : String' must be provided.

It can also call out unexpected keys:

 245 | foo:      "bar",
                 ^
Error: Encountered unexpected property 'framework.cors.foo' with value '"bar"'.
Or when configuration is provided for an extension that has never been registered:

Which expanded to:

 > 1 | macro finished
             ^-------
Error: Extension 'biz' is configured, but no extension with that name has been registered.

Some errors, like this last one, do not yet point to the correct node. This will be resolved over time as location info is fixed within the compiler.

Hash configuration values are unchecked, so they are best used for unstructured data. If you have a fixed set of related configuration, consider using a NamedTuple instead. This way it'll be type safe, and error if a required (non-nilable) value was not provided.

@[ATH::Bundle("example")]
struct MyBundle < ATH::AbstractBundle
  property connection : NamedTuple(hostname: String, username: String, password: String, port: Int32)

  # ...
end

# ...

ATH.configure({
  example: {
    connection: {
      hostname: "my-db",
      username: "user",
      password: "pass",
    },
  },
})

# Error: Configuration value 'example.connection' is missing required value for 'port' of type 'Int32'.

Call nodes are also supported but, like hashes, are not type checked. Bundles may use these, but are responsible for their own validation/integration within the extension code itself. Because of this, sticking with literal types is usually preferred.

require "uri"

@[ATH::Bundle("example")]
struct MyBundle < ATH::AbstractBundle
  property default_url : URI

  # ...
end

ATH.configure({
  example: {
    default_url: URI.parse("google.com"),
  },
})

As with any new code, I think I handled most use cases I could think of. But if you run into something that just isn't working as you expect, please let me know.

Multi-Environment

In most cases, the configuration for each bundle is likely going to vary from one environment to another. Values that change machine to machine should ideally leverage environment variables, and a new and improved way of handling them in this new system is forthcoming. However, there are also cases where the underlying configuration itself should differ. E.g. locally use an in-memory cache, while using Redis in other environments.

To handle this, ATH.configure may be called multiple times, with later calls taking priority. The configuration is deep merged together, so only the configuration you wish to alter needs to be defined. However, hash/array/NamedTuple values are replaced wholesale, not merged. Normal compile time logic may be used to make these calls conditional as well, e.g. basing things off --release or --debug flags vs the environment.

ADI.configure({
  framework: {
    cors: {
      defaults: {
        allow_credentials: true,
        allow_origin:      ["https://app.example.com"] of String,
        expose_headers:    ["X-Transaction-ID", "X-Debug-Header"] of String,
      },
    },
  },
})

# Exclude the debug header in prod, but retain the other two values
{% if env(Athena::ENV_NAME) == "prod" %}
ADI.configure({
  framework: {
    cors: {
      defaults: {
        expose_headers:    ["X-Transaction-ID"] of String,
      },
    },
  },
})
{% end %}

# Do this other thing if in a non-release build
{% unless flag? "release" %}
ADI.configure({...})
{% end %}
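As a simplified illustration of those merge semantics (an assumed fragment, showing the behavior described above):

```crystal
# Simplified illustration of the merge semantics described above.
ADI.configure({
  framework: {
    cors: {
      defaults: {
        allow_credentials: true,
        allow_origin:      ["https://app.example.com"] of String,
      },
    },
  },
})

ADI.configure({
  framework: {
    cors: {
      defaults: {
        allow_origin: ["https://admin.example.com"] of String,
      },
    },
  },
})

# Effective configuration: `allow_credentials` is retained from the first
# call via the deep merge, while `allow_origin` is replaced wholesale by
# the second call's array; the two arrays are not concatenated.
```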

I could see the skeleton/demo application setting up something like:

# config.cr

# Define base configuration
ADI.configure({...})

# Apply overrides based on env
{% if env(Athena::ENV_NAME) == "staging" %}
require "./staging_config"
{% elsif env(Athena::ENV_NAME) == "test" %}
require "./test_config"
{% else %}
require "./local_config"
{% end %}

This way things stay pretty organized, while still being flexible enough to support different environments w/o having large conditional logic blocks.

Macro Interpolation

Configuration may also be computed at compile time, as somewhat of a preprocessing step. However, in order for extensions to have access to the computed value, the ATH.configure call must be wrapped in a macro {% begin %}/{% end %} block.

{% begin %}
ADI.configure({
  framework: {
    id: {{123 + 456}},
  },
})
{% end %}

This way, extensions looking to use framework.id will have access to the resolved value of 579, versus an unresolved MacroExpression.

Blacksmoke16 commented 9 months ago

> It should be well known by now that the components that make up Athena's ecosystem are independent and usable outside of the Athena Framework itself. However because they are made with the assumption that the entire framework will not be available, there has to be something that provides the tighter integration into the rest of the framework that makes it all work together so nicely. An example of this is mainly enabling dependency injection, registering services based on the various annotation/interfaces defined by each component, or what we'll be focusing on today, how all these services should behave.

Expanding on this, I realized the current behavior is that if the DI component is installed alongside certain other components, say clock, then the DI component is able to detect that and configure things for that particular component. E.g. define a clock service automatically just because they're installed together.

I thought about this for a while and ultimately think it would be best to remove this "feature" in favor of having clearer boundaries. I.e. make it so the framework component (bundle) is the sole place that the integration logic occurs/lives.

An argument could be made for moving the integration logic into each component itself and requiring the user to manually wire things up, via somewhat-internal features that are not yet quite ready for public use. But then other things get more complicated with regard to how things are used/versioned/released/tested.

Ultimately what this ends up meaning is:

  1. Remove the ext/ directory in the DI component
  2. Move those compiler passes into the framework component
  3. Do a pass on the ext/ directory in the framework component to see what can be consolidated/moved around.

Blacksmoke16 commented 7 months ago

One thing to consider for a future iteration is moving away from the TypeDeclaration approach in favor of a more custom DSL. E.g.

module OptionTwo
  include ADI::Extension::Schema

  bool enabled = true
  string name
  array_of rules, username : String, password : String, charset : String = "UTF-8"
end

The main benefit of this is it would allow including default values within the context of anonymous objects (NamedTuples). It would require some more custom macros, but given extension schemas are really only meant to handle primitive types, there honestly wouldn't be all that many.
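Configuration for the OptionTwo schema sketched above might then look like this (an assumed fragment; the top-level `option_two` key and the values are made up for illustration):

```crystal
# Hypothetical usage of the DSL-based schema sketched above. Because the
# `array_of` member declares a default for `charset`, entries could omit it.
ADI.configure({
  option_two: {
    enabled: true,
    name:    "my-app",
    rules: [
      {username: "user1", password: "pass1"},                       # charset falls back to "UTF-8"
      {username: "user2", password: "pass2", charset: "ISO-8859-1"},
    ],
  },
})
```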

May give this a try implementation wise and see how it goes...