finos / architecture-as-code

"Architecture as Code" (AasC) aims to devise and manage software architecture via a machine readable and version-controlled codebase, fostering a robust understanding, efficient development, and seamless maintenance of complex software architectures
https://calm.finos.org
Apache License 2.0

Generic CALM domain definitions via external schemas #310

Closed willosborne closed 3 months ago

willosborne commented 4 months ago

Feature Request


Description of Problem

In the last 6 months we have successfully built out the core of CALM to model architecture.

We are now seeing discussion around how to model extra data in order to capture security, resiliency, observability (etc) in a CALM architecture.

We need a solution to attach additional data to CALM in a structured way, without growing the core manifest too much. In the same vein as our solution for interfaces (#48), this solution also needs to allow individual organisations to define their own domain information, extending the core set offered by the CALM metaschema.

Potential Solutions:

Create the concept of a domain object in CALM. A CALM document can optionally contain a map of domains. Each of these domains will then contain a list of structured objects, each of which decorates an object (node, relationship, interface etc.) in the document.

The data each domain object annotates an element with will be defined by another JSON Schema. CALM will provide a core set of basic extension types, and organisations can then provide their own.

NOTE: this will NOT replace metadata. Nor is this intended to capture the notion of when data is required - it just models the structure of the data that might be required. See final section for more detail here.

This is very much an early idea and needs some refinement, but I wanted to see what people think.

Example: Resiliency domain

NOTE: this is a very simplified look at resiliency - I'm only considering very basic properties. The idea is after all that CALM itself doesn't have an opinion about exactly what you want to define.

Here I'll be documenting the number of replica sets and deployment strategy for a system in the document. I'll first define this as an instantiation, and then again as a pattern to show how the types work.

See a very simple CALM pattern document:

{
    "nodes": [
        {
            "name": "API producer service",
            "unique-id": "api-producer",
            "interfaces": [
                // ...
            ]
        }
    ],
    "relationships": [
        // ...
    ],
    "domains": {
        "resiliency": {
            "nodes": [
                {
                    "element-id": "api-producer",
                    // custom properties defined by the external schema
                    "replica-sets": 4,
                    "deployment-strategy": "rolling-release"
                }
            ]
            // relationships, interfaces can also optionally be defined on a domain
            // they are defined separately so that each can have its own type in the schema
        }
    }
}

Note that rather than defining the domains on each element, we are defining them at the bottom and referencing elements by unique ID. This is more in keeping with the general approach and helps keep things flat.

I decided to keep nodes, relationships, interfaces separate; we can potentially combine this all into one list but it makes the types a bit nicer this way, since you can apply a single type to all elements in the list. i.e. all resiliency nodes have this base type, all relationships have another base type and so on.
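Because decorations point at elements by unique ID, a mistyped `element-id` would silently decorate nothing. A minimal sketch of a check for this, assuming the document shape from the example above; the function name `find_dangling_references` is hypothetical and not part of CALM or its CLI, and interface IDs are left out for brevity:

```python
def find_dangling_references(document: dict) -> list[tuple[str, str, str]]:
    """Return (domain, element kind, element-id) for every decoration whose
    element-id does not match a unique-id in the core document."""
    # Collect the unique-ids declared by the core nodes and relationships.
    # (Interfaces are ignored in this sketch.)
    known_ids = {
        element["unique-id"]
        for kind in ("nodes", "relationships")
        for element in document.get(kind, [])
    }
    dangling = []
    for domain_name, domain in document.get("domains", {}).items():
        for kind, decorations in domain.items():
            for decoration in decorations:
                if decoration["element-id"] not in known_ids:
                    dangling.append((domain_name, kind, decoration["element-id"]))
    return dangling


document = {
    "nodes": [{"name": "API producer service", "unique-id": "api-producer"}],
    "relationships": [],
    "domains": {
        "resiliency": {
            "nodes": [
                {"element-id": "api-producer", "replica-sets": 4,
                 "deployment-strategy": "rolling-release"},
                # Deliberately wrong id, to show what gets reported.
                {"element-id": "api-consumer", "replica-sets": 2},
            ]
        }
    },
}

print(find_dangling_references(document))
```

JSON Schema on its own cannot express this cross-reference, so a check of this kind would have to live in tooling.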

If we wanted to also extend this document with say a data domain:

{
    //...
    "domains": {
        "resiliency": {
            "nodes": [
                // .. as before
            ]
        },
        "data": {
            "nodes": [
                {
                    "element-id": "api-producer",
                    // custom properties
                    "data-classification": "PII"
                }
            ]
        }
    }
}

Modelling this in JSON Schema

The changes to the core CALM schema are small:

Then, in a CALM pattern, developers can reference the appropriate types via $ref and link out to their organisation-specific schemas. This lets them define a big list of potential properties in a way that doesn't bloat the core schema.

We can also provide some pre-defined domain types in CALM if need be.

This also means that the CALM CLI can generate, validate and visualize these properties, as long as they have the right schemas loaded. (This can be done via the option to select a schema directory.) This would allow developers to get a pre-populated starter instantiation with the properties required by the various domains inserted as placeholders, such as {{ REPLICA_SETS }} in our example. Tooling would then pick this up and report potentially missed values.
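As a rough illustration of the "report potentially missed values" step, the sketch below walks a starter instantiation and lists every string value that still contains a `{{ NAME }}` placeholder. The placeholder pattern, the path format and the function name `find_placeholders` are assumptions made for this example; they do not describe how the CALM CLI actually behaves.

```python
import re

# Assumed placeholder syntax: {{ UPPER_SNAKE_CASE }}, as in {{ REPLICA_SETS }}.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Z0-9_]+)\s*\}\}")


def find_placeholders(value, path="$"):
    """Yield (path, placeholder name) for every string value that still
    contains a {{ NAME }} placeholder."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from find_placeholders(child, f"{path}.{key}")
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from find_placeholders(child, f"{path}[{index}]")
    elif isinstance(value, str):
        for match in PLACEHOLDER.finditer(value):
            yield path, match.group(1)


# Hypothetical starter instantiation with one value still to be filled in.
starter = {
    "domains": {
        "resiliency": {
            "nodes": [
                {"element-id": "api-producer",
                 "replica-sets": "{{ REPLICA_SETS }}",
                 "deployment-strategy": "rolling-release"}
            ]
        }
    }
}

for path, name in find_placeholders(starter):
    print(f"{path}: unfilled placeholder {name}")
```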

For some examples, see the PR I raised: #309.

A note on requirements

I'm intentionally not considering the problem of deciding whether a certain element needs to specify certain domain properties. This is because the logic for making these decisions is way too complex for JSON schema.

e.g.

This is a problem for further down the line.

willosborne commented 4 months ago

@yt-ms has suggested an alternative way of decorating elements - by simply putting them in-line, in the same fashion as interfaces are defined on a node. I'll post an example of both with pros/cons so we can have a think about which we'd prefer. NB I've added a relationship here too to make it clearer.

Current proposal - decorated at the bottom, linked by unique-id

{
    "nodes": [
        {
            "name": "API producer service",
            "unique-id": "api-producer",
            "interfaces": [
                // ...
            ]
        }
    ],
    "relationships": [
        {
            "unique-id": "relationship-id",
            "relationship-type": {
                "connects": {
                    // etc...
                }
            }
        }
    ],
    "domains": {
        "resiliency": {
            "nodes": [
                {
                    "element-id": "api-producer",
                    // custom properties defined by the external schema
                    "replica-sets": 4,
                    "deployment-strategy": "rolling-release"
                }
            ],
            "relationships": [
                {
                    "element-id": "relationship-id",
                    "uses-load-balancer": true
                }
            ]
        }
    }
}

Pros:

Cons:

Domain decorations applied in-line, directly on the object

{
    "nodes": [
        {
            "name": "API producer service",
            "unique-id": "api-producer",
            "interfaces": [
                // ...
            ],
            "domains": {
                "resiliency": {
                    "replica-sets": 4,
                    "deployment-strategy": "rolling-release"
                }
            }
        }
    ],
    "relationships": [
        {
            "unique-id": "relationship-id",
            "relationship-type": {
                "connects": {
                    // etc...
                }
            },
            "domains": {
                "resiliency": {
                    "uses-load-balancer": true
                }
            }
        }
    ]
}

Pros:

Cons:

Budlee commented 4 months ago

Personally, I am a fan of the first approach. The domains are separate and contain what you need. Placing them in-line with the elements that I would say are first class, the relationships and nodes, is I believe harder to read and find.

With the first approach, if you are the owner of a domain and it is referenced, then handling it is straightforward. If you are a domain owner, are there additional complexities if your requirements need to be added to the first-class elements?

Also, CALM is being consumed by applications. Which approach is easier for a machine to process?

willosborne commented 4 months ago

@Budlee I agree, you can add everything in one place in the first approach.

Regarding machine processing, there isn't too much difference since it's all JSON; most of the challenge here is making sure you can parse the custom domain objects - ideally via codegen.

I'd say that in an untyped language, the second approach is easier to consume, since there are no lookups required.
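To make the lookup difference concrete, here is a small sketch that reads the same resiliency decoration from both layouts. The function names are hypothetical, the documents are trimmed versions of the two examples above, and if an element appears twice in one domain the later entry wins in this sketch:

```python
def decorations_bottom(document: dict, element_id: str) -> dict:
    """First approach: scan each domain's lists for entries referencing element_id."""
    found = {}
    for domain_name, domain in document.get("domains", {}).items():
        for decorations in domain.values():
            for decoration in decorations:
                if decoration["element-id"] == element_id:
                    # Drop the linking key so only the custom properties remain.
                    found[domain_name] = {
                        key: value for key, value in decoration.items()
                        if key != "element-id"
                    }
    return found


def decorations_inline(element: dict) -> dict:
    """Second approach: the decorations sit directly on the element."""
    return element.get("domains", {})


bottom_doc = {
    "relationships": [{"unique-id": "relationship-id"}],
    "domains": {
        "resiliency": {
            "relationships": [
                {"element-id": "relationship-id", "uses-load-balancer": True}
            ]
        }
    },
}

inline_doc = {
    "relationships": [
        {"unique-id": "relationship-id",
         "domains": {"resiliency": {"uses-load-balancer": True}}}
    ]
}

from_bottom = decorations_bottom(bottom_doc, "relationship-id")
from_inline = decorations_inline(inline_doc["relationships"][0])
print(from_bottom == from_inline)
```

The first layout needs a scan (or a prebuilt index keyed by `element-id`) for every element, while the second is a single property access.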

In a typed language like Java, probably the first is easier, because you can parameterise the 'domains' property on your nodes/relationships with the custom types for your domains - and it's only parsed from a single place, rather than all across the whole document. But this is guesswork. (In general structured parsing of CALM documents is an interesting challenge that I haven't made too much progress with since there are fields like metadata, etc that are unstructured according to the core manifest. The CLI is not fully using static typing yet for this reason.)

jpgough-ms commented 4 months ago

@yt-ms and I had an offline discussion today about the requirements for a minimal controls domain that we are looking at for some gating work. I plan to go with option 2 as the approach and to create a separate issue to put together this proposal in an August version of the schema.