finos / architecture-as-code

"Architecture as Code" (AasC) aims to devise and manage software architecture via a machine readable and version-controlled codebase, fostering a robust understanding, efficient development, and seamless maintenance of complex software architectures
https://finos.github.io/architecture-as-code/
Apache License 2.0

Resiliency by Design – Capture Resiliency Features as part of CALM #200

Open develontopia opened 3 months ago

develontopia commented 3 months ago

Feature Request

Develop CALM to capture those aspects of resiliency that are decided or influenced by architecture design choices.

Description of Problem:

Designing systems for resiliency is a complex endeavour.

While it is easy to find literature on resiliency techniques and considerations, there seems to be a lack of practical ways of effectively applying resiliency considerations to architecture designs.

Potential Solutions:

CALM offers the opportunity to capture and persist resiliency considerations as part of system architecture designs and their subsequent implementations. Over time, this could develop into:

• A structured, practical, and scalable guide for resiliency design.
• Templated resiliency design options.

These would lead to better resiliency capabilities:

• Compare & contrast different resiliency design choices.
• Development and identification of resiliency design patterns.
• Improved resiliency measures.
• Targeted resiliency testing.

Next Steps

Create a Framework to articulate Resiliency Requirements

Before a system can be declared "resilient", there needs to be an understanding of what the benchmark is - ideally expressed as a clear set of requirements that need to be met.

Here is an outline of a potential framework to capture resiliency requirements:

Definitions

Scope

Taxonomy

Resiliency Requirements Framework (Example)

| Requirement Type | Requirement | System Rating 1 | System Rating 2 | System Rating 3 |
|---|---|---|---|---|
| Policy | The system has a clear (stakeholder agreed) definition of the minimum acceptable level of service. | Must Have | Must Have | Optional |
| Implementation | The acceptable level of service definition is expressed quantifiably in terms of availability, latency, performance and integrity requirements. | Must Have | Must Have | Optional |
| Policy | The system has a clear (stakeholder agreed) definition of RPO. | Must Have | Must Have | Optional |
| Policy | The system has a clear (stakeholder agreed) definition of RTO. | Must Have | Must Have | Optional |
| Policy | The system is portable to run on different platforms and vendor services. | Must Have | Optional | Optional |
| Policy | The system maintains back-ups of all critical data points. | Must Have | Must Have | Must Have |
| Implementation | Data back-ups are taken every X hrs. | Must Have (X=2) | Must Have (X<8) | Must Have (X<400) |
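
To illustrate how a framework like the example above might be made machine-checkable, here is a minimal Python sketch. It is not CALM syntax and not a proposal for the schema: the identifiers (`Requirement`, `unmet_requirements`, `backup_interval_ok`, and keys such as `rpo-defined`) are hypothetical names chosen for illustration, and only a subset of the rows is encoded. The back-up thresholds are read literally from the table (X=2, X<8, X<400), which is an assumption about the intended meaning.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    MUST_HAVE = "Must Have"
    OPTIONAL = "Optional"


@dataclass
class Requirement:
    key: str                  # hypothetical identifier, not part of CALM
    kind: str                 # "Policy" or "Implementation"
    levels: dict[int, Level]  # system rating (1, 2, 3) -> obligation


M, O = Level.MUST_HAVE, Level.OPTIONAL

# A subset of the rows from the example framework.
REQUIREMENTS = [
    Requirement("min-service-level-defined", "Policy", {1: M, 2: M, 3: O}),
    Requirement("rpo-defined", "Policy", {1: M, 2: M, 3: O}),
    Requirement("rto-defined", "Policy", {1: M, 2: M, 3: O}),
    Requirement("portable", "Policy", {1: M, 2: O, 3: O}),
    Requirement("critical-data-backed-up", "Policy", {1: M, 2: M, 3: M}),
]

# Back-up interval rules per rating, read literally from the table
# (an assumption): X=2 for rating 1, X<8 for rating 2, X<400 for rating 3.
BACKUP_INTERVAL_RULES = {
    1: lambda hours: hours == 2,
    2: lambda hours: hours < 8,
    3: lambda hours: hours < 400,
}


def unmet_requirements(rating: int, satisfied: set[str]) -> list[str]:
    """Return keys of Must Have requirements the system does not satisfy."""
    return [
        r.key
        for r in REQUIREMENTS
        if r.levels[rating] is Level.MUST_HAVE and r.key not in satisfied
    ]


def backup_interval_ok(rating: int, hours: float) -> bool:
    """Check a back-up interval (in hours) against the rule for a rating."""
    return BACKUP_INTERVAL_RULES[rating](hours)
```

For example, a system that satisfies everything except portability would fail at rating 1 but pass at rating 2, since portability is only a Must Have for rating 1. Keeping the obligations as data rather than code means the same list could, in principle, be serialised alongside an architecture definition.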

Propose a standard set of Resiliency Requirements Definitions

This working group could/should propose standard resiliency requirement definitions to pick and choose from.

charleyalpha789 commented 2 weeks ago

@rocketstack-matt @develontopia, happy to contribute ideas to this domain. Let me know if there are separate discussions scheduled.