Avanade / NTangle






Introduction

nTangle is a Change Data Capture (CDC) code generation tool and corresponding runtime. Unlike other CDC-based technologies which replicate changes to rows, nTangle is designed to replicate business entity (aggregate) changes.

For example, if a database contains a Person and one-to-many related Address table, a traditional CDC replicator would leverage the CDC-capabilities of the database as the data source and replicate all changes from both tables largely distinct from each other. Additional logic would then be required within the downstream systems to aggregate these distinct changes back into a holistic business entity where required, if possible.

nTangle tackles this differently by packaging the changes at the source into an aggregated entity which is then replicated. With nTangle the CDC-capabilities of the database are leveraged as the trigger, with a corresponding query across all related tables to produce a holistic business entity. Therefore, if a change is made to Person or Address this will result in the publishing of the entity. Where transactional changes are made to both Person and Address a single holistic business entity will be published including all changes.

This approach makes nTangle an excellent candidate for event-streaming scenarios where business entities are to be published based on underlying database changes.
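The Person and Address example above can be sketched in a few lines. This is an illustration only, not nTangle's generated code: it uses an in-memory SQLite database for self-containment (nTangle itself targets Microsoft SQL Server), and the column names and the `build_person_entity` helper are hypothetical.

```python
import sqlite3


def build_person_entity(conn, person_id):
    """Query Person and its related Address rows and return one aggregated entity."""
    person = conn.execute(
        "SELECT PersonId, Name FROM Person WHERE PersonId = ?", (person_id,)
    ).fetchone()
    if person is None:
        return None
    addresses = conn.execute(
        "SELECT AddressId, City FROM Address WHERE PersonId = ? ORDER BY AddressId",
        (person_id,),
    ).fetchall()
    return {
        "id": person[0],
        "name": person[1],
        "addresses": [{"id": a[0], "city": a[1]} for a in addresses],
    }


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Person (PersonId INTEGER PRIMARY KEY, Name TEXT);
        CREATE TABLE Address (AddressId INTEGER PRIMARY KEY, PersonId INTEGER, City TEXT);
        INSERT INTO Person VALUES (1, 'Ada');
        INSERT INTO Address VALUES (10, 1, 'London'), (11, 1, 'Paris');
        """
    )
    # Only a child row changes, yet the whole Person entity is what gets rebuilt.
    conn.execute("UPDATE Address SET City = 'Berlin' WHERE AddressId = 11")
    return build_person_entity(conn, 1)


entity = demo()
```

A row-level replicator would emit only the changed Address row; here the change acts as the trigger and the subscriber receives the complete Person with both addresses.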


Demonstration

The following video provides a high-level demonstration of nTangle and its capabilities.

https://github.com/Avanade/NTangle/assets/12836934/2894d753-d5b7-4e2a-bc6d-fbf41289027f


Status


The included change log details all key changes per published version.


Approach

The nTangle CDC approach is to consolidate the tracking of individual tables (one or more) into an aggregated entity to simplify publishing to an event stream (or equivalent). The advantage is that where a change occurs to any of the rows related to an entity, even where multiple rows are updated, this will result in only a single event. This makes it easier (more logical) for downstream subscribers to consume.

This is achieved by defining (configuring) the entity, being the primary (parent) table, and its related secondary (child) tables. For example, a SalesOrder may be made up of multiple tables; when any of these change, a single SalesOrder event should occur. These relationships are also defined with a cardinality of either OneToMany or OneToOne.

SalesOrder             // Parent
├── SalesOrderAddress  // Child 1:n - One or more addresses (e.g. Billing and Shipping)
└── SalesOrderItem     // Child 1:n - One or more items
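As a rough sketch of the consolidation described above, the following reduces a batch of row-level changes across the SalesOrder tables to one event per parent key. The `RowChange` shape and the event dictionary are hypothetical and exist only for illustration; they are not nTangle types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RowChange:
    table: str            # table in which the row changed
    sales_order_id: int   # key of the owning parent (SalesOrder) row
    operation: str        # "create", "update" or "delete"


def consolidate(changes):
    """Return the distinct parent keys touched by a batch, in first-seen order."""
    seen = {}
    for change in changes:
        seen.setdefault(change.sales_order_id, None)
    return list(seen)


batch = [
    RowChange("SalesOrderItem", 1001, "update"),
    RowChange("SalesOrderItem", 1001, "create"),
    RowChange("SalesOrderAddress", 1001, "update"),
    RowChange("SalesOrder", 1002, "update"),
]

# Four row changes collapse into two SalesOrder events.
events = [{"subject": "SalesOrder", "key": key} for key in consolidate(batch)]
```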

The CDC capability is used specifically as a trigger for change (being Create, Update or Delete). The resulting data that is published is the latest, not a snapshot in time (CDC captured). The reason for this is two-fold:

  1. Given how the CDC data is batch retrieved there is no guarantee that the CDC captured data represents a final intended state suitable for publishing; and,
  2. This process is intended to be running near real-time so getting the latest version will produce the most current committed version as at that time.
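The distinction between "trigger" and "published data" can be illustrated as follows. In this minimal sketch the captured CDC values are deliberately ignored; only the keys are used, and the values come from the current state. The record layout and the `entities_to_publish` function are assumptions made for the example, and rows that no longer exist are simply skipped (delete handling is out of scope here).

```python
def entities_to_publish(cdc_batch, current_rows):
    """Use captured changes only for their keys; read values from current state."""
    keys = sorted({record["key"] for record in cdc_batch})
    return [{"key": k, **current_rows[k]} for k in keys if k in current_rows]


# Two intermediate states were captured for the same row within one batch.
cdc_batch = [
    {"key": 7, "captured": {"status": "Pending"}},
    {"key": 7, "captured": {"status": "Picked"}},
]

# By the time the batch is processed, the committed state has moved on again.
current_rows = {7: {"status": "Shipped"}}

published = entities_to_publish(cdc_batch, current_rows)
```

Neither captured intermediate state is published; the single resulting entity reflects the committed state at the time of processing.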

To further guarantee only a single event for a specific version is published the resulting entity is JSON serialized and hashed; this value is checked (and saved) against the prior version to ensure a publish contains data that is actionable. This will minimize redundant publishing, whilst also making the underlying processing more efficient.
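A minimal sketch of this hash-based check is shown below. The source does not state which hash algorithm or storage nTangle uses, so SHA-256 and the in-memory `version_store` dictionary are assumptions chosen for illustration.

```python
import hashlib
import json


def entity_hash(entity):
    """Serialize the entity to canonical JSON and hash the result."""
    canonical = json.dumps(entity, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def should_publish(key, entity, version_store):
    """Return True, and save the new hash, only when the entity content changed."""
    new_hash = entity_hash(entity)
    if version_store.get(key) == new_hash:
        return False  # identical to the prior version; nothing actionable
    version_store[key] = new_hash
    return True


store = {}
order = {"id": 1001, "status": "Shipped"}

first = should_publish(1001, order, store)                               # new entity
repeat = should_publish(1001, dict(order), store)                        # same content
changed = should_publish(1001, {**order, "status": "Delivered"}, store)  # new content
```

Sorting the keys before hashing keeps the hash stable when the same content is serialized with a different property order.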


Change-data-capture (CDC)

The official Microsoft documentation describes the SQL Server CDC capabilities.

Although references are made throughout to Microsoft SQL Server, the intention is that nTangle is largely agnostic to the database technology; other databases may therefore be supported in the future based on demand and their capabilities.


Architecture

The NTangle Microsoft SQL Server underlying architecture is described here.


Capabilities

nTangle has been created to provide a seamless means to create a CDC-enabled aggregated entity publishing solution. The nTangle solution is composed of the following:

  1. Code generation - a configuration file defines the database tables, zero or more relationships, and other functionality-based properties; these drive the database-driven code generation that creates the required solution artefacts.
  2. Runtime - the generated solution artefacts leverage a number of .NET runtime components/capabilities. The code-generated solution uses these at runtime to execute and orchestrate the CDC-triggered aggregated entity publishing process.


Code-generation

The code-generation is managed via a console application that uses CodeGenConsole. This internally leverages OnRamp to enable the underlying code-generation capabilities.

Additionally, the code-generator inspects (queries) the database to infer the underlying schema for all tables and their columns. The configuration references are validated against this inferred schema, which also minimizes the configuration required wherever the inferred schema information can be used. The code-generation adopts a gen-many philosophy; therefore, where schema changes are made, the code-generation can be executed again to update the generated artefacts accordingly.
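The idea of validating configuration against an inferred schema, and defaulting where configuration is omitted, might look like the sketch below. It is not how the nTangle code-generator is implemented: SQLite stands in for SQL Server so the example is self-contained, and the `columns` configuration key and both helper functions are hypothetical.

```python
import sqlite3


def infer_columns(conn, table):
    """Return the column names of a table as reported by the database."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    return [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]


def resolve_columns(conn, table_config):
    """Validate configured columns against the schema, defaulting to all columns."""
    name = table_config["name"]
    inferred = infer_columns(conn, name)
    if not inferred:
        raise ValueError(f"Table '{name}' not found in database")
    requested = table_config.get("columns")
    if requested is None:
        return inferred  # nothing configured: use the inferred schema as-is
    unknown = [c for c in requested if c not in inferred]
    if unknown:
        raise ValueError(f"Unknown columns for '{name}': {unknown}")
    return requested


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SalesOrder (SalesOrderId INTEGER PRIMARY KEY, Status TEXT, Total REAL)"
)

all_columns = resolve_columns(conn, {"name": "SalesOrder"})
subset = resolve_columns(
    conn, {"name": "SalesOrder", "columns": ["SalesOrderId", "Status"]}
)
```

Because the schema is read from the database on every run, re-running after a schema change picks up new columns without any configuration edits, which is the essence of the gen-many approach.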

As stated, the code-generation is driven by a configuration file, typically named ntangle.yaml. Both YAML and JSON formats are supported; there is also a corresponding JSON schema to enable editor intellisense, etc.

The nTangle configuration is as follows:

Root
└── Table(s)
    ├── Join(s)
    │   ├── JoinOn(s)
    │   └── JoinMapping(s)
    └── TableMapping(s)

Documentation is provided for each of the configuration elements above.

An example ntangle.yaml configuration file exists within the SqlServerDemo sample. The SqlServerDemo.CodeGen sample also demonstrates how to invoke the code generator from the underlying Program.

The code-generator will output a number of generated artefacts; these will be either database-related (see SqlServerDemo.Database sample) or corresponding .NET runtime components (see SqlServerDemo.Publisher sample).

The following NTangle namespaces provide the code-generation capabilities:

| Namespace | Description |
| --- | --- |
| Config | The internal capabilities that support the YAML/JSON configuration. |
| Console | The code-generation tooling capabilities, primarily CodeGenConsole. |
| Generators | The internal code-generators used to select configuration for one or more Templates as orchestrated by the underlying Scripts. |


Runtime

Generally, a runtime publisher is required to orchestrate the CDC-triggered aggregated entity publishing process (see SqlServerDemo.Publisher sample). This publisher in turn takes a dependency on the nTangle runtime.

The following NTangle namespaces provide the runtime capabilities:

| Namespace | Description |
| --- | --- |
| Cdc | CDC-orchestration capabilities, primarily EntityOrchestrator. |
| Data | Database access capabilities to support the likes of batch tracking, identifier mapping and versioning. |
| Events | Event capabilities, leveraging and extending the capabilities enabled by CoreEx. |
| Services | Service hosting capabilities, primarily the HostedService. |


Additional documentation

The following are references to additional documentation.


Samples

The following samples are provided to guide usage:

| Sample | Description |
| --- | --- |
| SqlServerDemo | An end-to-end sample solution that demonstrates the usage of nTangle against a Microsoft SQL Server database. However, the best way to follow along and learn is to use the NTangle.Template tool, which includes instructions to guide end-to-end setup and execution. |


Tooling

The following tools are provided to support development:

| Tool | Description |
| --- | --- |
| NTangle.Template | The .NET template used to accelerate the creation of an nTangle solution and all of its projects using dotnet new. This leverages the .NET Core templating functionality. |
| NTangle.ArtefactGenerate.Tool | An internal tool used for nTangle development that auto-generates the corresponding JSON Schema and markdown documentation from the related .NET configuration entities. |


License

nTangle is open source under the MIT license and is free for commercial use.


Contributing

One of the easiest ways to contribute is to participate in discussions on GitHub issues. You can also contribute by submitting pull requests (PR) with code changes. Contributions are welcome. See information on contributing, as well as our code of conduct.


Security

See our security disclosure policy.


Who is Avanade?

Avanade is the leading provider of innovative digital and cloud services, business solutions and design-led experiences on the Microsoft ecosystem, and the power behind the Accenture Microsoft Business Group.