Closed prtandrup closed 1 year ago
In the spring of 2021 worked on another spike, with a similar purpose. That spike was dedicated to shed light on the possibilities and limitations of using the built-in Change Data Capture (CDC) feature of SQL-server: Spike: Fetch historical data from SQL using CDC The conclusion then, was that CDC did not fullfil our requirements, mainly due to the lack of support for refactoring: How model changes impact CDC
More relevant information regarding CDC:
Now, System-versioned temporal tables (Temporal Tables) could be a more suitable solution to the requirement we have identified, so far. This blog post gives valuable insights into the possibilities and limitations of Temporal Tables, including a comparison with CDC, showing that Temporal tables clearly are more suited for our use. It does, however, not show how Temporal tables cope with model refactorings.
Some Sql to demonstrate the use of Temporal tables: Creates a table and make several operations on both the datamodel and making CRUD opreations on the table:
IF (EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = 'Employee')) BEGIN -- DROP TABLE WTH VERSIONING ALTER TABLE [dbo].[Employee] SET ( SYSTEM_VERSIONING = OFF) DROP TABLE [dbo].[Employee] DROP TABLE [dbo].[EmployeeHistory] END GO
-- CREATE TABLE WITH VERSIONING CREATE TABLE dbo.Employee ( [EmployeeID] int NOT NULL PRIMARY KEY CLUSTERED , [Name] nvarchar(100) NOT NULL , [Position] varchar(100) NOT NULL , [Department] varchar(100) NOT NULL , [Address] nvarchar(1024) NOT NULL , [AnnualSalary] decimal (10,2) NOT NULL , [ValidFrom] datetime2 GENERATED ALWAYS AS ROW START HIDDEN , [ValidTo] datetime2 GENERATED ALWAYS AS ROW END HIDDEN , PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo) ) WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.EmployeeHistory));
-- ADD COLUMN WITH 'NOT NULL' COLUMN ALTER TABLE dbo.Employee ADD NewColumn varchar(100) NOT NULL CONSTRAINT DF_NewColumn DEFAULT 'default value'
-- INSERT AND UPDATES SOME DATA INSERT INTO Employee (EmployeeId, Name, Position, Department, Address, AnnualSalary) values(0, 'Jan', 'Developer', 'Datahub', 'Vesterballevej 4', 1000)
UPDATE Employee SET AnnualSalary = 2000
UPDATE Employee set AnnualSalary = 3000
UPDATE Employee set Name = 'Idiot@work' where EmployeeID = 0
-- CREATE SOME COLUMNS ALTER TABLE dbo.Employee ADD ToBeDeleted varchar(100)
ALTER TABLE dbo.Employee ADD NewColumn2 varchar(100)
-- POPULATE WITH DATA INSERT INTO Employee (EmployeeId, Name, Position, Department, Address, AnnualSalary, ToBeDeleted, NewColumn2 ) values(1, 'Jan Duelund', 'Developer', 'Datahub', 'Vesterballevej', 1500, 'Initial data to be deleted', 'Initial data') UPDATE Employee set NewColumn2 = 'Updated data', ToBeDeleted = 'Delete me' WHERE EmployeeId = 1
-- ALTER COLUMN TO 'NOT NULL' ALTER TABLE dbo.Employee SET (SYSTEM_VERSIONING = OFF)
UPDATE EmployeeHistory SET NewColumn2 = '' WHERE NewColumn2 IS NULL UPDATE Employee SET NewColumn2 = '' WHERE NewColumn2 IS NULL
ALTER TABLE dbo.Employee SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.EmployeeHistory));
ALTER TABLE dbo.Employee ALTER COLUMN NewColumn2 varchar(100) NOT NULL GO
-- SHOW CURRENT DATA (BEFORE DROPPING COLUMN) SELECT FROM Employee SELECT FROM EmployeeHistory
-- DROP A COLUMN ALTER TABLE dbo.Employee SET (SYSTEM_VERSIONING = OFF) ALTER TABLE dbo.EmployeeHistory DROP COLUMN ToBeDeleted ALTER TABLE dbo.Employee DROP COLUMN ToBeDeleted ALTER TABLE dbo.Employee SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.EmployeeHistory)); GO
-- TRY TO MAKE AN UPDATE WITHOUT CHANGES (NO HISTORICAL ENTRY IS MADE) UPDATE Employee SET Position = 'Developer' WHERE EmployeeId = 1
-- SHOW CURRENT DATA SELECT FROM Employee SELECT FROM EmployeeHistory
-- DELETE A ROW DELETE FROM EMPLOYEE WHERE EmployeeId = 1
-- SHOW CURRENT DATA SELECT * FROM Employee
-- SHOW CURRENT DATA IN A SPECIFIC TIME USING FOR SYSTEM_TIME SELECT * FROM Employee FOR SYSTEM_TIME BETWEEN '2019-01-01 00:00:00.0000000' AND '2024-12-31 00:00:00.00' ORDER BY ValidFrom;
Temporal history is only recorded when and actual change has been made, so if this is a needed feature an extra field can be added "RowVersionId" to the table that will automatically be updated by the Sql server on every update without any further code and therefore trigger a history logging: (Add the rowversion column to the above "create table" and everything is logged/recorded)
-- CREATE TABLE WITH VERSIONING AND ROWVERSION CREATE TABLE dbo.Employee ( [EmployeeID] int NOT NULL PRIMARY KEY CLUSTERED , [Name] nvarchar(100) NOT NULL , [Position] varchar(100) NOT NULL , [Department] varchar(100) NOT NULL , [Address] nvarchar(1024) NOT NULL , [AnnualSalary] decimal (10,2) NOT NULL , [RowVersionId] rowversion NOT NULL , [ValidFrom] datetime2 GENERATED ALWAYS AS ROW START HIDDEN , [ValidTo] datetime2 GENERATED ALWAYS AS ROW END HIDDEN , PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo) ) WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.EmployeeHistory));
I did som spike work with Temporal tables in Charges solution. Main conclusion is that it is possible to use, but would intervene our solution design in an unwantet manner:
Entity Framework Core has out-of-the-box support for Temporary Tables, but it only works for DbSet<>, not for IQueryable<>, which means we would have to replace IQueryable<> with DbSet<> in our query-stack which is undesirable. A custom solution is possible, but undesirable, as it requires implementation of an internal Microsoft API, which is unrecommended:
Using Temporary Tables for a entity with references is not supported, without Temporary Tables for each referenced tabel, i.e. for system versioning of ChargePeriod, EF Core requires system versioning of Charge, which requires system versioning of MarketParticipant. It creates a lot of undesired overhead:
System.InvalidOperationException: Temporal query is trying to use navigation to an entity 'Charge' which itself doesn't map to temporal table.
System.InvalidOperationException: Temporal query is trying to use navigation to an entity 'MarketParticipant' which itself doesn't map to temporal tab...
All-in-all, my recommendation is to not use Temporal Tables, and go with a custom solution based on servicebus, a history function and custom history table(s).
In my opinion. the AC's are fulfilled. The work continues with #1846
As a Volt Developer, I want an overview of the technology we need for persisting and delivering charge history data (information and prices) So that its' clear to everyone and we can start delivering business value
With this spike, we want to dive into different options and decide the technology stack we need to persist and deliver charge history data to the frontend
Considerations
Acceptance criteria
Work-in-progress
[x] The chosen solution must be feasible for both charge information and prices.
[x] The chosen solution should conform to the architectural runway, i.e. event-based state transfer and deliver data for the frontend through Web API.
[X] Reaching out to Titans; what have they used for MessageArchive?: CosmosDB is the choice for heavy data amounts, and for fast querying, but it is also an expensive solution that requires new knowledge. Cosmos DB handles schema changes very well by versioning documents.
[X] Reaching out to Batman; they have similar historic data needs as we do; what have they done in that regard? Batman looked into a solution based on event sourcing in an earlier hackathon, but so far they haven't met the requirement to implement history.
[x] Involve the architect(s): Reached out to @kerbou and presented our requirements for storing and presenting historical charge information. Based on the information at hand he recommended going with a solution based on SQL-server, given that it is known technology, and can be configured for many use cases using shards, partitions, and different levels of indexing, to optimize for write or read.
[x] In DataHub2, some market participants tend to submit a message per charge per day, even though there is no data change compared to what is in storage. Hopefully, this won't be the case in DH3.: We should expect the market participants to act as they do, even though we would like to receive messages with changes only. If this requirement persists, it leaves the SQL-server built-in Temporal tables unusable as it will only register changes to the main table.
Requirements for historic charge data
Technology choice
SQL-server Charge Data Capture is rather rigid and does not conform with our requirement to be able to refactor the command model
SQL-server System-versioned temporal tables (Temporal Tables) could be a more suitable solution to the requirements we have identified. Temporal tables cope with model refactorings provided that it is not a requirement that we can show the state before any change in the command model (e.g. a column removal). Blog post: EF Core and Temporal Tables Pros: it is an off-the-shelf solution built-in in SQL-server. Cons: To use it for our purpose, we would have to add a GUID or Timestamp column to our command model, and then update it every time we receive a charge. This compromises our separation of the command stack from the query stack, which is undesirable. UPDATE: See my comment regarding spike work with Temporal tables
CosmosDB: Cosmos DB is the choice for heavy amounts of data writes, and fast querying within Azure. Cosmos DB seamlessly handles schema changes using versioning of documents, which is another important advantage. It is, however, unknown technology for the team, and we would have to greatly extend our knowledge, which is time-consuming. Collective knowledge from Titans tells us that we should prepare for a lot of work if we choose Cosmos DB as technology for persisting historical charge data.
Event sourcing: Persisting charge history seems an obvious case for event sourcing. Event sourcing can be built using almost any persistence technology like SQL-server, Cosmos DB, or Table Storage