SORMAS-Foundation / SORMAS-Project

SORMAS (Surveillance, Outbreak Response Management and Analysis System) is an early warning and management system to fight the spread of infectious diseases.
https://sormas.org
GNU General Public License v3.0
292 stars 140 forks source link

Data history for future reporting [2+X] #170

Closed ghost closed 7 years ago

ghost commented 7 years ago

@MartinWahnschaffeSymeda @StefanSzczesny @gstephan30 @stoikkanen @Krisistis

I have been brainstorming about the change history model for SORMAS Project so far. And I have come to the following conclusions that needs to be implemented. Two major thoughts:

  1. There is no clear separation in the data model between master data (i.e. attributes of an entity that are static and conceptually won’t change over its lifetime) and transactional data (i.e. dynamic attributes of an entity). This has at least two implications: Firstly, It is an explicit decision (involving our Nigerian colleagues and the evaluation team) has been made that it isimportant to store the history of these attributes. In many cases (e.g. RumorStatus, CaseStatus, ContactgroupStatus, …) it might be sufficient for purely operational purposes to only store the last value, but in order to evaluate the system, the full history of values might be needed. Secondly, do you intend to create a new entity every time a transaction takes place or refer to an entity's last update (or to another significant transaction changing one or more attributes)?

  2. I would like to draw your attention to a specific feature of the selected technology platform for SORMAS-N SAP HANA: history (column) tables which was used in the previous project. When editing entries in history tables, e.g. insert, update, delete, a full history of changes will be automatically stored. As a result, you can derive the complete history of changes and restore the full database table content at any point in time. An example can be found at [1]

[1] http://scn.sap.com/community/developer-center/hana/blog/2013/02/12/when-i-travelled-through-time-using-sap-hana

See a typical example:

How do you query the change history, producing the following table?

Key Value Hue Commit_Id A 1 - 200611 A 1 GREEN 200721 A 4 - 201039 B 2 - 200611 B 2 BLUE 200721 (B 2 BLUE 201039) C 2 - 200611 C 2 BLUE 200721 C 5 - 201039

So we can closely look at how to implement an "AS OF" Statement in a relational database maybe as a separate table as shown in the example below:

pastedgraphic-1 schema-02 (3).pdf

We need to discuss the best way to do this as we cant run away from this function. Else it would be impossible to do any evaluation of the system. Martin suggested for the pilot to use a Data warehouse which is also an option but we need to activate this for the prototype as well.

Lets think through this. I believe there is way

Cheers

Some helpful links http://stackoverflow.com/questions/3874199/how-to-store-historical-data https://www.simple-talk.com/sql/database-administration/database-design-a-point-in-time-architecture/ https://www.visual-paradigm.com/tutorials/databasedesign.jsp

MartinWahnschaffe commented 7 years ago

All non-infrastructure data (cases, contacts, visits, events, tasks, etc.) from the operational database will be imported into another database (data warehouse) in fixed time intervals (e.g. daily).

The imported data is marked with a timestamp and can thus be used to compare changes in data over time. The collected data can later be restructured and aggregated (e.g. star schema for online analytical processing) to be used for a reporting system.

Note: This does not include a reporting component or other user interface to access the data warehouse.

MartinWahnschaffe commented 7 years ago

First step:

MartinWahnschaffe commented 7 years ago

Some notes on pentaho that might be helpful later (http://community.pentaho.com/)

Business Analytics Platform (pentaho-server)

Allows to create datasources and even define olap schemas Uses jpivot (old?) and suggests to use "Pentaho Analyzer" // unclear what this is

Data Integration (kettle/spoon)

Write ETL - useful to create a data warehouse

Report Designer

Can create paper reports from querries // not helpful for us

Postgres Temporal Tables

The temporal table extension for postgres could be a good starting point: http://clarkdave.net/2015/02/historical-records-with-postgresql-and-temporal-tables-and-sql-2011/ https://github.com/arkhipov/temporal_tables

Versioning for relational databases is part of the SQL 2011 specification and currently available in IBM DB2 Orcale 12c and MS SQL Server 2016 (https://en.wikipedia.org/wiki/SQL:2011)

MartinWahnschaffe commented 7 years ago

I have successfully tested the PostgreSQL Temporal Tables addon. It creates a full history of all data changes and allows quite easy querries to get the state at a certain time. This is a very good base for the creation of reporting components and if needed a data warehouse. The SORMAS pilot will show us how much storage and performance this feature will cost us.

Here are the steps that need to be done when the server gets an update to the latest version of sormas. (see https://github.com/arkhipov/temporal_tables):

Afterwards just execute the latest additions made to sormas_schema.sql.

Note: Support for SQL 2011 querries does not exist yet. Instead use (or create a view): SELECT FROM cases WHERE sys_period @> '2017-04-10'::timestamptz UNION ALL SELECT from cases_history WHERE sys_period @> '2017-04-10'::timestamptz;

MartinWahnschaffe commented 7 years ago

@MateStrysewskeSym I'm not sure if everything needed to install temporal tables is explained perfectly. It's probably best to wait with the pull until tomorrow afternoon.