UW-Macrostrat / macrostrat

A geological data platform for exploration, integration, and analysis
Apache License 2.0
3 stars 1 forks source link

Audit logging #59

Open davenquinn opened 1 month ago

davenquinn commented 1 month ago

As we start to plan for user-contributed changes to Macrostrat, it will become important to have reversible change logs for some tables.

Possible approaches

Build "versions" into the relevant tables/schemas

We could essentially make versioning a first-class part of Macrostrat's data model, by adding columns denoting versioning metadata to all relevant tables. The following should just about cover it:

This would allow lots of control, but continuous replacement of records would make maintaining foreign key relationships complicated. Since all table structures would have to change, we'd probably have to port all Macrostrat data to new schemas, and replace the current tables with views that mimic the current structure. Triggers may be needed to keep foreign keys 'fresh'

It will be possible to make this change while preserving current table structures in a view-only manner, but it may increase the complexity of Macrostrat's database storage system substantially, at least for tables like unit, strat_name_concept, legend, etc. which would likely all need to become versioned tables.

Use an external audit/logging trail system

There are many systems designed to create a (sometimes reversible) audit trail for PostgreSQL databases. As a rule these work by adding triggers to tables and dumping a JSON record encapsulating changes for every table operation on the targeted tables, keeping the actual record at the most current version. A few potential systems:

These systems generally have the advantage of presenting a single-timepoint schema, rather than lots of versioned tables. This will probably make interacting with Macrostrat's schema considerably easier.

Summary

Key features:

Maybe features:

General considerations:

Any input is appreciated!

cambro commented 1 month ago

This is an important consideration. I'm not sure it is necessary to implement full-on reversibility for all data in Macrostrat.

One option, that I tried to implement a bit in v1, is to have working and essentially final versions of columns. This was one of the main motivations for cols.status_code. This allows anything with the right code to be explicitly change-prone and "untracked" in that sense, some to be explicitly legacy, and some to be the "official" working version. This doesn't satisfy what you are talking about here with real versioning, but I fear implementing versioning could become a major resource sink that will get in way of rapidly getting to point where new data can flow.

davenquinn commented 1 month ago

I agree that full reversibility is not important. And this is definitely only relevant for a certain subset of tables. But I would argue it is incredibly important to at least have a sense of when things were changed, broadly what was updated, and by whom.

Figuring out how to have some level of change tracking without making the system unwieldy is an important balance to strike.

davenquinn commented 1 month ago

Topical: restoring the SGP database on my local machine reminds me that SGP uses pg_audit, apparently

image

This could be a good opportunity to understand that solution a little bit and potentially use something that's been widely adopted.