History-Research-Environment / HRE--History-Research-Environment

Main repo for HRE code
https://historyresearchenvironment.org/
GNU Affero General Public License v3.0
32 stars 6 forks source link

Database Primary Keys #69

Closed MichaelErichsen closed 4 years ago

MichaelErichsen commented 6 years ago

Identifying primary key fields

Consistency between TABLE_NAME_PID's and fields in other tables referencing them (INTEGER - SMALLINT).

RobinLamacraft commented 6 years ago

Hi Michael,

Probably this comes from my back ground where you only allocated larger integer storage in places where you knew that a value could become large.

I tried to keep the head fields of all tables consistent. As some tables could have many, many records so the table PID in each record's left set of fields is always an INTEGER

However there are many tables that will never exceed the upper bound of the SMALL_INT (32767 records). For external references to those PID in other tables we had adopted the convention of using SMALLINT for those referencing fields knowing that particular reference would always fit in that value.

BUT I can now see that it was foolish as it creates inconsistency in the definition of some classes. *

MY ERROR!! - I keep forgetting that storage is now cheap. *

Therefore there are 2 things that need to be done: (1) make external reference to be consistently INTEGER (2) get Rod and Don to help identify if they can which fields need to be changed in the DDL.

Please discuss the best way to overcome this blunder.

The second part of this is to for discuss to discuss a way of preserving the user view able audit trail of recent changes for each commit. Each table record stores the last commit ID that modified that record. We just need to work out how best to achieve this. I can think of at least one way that would achieve user visible audit trail  and which would then mean that the record PID becomes the primary key for all active tables. In fact this proposal may also make the management of change logs much simpler. I will think a bit more and come back with a proposal in a short while.

Robin

On 05-Aug-18 9:24 PM, Michael Erichsen wrote:

Identifying primary key fields Consistency between

_PID and fields in other tables referencing them (INTEGER - SMALLINT). Other issues.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/History-Research-Environment/HRE--History-Research-Environment/issues/69, or mute the thread https://github.com/notifications/unsubscribe-auth/AVeLtK_eD4uK5lu38VUyzgAoJ9Q1WzBtks5uNt0GgaJpZM4VvWRS.

-- Robin Lamacraft, Adelaide, Australia

MichaelErichsen commented 6 years ago

Fine.

Yes, there were days, when 16 K of memory was delivered on a truck :-) I'll change everything I meet to Integer.

I haven't looked at the commit logs yet, but I am sure that we can find a solution.

Mvh Michael Erichsen

RobinLamacraft commented 6 years ago

Thanks Michael,

Let me make a simple suggestion for the solution of the audit trail commit logging. My previous proposal was to embed that now obsolete data records as additional records in the same table as it was edited. Then, later for those obsolete records to be purged from time to time by a separate management process.  They would be identified as groups indexed by the Commit ID.

At no real cost we could use a similar method by creating a second database OR more tables in the current HRE schema with tables that effectively copied the record that was about to be made obsolete to a record in its "logging" version of that table. This would be just a record copy including the Commit ID. So that would be a new table with its own PID for records. It would be easy to construct and easy to use to back out some changes in their reverse commit order. Effectively a minor adjustment to SQL of the table that it was its logging partner table.

Using this process, then all the HRE table record PIDs can now be used as the primary key.

That should improve the database access efficiency. Hence this would mean that apart from the record copying when a record was updated, the management of the Logging trail can be parceled into another plugin. By copying record for record there would be no need to convert field values to text strings, etc (some conversions can loose some precision for some numeric values if you then do a redo.

I would like to hear your opinion about this. It roughly doubles the number of tables in the schema, but the extra tables are likely to have few records.

Robin

On 05-Aug-18 11:45 PM, Michael Erichsen wrote:

Fine.

Yes, there were days, when 16 K of memory was delivered on a truck :-) I'll change everything I meet to Integer.

I haven't looked at the commit logs yet, but I am sure that we can find a solution.

Mvh Michael Erichsen

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/History-Research-Environment/HRE--History-Research-Environment/issues/69#issuecomment-410523067, or mute the thread https://github.com/notifications/unsubscribe-auth/AVeLtHLNLh0a9Qu6gIdAK97ChqDt441Fks5uNv4RgaJpZM4VvWRS.

-- Robin Lamacraft, Adelaide, Australia

RobinLamacraft commented 6 years ago

Hi Michael,

There implications of this use of the PID fields and the revision of logging changes by a commit-based trail that need to be adjusted in documentation already released to GitHub.

Can we get an agreement on the plan of action so that the documentation can be brought inline with these decisions?

Robin

On 06-Aug-18 1:10 AM, robin lamacraft wrote:

Thanks Michael,

Let me make a simple suggestion for the solution of the audit trail commit logging. My previous proposal was to embed that now obsolete data records as additional records in the same table as it was edited. Then, later for those obsolete records to be purged from time to time by a separate management process.  They would be identified as groups indexed by the Commit ID.

At no real cost we could use a similar method by creating a second database OR more tables in the current HRE schema with tables that effectively copied the record that was about to be made obsolete to a record in its "logging" version of that table. This would be just a record copy including the Commit ID. So that would be a new table with its own PID for records. It would be easy to construct and easy to use to back out some changes in their reverse commit order. Effectively a minor adjustment to SQL of the table that it was its logging partner table.

Using this process, then all the HRE table record PIDs can now be used as the primary key.

That should improve the database access efficiency. Hence this would mean that apart from the record copying when a record was updated, the management of the Logging trail can be parceled into another plugin. By copying record for record there would be no need to convert field values to text strings, etc (some conversions can loose some precision for some numeric values if you then do a redo.

I would like to hear your opinion about this. It roughly doubles the number of tables in the schema, but the extra tables are likely to have few records.

Robin

On 05-Aug-18 11:45 PM, Michael Erichsen wrote:

Fine.

Yes, there were days, when 16 K of memory was delivered on a truck :-) I'll change everything I meet to Integer.

I haven't looked at the commit logs yet, but I am sure that we can find a solution.

Mvh Michael Erichsen

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/History-Research-Environment/HRE--History-Research-Environment/issues/69#issuecomment-410523067, or mute the thread https://github.com/notifications/unsubscribe-auth/AVeLtHLNLh0a9Qu6gIdAK97ChqDt441Fks5uNv4RgaJpZM4VvWRS.

-- Robin Lamacraft, Adelaide, Australia

-- Robin Lamacraft, Adelaide, Australia

MichaelErichsen commented 6 years ago

Hi Robin

I cannot see the full consequences of this, so I tend to trust you in this.

Mvh Michael Erichsen

-------- Original Message -------- Subject: Re: [History-Research-Environment/HRE--History-Research-Environment] Database Primary Keys (#69) From: RobinLamacraft notifications@github.com Date: Mon, August 06, 2018 10:11 am To: History-Research-Environment/HRE--History-Research-Environment HRE--History-Research-Environment@noreply.github.com Cc: Michael Erichsen michael@myerichsen.net, Author author@noreply.github.com

Hi Michael,

There implications of this use of the PID fields and the revision of logging changes by a commit-based trail that need to be adjusted in documentation already released to GitHub.

Can we get an agreement on the plan of action so that the documentation can be brought inline with these decisions?

Robin

On 06-Aug-18 1:10 AM, robin lamacraft wrote:

Thanks Michael,

Let me make a simple suggestion for the solution of the audit trail commit logging. My previous proposal was to embed that now obsolete data records as additional records in the same table as it was edited. Then, later for those obsolete records to be purged from time to time by a separate management process.  They would be identified as groups indexed by the Commit ID.

At no real cost we could use a similar method by creating a second database OR more tables in the current HRE schema with tables that effectively copied the record that was about to be made obsolete to a record in its "logging" version of that table. This would be just a record copy including the Commit ID. So that would be a new table with its own PID for records. It would be easy to construct and easy to use to back out some changes in their reverse commit order. Effectively a minor adjustment to SQL of the table that it was its logging partner table.

Using this process, then all the HRE table record PIDs can now be used as the primary key.

That should improve the database access efficiency. Hence this would mean that apart from the record copying when a record was updated, the management of the Logging trail can be parceled into another plugin. By copying record for record there would be no need to convert field values to text strings, etc (some conversions can loose some precision for some numeric values if you then do a redo.

I would like to hear your opinion about this. It roughly doubles the number of tables in the schema, but the extra tables are likely to have few records.

Robin

On 05-Aug-18 11:45 PM, Michael Erichsen wrote:

Fine.

Yes, there were days, when 16 K of memory was delivered on a truck :-) I'll change everything I meet to Integer.

I haven't looked at the commit logs yet, but I am sure that we can find a solution.

Mvh Michael Erichsen

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/History-Research-Environment/HRE--History-Research-Environment/issues/69#issuecomment-410523067, or mute the thread https://github.com/notifications/unsubscribe-auth/AVeLtHLNLh0a9Qu6gIdAK97ChqDt441Fks5uNv4RgaJpZM4VvWRS.

-- Robin Lamacraft, Adelaide, Australia

-- Robin Lamacraft, Adelaide, Australia

-- You are receiving this because you authored the thread. Reply to this email directly or view it on GitHub: https://github.com/History-Research-Environment/HRE--History-Research-Environment/issues/69#issuecomment-410624820

RobinLamacraft commented 6 years ago

Thanks Michael,

Probably Rod or Don can quickly find what needs to be adjusted and then I will need to follow it through to check how this all fits together.

This change should make the retrieval of data more efficient because the primary key index can be used to short cut some cases where the previous approach implied more complex SQL queries.

I will get to that as soon as the Name Styles Overview document can be uploaded to GitHub.

Robin

On 06-Aug-18 7:41 PM, Michael Erichsen wrote:

Hi Robin

I cannot see the full consequences of this, so I tend to trust you in this.

Mvh Michael Erichsen

-------- Original Message -------- Subject: Re: [History-Research-Environment/HRE--History-Research-Environment] Database Primary Keys (#69) From: RobinLamacraft notifications@github.com Date: Mon, August 06, 2018 10:11 am To: History-Research-Environment/HRE--History-Research-Environment HRE--History-Research-Environment@noreply.github.com Cc: Michael Erichsen michael@myerichsen.net, Author author@noreply.github.com

Hi Michael,

There implications of this use of the PID fields and the revision of logging changes by a commit-based trail that need to be adjusted in documentation already released to GitHub.

Can we get an agreement on the plan of action so that the documentation can be brought inline with these decisions?

Robin

On 06-Aug-18 1:10 AM, robin lamacraft wrote:

Thanks Michael,

Let me make a simple suggestion for the solution of the audit trail commit logging. My previous proposal was to embed that now obsolete data records as additional records in the same table as it was edited. Then, later for those obsolete records to be purged from time to time by a separate management process.  They would be identified as groups indexed by the Commit ID.

At no real cost we could use a similar method by creating a second database OR more tables in the current HRE schema with tables that effectively copied the record that was about to be made obsolete to a record in its "logging" version of that table. This would be just a record copy including the Commit ID. So that would be a new table with its own PID for records. It would be easy to construct and easy to use to back out some changes in their reverse commit order. Effectively a minor adjustment to SQL of the table that it was its logging partner table.

Using this process, then all the HRE table record PIDs can now be used as the primary key.

That should improve the database access efficiency. Hence this would mean that apart from the record copying when a record was updated, the management of the Logging trail can be parceled into another plugin. By copying record for record there would be no need to convert field values to text strings, etc (some conversions can loose some precision for some numeric values if you then do a redo.

I would like to hear your opinion about this. It roughly doubles the number of tables in the schema, but the extra tables are likely to have few records.

Robin

On 05-Aug-18 11:45 PM, Michael Erichsen wrote:

Fine.

Yes, there were days, when 16 K of memory was delivered on a truck :-) I'll change everything I meet to Integer.

I haven't looked at the commit logs yet, but I am sure that we can find a solution.

Mvh Michael Erichsen

— You are receiving this because you commented. Reply to this email directly, view it on GitHub

https://github.com/History-Research-Environment/HRE--History-Research-Environment/issues/69#issuecomment-410523067,

or mute the thread

https://github.com/notifications/unsubscribe-auth/AVeLtHLNLh0a9Qu6gIdAK97ChqDt441Fks5uNv4RgaJpZM4VvWRS.

-- Robin Lamacraft, Adelaide, Australia

-- Robin Lamacraft, Adelaide, Australia

-- You are receiving this because you authored the thread. Reply to this email directly or view it on GitHub:

https://github.com/History-Research-Environment/HRE--History-Research-Environment/issues/69#issuecomment-410624820

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/History-Research-Environment/HRE--History-Research-Environment/issues/69#issuecomment-410658712, or mute the thread https://github.com/notifications/unsubscribe-auth/AVeLtF7Ccj5tuGek5WksQU8YmILJPbMfks5uOBZkgaJpZM4VvWRS.

-- Robin Lamacraft, Adelaide, Australia

HREferg commented 6 years ago

I guess Rod and I are somewhat befuddled as to what changes are needed.

As I understand it, there is a change needed for logging purposes, and a change needed for SMALLINT - > INTEGER for primary key consistency.

As neither have had the requirements clearly explained, we’re both at a loss of what to do next….

Don