arkhipov / temporal_tables

Temporal Tables PostgreSQL Extension
BSD 2-Clause "Simplified" License
935 stars 47 forks source link

Passing 'sys_period' in insert/update data #41

Open nyov opened 6 years ago

nyov commented 6 years ago

First, a big thank you for this extension! It has simplified the concept a lot over a custom PL/pgSQL setup. Enough so that even I can successfully use it :)

Now though I have an instance where I want to load data from various existing tables (essentially snapshots/partitions in time) with an existing timestamp, which I'd like to preserve in sys_period. I managed to do this for the initial (oldest) data-load by disabling the trigger around the COPY.

ALTER TABLE cache DISABLE TRIGGER versioning_trigger;
\copy
ALTER TABLE cache ENABLE TRIGGER versioning_trigger;

and casting the existing timestamp into the sys_period timerange:

QUERY='SELECT [...], tstzrange(my_timestamp::timestamptz, NULL) AS sys_period FROM oldcache'
PASTE='cache ([...], sys_period)'

${psql} -q -d ${SOURCEDB} -c "\timing off" -c "\copy (${QUERY}) TO STDOUT;" | \
    ${psql} -d ${TARGETDB} -c "ALTER TABLE cache DISABLE TRIGGER versioning_trigger;" -c "\copy ${PASTE} FROM STDIN;" -c "ALTER TABLE cache ENABLE TRIGGER versioning_trigger;"

This worked beautifully, but now that I need to do the actual history revision UPDATEs, I can no longer use dumb COPY and need the trigger (or a workaround).

Since these are 50+GB per existing table/dump, I dislike the idea of creating an sql dump where every INSERT must be prefixed by a set_system_time call, which I then first need to extract from the table's timestamp column:

SELECT set_system_time('1985-08-08 06:42:00+08'); INSERT INTO [...];
SELECT set_system_time('1985-08-08 06:43:00+08'); INSERT INTO [...];
SELECT set_system_time('1985-08-08 06:44:00+08'); INSERT INTO [...];

Is there any possibility to accept the lower bound from a sys_period tstzrange column if given in an INSERT/UPDATE/DELETE? E.g. if I passed sys_period as such

INSERT INTO x ([...], tstzrange(my_time::timestamptz, NULL::timestamptz) AS sys_period);
UPDATE x SET sys_perid = tstzrange(my_time::timestamptz, NULL::timestamptz) WHERE [...];

could the trigger be made to use this lower bound instead of ignoring the column and instead of the date given by set_system_time()?

This would make bulk-loading the data through a dumb COPY, like my pg dumps, much easier (with an upsert trigger), so long as I get the historical order right. In my case it would be very hard to build up a *_history table (as clarkdave explains) where I'd have to put on my forensics hat and manually stitch up the data to get the tstzrange's right.