USACE / instrumentation

Instrumentation project issue tracking and project planning
MIT License

timeseries_measurement Index and Primary key improvement #99

Closed MikeNeilson closed 3 years ago

MikeNeilson commented 3 years ago

I was just looking at the timeseries_measurement table and you can drastically simplify the primary key and unique index.

The surrogate key on the table is basically pointless and just serves to use additional space. I'd recommend the following structure with a composite primary key:

CREATE table timeseries_measurement(
  timeseries_id uuid references timeseries(id),
  time timestamp with time zone,
  value real,
  PRIMARY KEY (timeseries_id, time)
);

You very much want the key in that order: you'll almost invariably be looking up time series data by ID followed by a time range. That order puts the appropriate index on the table so those lookups are as fast as possible.
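To illustrate the access pattern the composite key serves, here is a hypothetical lookup (the UUID and time range are made up; table and column names follow the DDL above):

```sql
-- Fetch all measurements for one timeseries within a window.
-- With PRIMARY KEY (timeseries_id, time), Postgres can satisfy
-- this with a single index range scan, no extra sort needed.
SELECT time, value
FROM timeseries_measurement
WHERE timeseries_id = 'a2f1c9e4-0000-0000-0000-000000000001'
  AND time >= '2021-01-01T00:00:00Z'
  AND time <  '2021-02-01T00:00:00Z'
ORDER BY time;
```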

MikeNeilson commented 3 years ago

@brettpalmberg

I'm willing to put the work in to make the change, but I'm a little lost as to how your system structures upgrades to the DDL. So far as I can tell (and it's possible I'm just missing something) it just rebuilds the database, and there's no structure for in-place upgrades.

brettpalmberg commented 3 years ago

Agreed. This is a good change @MikeNeilson - it would change the table definition from (currently):

CREATE TABLE IF NOT EXISTS public.timeseries_measurement (
    id UUID PRIMARY KEY NOT NULL DEFAULT uuid_generate_v4(),
    time TIMESTAMPTZ NOT NULL,
    value REAL NOT NULL,
    timeseries_id UUID NOT NULL REFERENCES timeseries (id) ON DELETE CASCADE,
    CONSTRAINT timeseries_unique_time UNIQUE(timeseries_id,time)
);

To what is described in the original post:

CREATE table timeseries_measurement(
  timeseries_id uuid references timeseries(id),
  time timestamp with time zone,
  value real,
  PRIMARY KEY (timeseries_id, time)
);

> I'm willing to put the work in to make the change, but I'm a little lost as to how your system structures upgrades to the DDL. So far as I can tell (and it's possible I'm just missing something) it just rebuilds the database, and there's no structure for in-place upgrades.

Would gladly welcome the PR. To date, we have been writing small migration files to address changes in databases. Files prefixed with migration_ here are not intended to be run as a set; each addresses the incremental change needed for a PR/feature.
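A migration for this change might look roughly like the following. This is a sketch, not the actual migration file from the repo; it assumes the current schema shown above, that nothing depends on the surrogate id column, and that the primary key carries Postgres's default constraint name:

```sql
-- Sketch of a migration_ file: drop the surrogate key and promote
-- the natural key. The existing UNIQUE(timeseries_id, time)
-- constraint already guarantees there are no duplicates.
BEGIN;

ALTER TABLE timeseries_measurement
    DROP CONSTRAINT timeseries_measurement_pkey;  -- assumed default name

ALTER TABLE timeseries_measurement
    DROP COLUMN id;

ALTER TABLE timeseries_measurement
    DROP CONSTRAINT timeseries_unique_time;

ALTER TABLE timeseries_measurement
    ADD PRIMARY KEY (timeseries_id, time);

COMMIT;
```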

MikeNeilson commented 3 years ago

Well, that folder is a little obvious in hindsight. But alas, such is the challenge with any legacy code: getting your head around it before moving forward.

Anyways, thanks. I'm still stuck working on other things at the moment, but I'll submit something later in the week.

brettpalmberg commented 3 years ago

No worries! That sounds great - thanks @MikeNeilson ! You may have seen this already, but there are integration/regression tests for the API endpoints written using Postman in this folder.

If you're using the Postman client, you can import the tests and environment file to run the tests interactively.

This script in the repository can also be used to run the tests headless using Postman's command-line counterpart, newman.

KevinJJackson commented 3 years ago

@brettpalmberg - Just an FYI, I believe the change made here, having the primary key as a composite of timeseries_id and time, causes an issue where we can no longer update the measurement time.

From the UI perspective, trying to update the time causes a new row to be added with the new time; the old time persists as its own row, most likely because you can't update part of a primary key.

I've informed @dliang864 of this issue as well.

MikeNeilson commented 3 years ago

It shouldn't. However, I wasn't able to fully test the change with the API before the concept got merged in.

That all said, all of the appropriate constraints are in place (at least in the code I can see in the repo). Are you sure your environment is set up in the correct timezone? e.g. the "new row" isn't offset by something. Inserting a new value at the exact same time SHOULD throw a unique-constraint error.
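To illustrate what the constraint guarantees, a sketch assuming the composite-key schema above (the UUID and timestamps are made up):

```sql
-- With PRIMARY KEY (timeseries_id, time), a second insert at the
-- exact same instant must fail.
INSERT INTO timeseries_measurement (timeseries_id, time, value)
VALUES ('a2f1c9e4-0000-0000-0000-000000000001',
        '2021-06-01T12:00:00Z', 1.5);

-- Same series, same instant: this violates the primary key and
-- raises "duplicate key value violates unique constraint".
INSERT INTO timeseries_measurement (timeseries_id, time, value)
VALUES ('a2f1c9e4-0000-0000-0000-000000000001',
        '2021-06-01T12:00:00Z', 2.0);
```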

KevinJJackson commented 3 years ago

I'm not sure the timezone really matters; it's more of a design-principle thing, right? There is a check in the API code (whether that's in Go or SQL, I'm not sure) that says "update or create" based on the measurement data passed to the function. If the time value differs from the one that currently exists (because I am trying to update it), there will be no matching composite key, so it decides it needs to create a new row for that new time, leaving the old one untouched.
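If the API uses a Postgres upsert keyed on the natural key, that behavior follows directly. A hypothetical sketch of the pattern being described (the actual code in the repo may differ):

```sql
-- Upsert keyed on (timeseries_id, time): if the caller changes the
-- time, the ON CONFLICT clause never fires, so a brand-new row is
-- inserted and the old row at the original time is left in place.
INSERT INTO timeseries_measurement (timeseries_id, time, value)
VALUES ('a2f1c9e4-0000-0000-0000-000000000001',
        '2021-06-01T12:30:00Z', 1.5)
ON CONFLICT (timeseries_id, time)
DO UPDATE SET value = EXCLUDED.value;
```

Under this pattern only value can ever be "updated"; changing time is always an insert.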

MikeNeilson commented 3 years ago

The time zone can matter; it may not here, but we need to see the output of a SELECT showing the duplicate row.

But even with or without the surrogate key, you shouldn't be able to add two rows for the same (timeseries_id, time), as that is the key that defines what the table will hold.

The more likely situation, if the timezone settings aren't causing an offset, is that the time itself isn't being specified precisely enough, and the two values are, to the computer, two different times. Since the resolution of a Postgres timestamp with time zone is 1 microsecond, it's possible the first entry didn't correctly zero out the sub-second portion of the passed-in timestamp.
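For example, a sketch of how two timestamps that look identical at second resolution are nonetheless distinct key values:

```sql
-- Both render as 12:00:00 when displayed at second precision,
-- but differ at microsecond resolution, so they form two distinct
-- (timeseries_id, time) keys. This comparison yields false.
SELECT '2021-06-01T12:00:00.000000Z'::timestamptz
     = '2021-06-01T12:00:00.000001Z'::timestamptz AS same_time;
```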

I think we'll need @brettpalmberg or someone to do a SELECT on the timeseries_id (making sure the psql instance is using UTC) within the date range you've used, to properly compare the values.
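Something along these lines, a sketch with placeholder values, showing each row at full microsecond precision in UTC:

```sql
-- Force UTC for this session so display offsets can't mask anything.
SET TIME ZONE 'UTC';

-- List every row for the series in the window, formatted to show
-- full microsecond precision, so near-duplicate times stand out.
SELECT timeseries_id,
       to_char(time, 'YYYY-MM-DD HH24:MI:SS.US TZ') AS time_us,
       value
FROM timeseries_measurement
WHERE timeseries_id = 'a2f1c9e4-0000-0000-0000-000000000001'
  AND time >= '2021-06-01T00:00:00Z'
  AND time <  '2021-06-02T00:00:00Z'
ORDER BY time;
```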

brettpalmberg commented 3 years ago

Thanks guys - @KevinJJackson - let's connect on this later today with a screen share.