ethankale / Sedna

Water database. Evolution of King County's GData.

Add Telemetry #116

Open ethankale opened 3 years ago

ethankale commented 3 years ago

Add support for telemetry feeds. This is a complicated, multi-part enhancement with major changes to the database schema.

Overall Vision

Data enters the database via a TelemetryMeasurement table. Anything can write to it - probably an SSIS job for now, but it could just as easily be a REST API or something else; it doesn't matter. Metadata records are created for every unique combination of sample point/parameter/method/unit by a repeating job (every 5 minutes or so). Once the metadata records are created, the data are moved into the regular Measurement table and marked provisional.

This approach will create a proliferation of metadata records. On the positive side, it requires minimal changes to the existing database schema, and most of the existing logic (like deleting existing metadata records when replacing telemetered data) will continue to work. Rough math: with 20 sites averaging 6 sensors each and hourly telemetry, that's 2,880 new metadata records per day, or 86,400 per 30-day month.
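The rough math above can be checked with a quick back-of-the-envelope calculation (the site and sensor counts are the estimates from the issue, not measured values):

```python
# Estimate metadata record growth under the
# "one metadata record per telemetered measurement" approach.
sites = 20             # monitoring sites (estimate)
sensors_per_site = 6   # average sensors per site (estimate)
readings_per_day = 24  # hourly telemetry

records_per_day = sites * sensors_per_site * readings_per_day
records_per_month = records_per_day * 30  # 30-day month

print(records_per_day)    # 2880
print(records_per_month)  # 86400
```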

Steps

  1. Create a new TelemetryMeasurement table. Columns are date/time, UTCOffset, UTC date/time, value, samplepoint id, parameter id, method id, unit id, depth, comment.
  2. Add a new TelemetryFeed field in the Metadata table (boolean, default value NO).
  3. Create a stored procedure that creates a new Metadata record for every unique combination of samplepoint/parameter/method/unit in TelemetryMeasurement, copies the data into the Measurement table with the correct metadataid and the Provisional field set to "yes", and then empties the TelemetryMeasurement table.
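The stored procedure in step 3 could look roughly like the following, sketched in Python against an in-memory SQLite database rather than SQL Server. Table and column names here follow the issue text but are assumptions, not the actual Sedna schema:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Simplified stand-in schema (assumed names, trimmed columns).
cur.executescript("""
CREATE TABLE TelemetryMeasurement (
    CollectedDateTime TEXT, UTCOffset INTEGER, Value REAL,
    SamplePointID INTEGER, ParameterID INTEGER, MethodID INTEGER, UnitID INTEGER
);
CREATE TABLE Metadata (
    MetadataID INTEGER PRIMARY KEY,
    SamplePointID INTEGER, ParameterID INTEGER, MethodID INTEGER, UnitID INTEGER,
    TelemetryFeed INTEGER DEFAULT 0
);
CREATE TABLE Measurement (
    MetadataID INTEGER, CollectedDateTime TEXT, Value REAL, Provisional INTEGER
);
""")

cur.executemany(
    "INSERT INTO TelemetryMeasurement VALUES (?,?,?,?,?,?,?)",
    [("2024-01-01 00:00", -8, 1.5, 1, 10, 100, 5),
     ("2024-01-01 01:00", -8, 1.6, 1, 10, 100, 5)])

# Step 1: one new Metadata record per distinct key combination,
# flagged as a telemetry feed.
cur.execute("""
INSERT INTO Metadata (SamplePointID, ParameterID, MethodID, UnitID, TelemetryFeed)
SELECT DISTINCT SamplePointID, ParameterID, MethodID, UnitID, 1
FROM TelemetryMeasurement
""")

# Step 2: move the data into Measurement, marked provisional.
cur.execute("""
INSERT INTO Measurement (MetadataID, CollectedDateTime, Value, Provisional)
SELECT m.MetadataID, t.CollectedDateTime, t.Value, 1
FROM TelemetryMeasurement t
JOIN Metadata m
  ON  m.SamplePointID = t.SamplePointID
  AND m.ParameterID   = t.ParameterID
  AND m.MethodID      = t.MethodID
  AND m.UnitID        = t.UnitID
  AND m.TelemetryFeed = 1
""")

# Step 3: empty the staging table.
cur.execute("DELETE FROM TelemetryMeasurement")
con.commit()
```

In the real implementation this would be one T-SQL stored procedure run by the repeating job, with the three statements wrapped in a transaction so a failure partway through doesn't drop staged data.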
ethankale commented 3 years ago

Alternative approach - use an upsert to create the metadata record. That would dramatically reduce the number of metadata records and wouldn't require any input from the user. It would slightly complicate the stored procedure: in addition to the upsert logic itself, a final pass would be needed to update the begin & end dates.
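A minimal sketch of the upsert idea, again using SQLite's `INSERT ... ON CONFLICT` as a stand-in for the equivalent SQL Server logic (in T-SQL this would likely be a `MERGE` or an `UPDATE`/`INSERT` pair; column names are assumptions). The date-range update is folded into the conflict clause here rather than done as a separate final pass:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# A unique constraint on the key combination is what makes the upsert work.
cur.executescript("""
CREATE TABLE Metadata (
    MetadataID INTEGER PRIMARY KEY,
    SamplePointID INTEGER, ParameterID INTEGER, MethodID INTEGER, UnitID INTEGER,
    BeginDateTime TEXT, EndDateTime TEXT,
    UNIQUE (SamplePointID, ParameterID, MethodID, UnitID)
);
""")

def upsert_metadata(samplepoint, parameter, method, unit, when):
    # Reuse the existing metadata record for this key if one exists,
    # widening its date range; otherwise create a new record.
    cur.execute("""
        INSERT INTO Metadata (SamplePointID, ParameterID, MethodID, UnitID,
                              BeginDateTime, EndDateTime)
        VALUES (?,?,?,?,?,?)
        ON CONFLICT (SamplePointID, ParameterID, MethodID, UnitID)
        DO UPDATE SET
            BeginDateTime = MIN(BeginDateTime, excluded.BeginDateTime),
            EndDateTime   = MAX(EndDateTime, excluded.EndDateTime)
    """, (samplepoint, parameter, method, unit, when, when))

# Two hourly readings for the same key collapse into one metadata record.
upsert_metadata(1, 10, 100, 5, "2024-01-01 00:00")
upsert_metadata(1, 10, 100, 5, "2024-01-01 01:00")
cur.execute("SELECT COUNT(*), MIN(BeginDateTime), MAX(EndDateTime) FROM Metadata")
print(cur.fetchone())  # one record spanning both timestamps
```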

One possible issue with this approach - skipped data. Say there's a streak of telemetry data, then a set of normally uploaded data, then more telemetry data. If the user uploads new data that should overwrite only the most recent telemetry data, the current logic would erase the older telemetry data as well, because deletion matches on the metadata record rather than on the underlying data.

Should we add support for the data frequency? That could resolve the above issue by creating a new metadata record whenever there is no measurement one data step previous to the one being inserted. The SQL would be somewhat complicated, but it should be possible.
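The gap check could be sketched roughly like this, assuming the data frequency is known (here hardcoded as hourly; in practice it would presumably be stored on the metadata record). Names and the timestamp format are illustrative:

```python
import sqlite3
from datetime import datetime, timedelta

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE Measurement (MetadataID INTEGER, CollectedDateTime TEXT)")
cur.executemany("INSERT INTO Measurement VALUES (?,?)",
                [(1, "2024-01-01 00:00"), (1, "2024-01-01 01:00")])

FREQ = timedelta(hours=1)   # assumed data frequency for this metadata record
FMT = "%Y-%m-%d %H:%M"

def needs_new_metadata(metadata_id, when):
    # If no measurement exists exactly one data step before the incoming
    # point, the point starts a new streak and should get its own
    # metadata record rather than extending the existing one.
    prev = (datetime.strptime(when, FMT) - FREQ).strftime(FMT)
    cur.execute("SELECT 1 FROM Measurement WHERE MetadataID = ? AND CollectedDateTime = ?",
                (metadata_id, prev))
    return cur.fetchone() is None

print(needs_new_metadata(1, "2024-01-01 02:00"))  # False: continues the streak
print(needs_new_metadata(1, "2024-01-01 05:00"))  # True: there is a gap before this point
```

Done set-based in SQL rather than row-at-a-time, this is the "complicated SQL" the comment anticipates: a self-join (or `LAG` window function) on the measurement series, bucketed by frequency.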