glamod / glamod-ingest

Database preparation and ingestion for GLAMOD
BSD 2-Clause "Simplified" License

Update restructure land script to include source_id #20

Closed agstephens closed 3 years ago

agstephens commented 4 years ago

Here is the plan:

agstephens commented 3 years ago

NOTE: you can find multiple records in the station config file, for a single primary_id and record_number:

(Pdb) sc[(sc.primary_id == primary_id) & (sc.record_number == int(record_number))]
         primary_id  primary_id_scheme  record_number    ...    height_of_station_above_sea_level_accuracy  sea_level_datum source_id
10      AFM00040948                 13              1    ...                                           NaN              NaN       166
123644  AFM00040948                 13              1    ...                                           NaN              NaN       245
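
A minimal sketch of how such ambiguous keys could be listed across the whole station config, rather than inspected one at a time in the debugger. It uses plain dictionaries instead of the DataFrame shown above; `find_ambiguous_keys` is a hypothetical helper, and the third row is invented purely for contrast (only the two `source_id` values for AFM00040948 come from the output above).

```python
from collections import defaultdict

# Illustrative rows. The first two mirror the pdb output above;
# the third is a made-up station with a single, unambiguous source_id.
rows = [
    {"primary_id": "AFM00040948", "record_number": 1, "source_id": 166},
    {"primary_id": "AFM00040948", "record_number": 1, "source_id": 245},
    {"primary_id": "HYPOTHETICAL01", "record_number": 1, "source_id": 300},
]


def find_ambiguous_keys(rows):
    """Map (primary_id, record_number) to its source_ids when there is more than one."""
    seen = defaultdict(set)
    for row in rows:
        key = (row["primary_id"], int(row["record_number"]))
        seen[key].add(row["source_id"])
    # Keep only keys that resolve to several distinct source_id values.
    return {key: sorted(ids) for key, ids in seen.items() if len(ids) > 1}


ambiguous = find_ambiguous_keys(rows)
```

Any key returned here cannot be resolved from `primary_id` and `record_number` alone, which is why an extra discriminator is needed.
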

agstephens commented 3 years ago

The correct approach is explained in detail in: https://github.com/glamod/glamod-ingest/issues/21

It uses the temporary station config files, based on frequency:

source_id = get_source_id_from_temporary_station_configs(primary_id, record_number, frequency)
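
The lookup could be sketched roughly as follows. Only the function name and its three arguments come from this thread; the CSV column names, the frequency labels, the in-memory CSV text, and all station and source values are assumptions made for illustration, and the real implementation is in the commit referenced further down.

```python
import csv
import io

# Hypothetical stand-ins for the per-frequency temporary station config CSVs.
# Column names, frequency labels and every value here are assumptions.
TEMP_CONFIG_CSVS = {
    "daily": "primary_id,record_number,source_id\nSTATION_A,1,101\n",
    "monthly": "primary_id,record_number,source_id\nSTATION_A,1,202\n",
}


def load_temp_configs(csv_texts):
    """Build {frequency: {(primary_id, record_number): source_id}}."""
    lookup = {}
    for frequency, text in csv_texts.items():
        table = {}
        for row in csv.DictReader(io.StringIO(text)):
            key = (row["primary_id"], int(row["record_number"]))
            if key in table:
                # Within one frequency the key is expected to be unique.
                raise ValueError(f"duplicate {key} in {frequency} config")
            table[key] = row["source_id"]
        lookup[frequency] = table
    return lookup


_TEMP_CONFIGS = load_temp_configs(TEMP_CONFIG_CSVS)


def get_source_id_from_temporary_station_configs(primary_id, record_number, frequency):
    """Look up source_id in the temporary config for a single frequency."""
    key = (primary_id, int(record_number))
    try:
        return _TEMP_CONFIGS[frequency][key]
    except KeyError:
        raise KeyError(
            f"no source_id for {primary_id!r}, record {record_number}, {frequency!r}"
        ) from None
```

Because each frequency has its own table, the same `primary_id` and `record_number` can map to different `source_id` values without ambiguity.
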

Where are the temporary station configs?

We have a couple of small CSV files that contain station configuration settings where duplicate primary ids existed:

These separate configuration tables are ONLY FOR USE when processing for this fix. They will not be loaded into the CDM.

They have been produced to overcome the issue we are having with duplicate primary_id fields across timescales. The observing_frequency field defines the timescale in the station_configuration table, so it is not a problem to use the full station_configuration table already on the GWS for this release. We only need the separated tables to resolve the current data policy issues.

agstephens commented 3 years ago

Implemented in:

Commit: d4fb18b3720e69337b55e5e8d8bf86cd3ac9ca4f

agstephens commented 3 years ago

Added an extra rule to add source_id only if it is not already present, which makes this future-proof.

Fixed in: a17a0caf3283836a88a9d579707c2d56c070d414
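
The rule can be illustrated with a small sketch. It assumes each record is a dictionary keyed by column name and treats a missing, `None`, or empty `source_id` as "not present"; `add_source_id_if_missing` is a hypothetical name and the actual change is in the commit above.

```python
def add_source_id_if_missing(record, source_id):
    """Return a copy of record, setting source_id only when it has no value yet."""
    updated = dict(record)  # leave the caller's record untouched
    if updated.get("source_id") in (None, ""):
        updated["source_id"] = source_id
    return updated


# A record without source_id gets the looked-up value.
filled = add_source_id_if_missing({"primary_id": "STATION_A"}, "101")

# A record that already carries a source_id is left as it is.
kept = add_source_id_if_missing({"primary_id": "STATION_A", "source_id": "999"}, "101")
```

With this guard, input files that already include source_id pass through the script unchanged.
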

agstephens commented 3 years ago

Done.