Closed agstephens closed 3 years ago
NOTE: you can find multiple records in the station config file for a single primary_id and record_number:
(Pdb) sc[(sc.primary_id == primary_id) & (sc.record_number == int(record_number))]
primary_id primary_id_scheme record_number ... height_of_station_above_sea_level_accuracy sea_level_datum source_id
10 AFM00040948 13 1 ... NaN NaN 166
123644 AFM00040948 13 1 ... NaN NaN 245
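A minimal sketch of how such duplicates could be listed across the whole table, assuming the station config is loaded into a pandas DataFrame called `sc`, as in the pdb session above. The toy table here only mirrors the two rows shown; the third row is invented for contrast.

```python
import pandas as pd

# Toy stand-in for the station config table. The first two rows mirror the
# pdb output above; the third row (and its source_id) is made up.
sc = pd.DataFrame(
    {
        "primary_id": ["AFM00040948", "AFM00040948", "AFI0000OAHR"],
        "record_number": [1, 1, 6],
        "source_id": [166, 245, 999],
    }
)

# keep=False marks every member of a duplicated (primary_id, record_number) group.
dupes = sc[sc.duplicated(subset=["primary_id", "record_number"], keep=False)]

# Count distinct source_ids per key; more than one means the lookup is ambiguous.
ambiguous = dupes.groupby(["primary_id", "record_number"])["source_id"].nunique()
print(ambiguous[ambiguous > 1])
```

Any key listed by the final line cannot be resolved to a single source_id from this table alone.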
The correct approach is explained in detail in: https://github.com/glamod/glamod-ingest/issues/21
It uses the temporary station config files, based on frequency:
source_id = get_source_id_from_temporary_station_configs(primary_id, record_number, frequency)
Where are the temporary station configs?
We have a couple of small CSV files that contain station configuration settings where duplicate primary ids existed:
daily_monthly_station_config_file_30_07_20.psv.csv
sub_daily_station_config_file_30_07_20.psv.csv
These separate configuration tables are ONLY FOR USE when processing for this fix. They will not be loaded into the CDM.
They have been produced to overcome the issue we are having with duplicate primary_id fields across timescales. The observing_frequency field defines the timescales in the station_configuration table so it's not an issue to use the full station_configuration table already on the GWS for this release, we just need the separated tables to resolve the current data policy issues.
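The frequency-based lookup could look roughly like the sketch below. This is not the code from the commits referenced in this issue. The frequency labels (`daily`, `monthly`, `sub_daily`), the pipe delimiter, the helper names `load_config` and `lookup_source_id`, and the assignment of source_id values to timescales are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical mapping from frequency label to the temporary config file covering it.
TEMP_CONFIGS = {
    "daily": "daily_monthly_station_config_file_30_07_20.psv.csv",
    "monthly": "daily_monthly_station_config_file_30_07_20.psv.csv",
    "sub_daily": "sub_daily_station_config_file_30_07_20.psv.csv",
}


def load_config(handle, delimiter="|"):
    """Read a station config table into a list of dicts (pipe-delimited is assumed)."""
    return list(csv.DictReader(handle, delimiter=delimiter))


def lookup_source_id(rows, primary_id, record_number):
    """Return the single source_id for a station record, or raise if it is not unique."""
    matches = {
        row["source_id"]
        for row in rows
        if row["primary_id"] == primary_id
        and int(row["record_number"]) == int(record_number)
    }
    if len(matches) != 1:
        raise LookupError(
            f"expected exactly one source_id for {primary_id}-{record_number}, "
            f"found {sorted(matches)}"
        )
    return matches.pop()


# In-memory stand-ins for the two files; which source_id belongs to which
# timescale is invented here purely for illustration.
daily_monthly = load_config(
    io.StringIO("primary_id|record_number|source_id\nAFM00040948|1|166\n")
)
sub_daily = load_config(
    io.StringIO("primary_id|record_number|source_id\nAFM00040948|1|245\n")
)
tables = {
    TEMP_CONFIGS["daily"]: daily_monthly,
    TEMP_CONFIGS["sub_daily"]: sub_daily,
}

frequency = "sub_daily"
print(lookup_source_id(tables[TEMP_CONFIGS[frequency]], "AFM00040948", 1))
```

Because each temporary file covers only one group of timescales, the same primary_id and record_number should match at most one source_id per file, and anything else is raised as an error rather than guessed.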
Implemented in:
Commit: d4fb18b3720e69337b55e5e8d8bf86cd3ac9ca4f
Added extra rule to only add source_id if not already present - making this future-proof.
Fixed in: a17a0caf3283836a88a9d579707c2d56c070d414
Done.
Here is the plan:
1. From cdmlite, get: observation_id (e.g.: AFI0000OAHR-6-1973-01-01-00:00-85-12), which has the form <primary_id>-<record_number>-...
2. Look up in station_configuration: primary_id and record_number
3. From the matching record in station_configuration, get the source_id
4. Write the source_id into cdmlite records.
5. If there is no source_id then FAIL
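The plan above can be sketched as follows. The function names, the dict-based lookup table, and the source_id value 999 are hypothetical. The sketch also assumes that primary_id never contains a hyphen, so that the first two hyphen-separated fields of observation_id are primary_id and record_number.

```python
def split_observation_id(observation_id):
    """Split '<primary_id>-<record_number>-...' into (primary_id, record_number)."""
    primary_id, record_number, _rest = observation_id.split("-", 2)
    return primary_id, int(record_number)


def resolve_source_id(observation_id, station_configuration):
    """Look up source_id for an observation; raise if none is found."""
    key = split_observation_id(observation_id)
    source_id = station_configuration.get(key)
    if source_id is None:
        # The FAIL step of the plan: never write a record without a source_id.
        raise LookupError(f"no source_id for {observation_id}")
    return source_id


# Hypothetical lookup table keyed by (primary_id, record_number); 999 is made up.
station_configuration = {("AFI0000OAHR", 6): 999}

record = {"observation_id": "AFI0000OAHR-6-1973-01-01-00:00-85-12"}
record["source_id"] = resolve_source_id(record["observation_id"], station_configuration)
print(record["source_id"])
```

Raising instead of returning a default keeps the failure visible, which matches the final step of the plan.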