GEUS-Glaciology-and-Climate / pypromice

Process AWS data from L0 (raw logger) through Lx (end user)
https://pypromice.readthedocs.io
GNU General Public License v2.0

Merging of the station records at each site including historical stations #246

Open BaptisteVandecrux opened 1 month ago

BaptisteVandecrux commented 1 month ago

The goal is a level_4 folder holding one merged record for each site, combining historical, v2 and v3 stations as well as moved stations (e.g. THU_U replaced by THU_U2). Implementation is ongoing in https://github.com/GEUS-Glaciology-and-Climate/pypromice/blob/join_l4/src/pypromice/process/join_l4.py with some updates in other files (https://github.com/GEUS-Glaciology-and-Climate/pypromice/compare/main...join_l4).

It uses a dictionary, old_name, with the latest stations as keys and, for each, the old stations in reverse chronological order as values: https://github.com/GEUS-Glaciology-and-Climate/pypromice/blob/97eaedb6a1d89f6ab62ce20a30287c4ae7eb1393/src/pypromice/process/join_l4.py#L12-L35 At the moment, join_l4 is called on the same list of stations as join_l3, i.e. the sites for which new transmissions, new raw files or new flags have recently been added: https://github.com/GEUS-Glaciology-and-Climate/aws-operational-processing/blob/b0d52ecf9427b204460f21f110ef0e049d0c49c4/l3_processor.sh#L173-L185

If a station is listed in old_name.values() (the names in brackets in old_name), it is not processed by join_l4, because its data are appended to another AWS record. If a station is not in old_name.keys(), there is no historical data to append and its file is copied as-is to the level_4 folder.
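The rule above can be sketched as a small lookup. This is only an illustration: the function name l4_action and the returned labels are hypothetical, and the two entries in old_name are taken from the examples in this issue rather than from the full dictionary in join_l4.py.

```python
# Illustrative subset only; the real dictionary lives in join_l4.py.
old_name = {
    "THU_U2": ["THU_U"],
    "CEN2": ["CEN1"],
}


def l4_action(stid, old_name):
    """Return what join_l4 should do with a station (hypothetical helper)."""
    # Every station that appears in a value list has been superseded.
    superseded = {old for olds in old_name.values() for old in olds}
    if stid in superseded:
        return "skip"   # its data is appended to a newer station's record
    if stid in old_name:
        return "merge"  # latest station with historical data to append
    return "copy"       # no history: level_3 file copied as-is to level_4
```

For example, l4_action("THU_U", old_name) gives "skip", while a station absent from the dictionary, such as a hypothetical "KAN_M", gives "copy".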

For the historical GC-Net stations, the variable aliases are defined in an external file, src/pypromice/process/variable_aliases_GC-Net.csv, which is also declared as package data.

The merging is done by time slices: https://github.com/GEUS-Glaciology-and-Climate/pypromice/blob/97eaedb6a1d89f6ab62ce20a30287c4ae7eb1393/src/pypromice/process/join_l4.py#L229-L232 where ds1 is the current AWS data and ds2 is the historical AWS data being appended before the start of ds1. Gap-filling during the overlapping period is currently not implemented.
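A minimal sketch of the time-slice idea, using pandas DataFrames as a stand-in for the datasets handled in join_l4.py. The function name append_history, the column t_u and the sample values are assumptions made for illustration; the linked lines may slice differently.

```python
import pandas as pd


def append_history(ds1, ds2):
    """Prepend the part of ds2 that predates the first timestamp of ds1."""
    start = ds1.index.min()
    # Keep only historical rows strictly before the current record starts,
    # so the overlapping period comes from ds1 alone (no gap-filling).
    older = ds2.loc[ds2.index < start]
    return pd.concat([older, ds1]).sort_index()


# Historical record: 1 to 3 January; current record: 3 to 4 January.
ds2 = pd.DataFrame(
    {"t_u": [-10.0, -11.0, -12.0]},
    index=pd.date_range("2020-01-01", periods=3, freq="D"),
)
ds1 = pd.DataFrame(
    {"t_u": [-5.0, -6.0]},
    index=pd.date_range("2020-01-03", periods=2, freq="D"),
)
merged = append_history(ds1, ds2)
```

In this sketch the overlapping day, 3 January, is taken from ds1, which matches the statement that no gap-filling is done during the overlap.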

The results are files in the same format and with the same variables as the level_3 files.

Instead of stid, there are now site_id and list_station_id attributes, defined as: https://github.com/GEUS-Glaciology-and-Climate/pypromice/blob/97eaedb6a1d89f6ab62ce20a30287c4ae7eb1393/src/pypromice/process/join_l4.py#L271-L278 meaning that we drop the v3 and the 2 in CEN2 (and potentially in other stations).
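As a rough illustration of that naming rule (not the code at the linked lines), a site name could be derived by stripping a trailing v3 or trailing digits. The helper name site_id_from_station and the regular expression are assumptions, and the rule would need checking against station names that legitimately end in a digit.

```python
import re


def site_id_from_station(stid):
    """Drop a trailing 'v3' or trailing digits (hypothetical rule)."""
    return re.sub(r"(v3|\d+)$", "", stid)


# Station ids here are examples only; list_station_id would hold all
# stations merged into the site, latest first.
list_station_id = ["CEN2", "CEN1"]
site_id = site_id_from_station(list_station_id[0])
```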

Right now, because of the parallel call to join_l4, join_l4 cannot know that it needs to re-append a given site (e.g. CEN) if the older station data (e.g. CEN1) is updated but not the latest station (e.g. CEN2).
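One possible way to address this, sketched under assumptions: before calling join_l4, map every updated station to the latest station of its site with a reverse lookup on old_name. The function name stations_to_rejoin is hypothetical, and old_name below is again an illustrative subset.

```python
# Illustrative subset only; the real dictionary lives in join_l4.py.
old_name = {
    "THU_U2": ["THU_U"],
    "CEN2": ["CEN1"],
}


def stations_to_rejoin(updated, old_name):
    """Map updated stations to the latest station of their site."""
    # Reverse lookup: old station -> latest station it was merged into.
    latest_of = {
        old: latest for latest, olds in old_name.items() for old in olds
    }
    # Stations that are not superseded map to themselves.
    return sorted({latest_of.get(stid, stid) for stid in updated})


# CEN1 was updated but CEN2 was not: CEN2 is still selected for merging.
to_join = stations_to_rejoin(["CEN1", "KAN_M"], old_name)
```

With this input, to_join contains CEN2 (so the CEN site is re-appended) and the unrelated, hypothetical station KAN_M.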