Write L2 to file and leave all additional variable derivation for the L2toL3 step

BaptisteVandecrux commented 3 weeks ago

The idea of this new version is that:

1) L2 data files are written into level_2/raw and level_2/tx folders by get_l2 (just as was done for the L3 data previously). One consequence is that the low-latency level 2 tx data can be posted to THREDDS quickly for showcase and fieldwork, and processed into BUFR files.

2) L2 tx and raw files are merged using join_l2 (just as was done for the L3 data previously). Resampling to hourly, daily and monthly values is done here, but could be left for a later stage.

3) get_l3 is now a script that loads the merged L2 file and runs process.L2toL3.toL3. This will allow more variables to be derived in toL3 and historical data to be appended once the L3 data is processed.
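As a rough illustration of step 2, the sketch below joins a raw and a tx record and resamples the result to hourly means. It uses pandas rather than the xarray datasets pypromice works with, and the function name `join_levels`, the sample values, and the choice to prefer raw values over tx values are all assumptions made for the example, not the behaviour of join_l2.

```python
import pandas as pd

# Hypothetical 10-minute logger data (raw) with one missing value
raw = pd.DataFrame(
    {"t_u": [-5.0, -5.2, None, -5.6]},
    index=pd.to_datetime(
        ["2023-06-01 00:00", "2023-06-01 00:10",
         "2023-06-01 00:20", "2023-06-01 00:30"]
    ),
)

# Hypothetical transmitted data (tx) that overlaps and extends the raw record
tx = pd.DataFrame(
    {"t_u": [-5.3, -5.5, -6.0]},
    index=pd.to_datetime(
        ["2023-06-01 00:20", "2023-06-01 00:30", "2023-06-01 01:00"]
    ),
)


def join_levels(raw: pd.DataFrame, tx: pd.DataFrame) -> pd.DataFrame:
    """Keep raw values where present; fall back to tx where raw is missing.

    Assumption: raw takes priority over tx. The thread does not state this.
    """
    return raw.combine_first(tx)


merged = join_levels(raw, tx)

# Resample the merged record to hourly means
hourly = merged.resample("60min").mean()
print(hourly)
```

Leaving the resampling out of the join, as the comment suggests is possible, would simply mean dropping the last two statements and running them at a later stage.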

PennyHow commented 3 weeks ago

Thanks for making a start on this! The basic structuring is there, however...

L2 tx and raw files are merged using join_l2 (just as was done for the L3 data previously). Resampling to hourly, daily and monthly values is done here, but could be left for a later stage.

I had envisioned this differently. We usually merge datasets at the end of pypromice.process.aws.getL1():

https://github.com/GEUS-Glaciology-and-Climate/pypromice/blob/3e4e3177bb7e22cd8b1534b3bf3d2d2fe5ca3795/src/pypromice/process/aws.py#L89-L94

I would like this to be moved to the end of toL2, with an optional flag to write out the unmerged datasets (so we can see differences in files, e.g. PROMICE v2 data vs. v3 data), as well as the merged dataset (i.e. to post to THREDDS). I also need this merging to happen within the aws object, rather than in a CLI script, as the type of merging needs to be defined from the toml files. As we discussed, there are instances where a combine_first protocol is fine, but there will also be times when a hard merge needs to be adopted, such as during location switches.
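To make the two merge types concrete, here is a minimal sketch in pandas (pypromice itself works on xarray datasets, which offer a `combine_first` method as well). The function `merge_stations`, its `how` and `switch_date` parameters, and the sample values are hypothetical and only illustrate the difference between the two protocols.

```python
import pandas as pd


def merge_stations(old, new, how="combine_first", switch_date=None):
    """Merge records from an old and a new station (hypothetical helper).

    how="combine_first": use new values, filling their gaps from old.
    how="hard": use old strictly before switch_date and new from then on.
    """
    if how == "combine_first":
        return new.combine_first(old)
    if how == "hard":
        if switch_date is None:
            raise ValueError("a hard merge needs a switch_date")
        switch = pd.Timestamp(switch_date)
        parts = [old.loc[old.index < switch], new.loc[new.index >= switch]]
        return pd.concat(parts).sort_index()
    raise ValueError(f"unknown merge type: {how!r}")


# Hypothetical daily pressure records with two overlapping days
old = pd.DataFrame(
    {"p_u": [900.0, 901.0, 902.0, 903.0]},
    index=pd.date_range("2021-01-01", periods=4, freq="D"),
)
new = pd.DataFrame(
    {"p_u": [None, 913.0, 914.0]},
    index=pd.date_range("2021-01-03", periods=3, freq="D"),
)

soft = merge_stations(old, new, how="combine_first")
hard = merge_stations(old, new, how="hard", switch_date="2021-01-03")

# The gap in the new record on 2021-01-03 is filled from the old station
# in the combine_first merge, and stays missing in the hard merge.
print(soft.join(hard, lsuffix="_combine_first", rsuffix="_hard"))
```

In this sketch the merge type is passed as an argument; in the design described above it would instead be read from the station's toml file.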

Can you let me build upon what you have done and then open a PR into your branch @BaptisteVandecrux?

BaptisteVandecrux commented 3 weeks ago

Thanks Penny! No worries about pushing to this branch!

I think we need to keep redundant stations separate until level 2, because the filtering of data happens in L1toL2, so we don't know which variables we are missing until that point. It is true that, as a start, we can have a sharp switch from a v2 station to a v3 station. But we should leave room for future improvement where, for example, non-height-dependent variables (such as pressure and thermistor strings) from the old station are used to gap-fill missing values in the new station.

I had envisioned this differently. We usually merge datasets at the end of toL1:

This is the merging of several aws-l0/raw files (or aws-l0/tx files) into a single aws-l1/raw dataset (or aws-l1/tx dataset). join_l2 was meant to join the aws-l2/raw dataset with the aws-l2/tx dataset, just like join_l3 used to do.

A possibility you could consider is to have site-specific config files, where you describe the switch dates and the type of gap-filling.
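One way such a site-specific config could look is sketched below. The TOML layout, the key names (`switch_date`, `gap_fill_variables`), the site name, and the function `merge_site` are all hypothetical; they only illustrate a sharp switch combined with gap-filling of selected variables from the old station. Air temperature is treated here as height-dependent purely for the sake of the example.

```python
import pandas as pd

try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # backport for older Python versions

# Hypothetical site-level configuration
SITE_CONFIG = """
[site]
name = "SITE_X"
switch_date = "2021-01-03"
gap_fill_variables = ["p_u"]
"""

cfg = tomllib.loads(SITE_CONFIG)["site"]


def merge_site(old: pd.DataFrame, new: pd.DataFrame, cfg: dict) -> pd.DataFrame:
    """Sharp switch at cfg['switch_date'], then gap-fill selected variables."""
    switch = pd.Timestamp(cfg["switch_date"])
    merged = pd.concat(
        [old.loc[old.index < switch], new.loc[new.index >= switch]]
    ).sort_index()
    # Only the listed variables are filled from the old station
    for var in cfg["gap_fill_variables"]:
        merged[var] = merged[var].fillna(old[var])
    return merged


# Hypothetical daily records: the new station has a gap on its first day
old = pd.DataFrame(
    {"p_u": [900.0, 901.0, 902.0, 903.0], "t_u": [-10.0, -11.0, -12.0, -13.0]},
    index=pd.date_range("2021-01-01", periods=4, freq="D"),
)
new = pd.DataFrame(
    {"p_u": [None, 913.0, 914.0], "t_u": [None, -20.0, -21.0]},
    index=pd.date_range("2021-01-03", periods=3, freq="D"),
)

merged = merge_site(old, new, cfg)
print(merged)
```

With this configuration the pressure gap on the switch date is filled from the old station, while the temperature gap is left missing because `t_u` is not listed.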

Anyway, as long as I can continue to work on L2toL3 (for the operations on the dataset itself) and get_l3 (for the operations that require external files, like merging with historical data), then I'm fine with it!

PennyHow commented 3 weeks ago

I think we need to keep redundant stations separate until level 2, because the filtering of data happens in L1toL2, so we don't know which variables we are missing until that point. It is true that, as a start, we can have a sharp switch from a v2 station to a v3 station. But we should leave room for future improvement where, for example, non-height-dependent variables (such as pressure and thermistor strings) from the old station are used to gap-fill missing values in the new station.

Exactly 👍🏼

I can work alongside you, no problem. I have time today and the rest of this week.

PennyHow commented 3 weeks ago

Sorry, this was me. I have reverted the merged PR on the main branch. Please feel free to re-open this PR.

GEUS-Glaciology-and-Climate / pypromice

Write L2 to file and leave all additional variable derivation for the L2toL3 step #251