GEUS-Glaciology-and-Climate / pypromice

Process AWS data from L0 (raw logger) through Lx (end user)
https://pypromice.readthedocs.io
GNU General Public License v2.0

Only processing the latest transmission and appending to a static file? #98

Open BaptisteVandecrux opened 1 year ago

BaptisteVandecrux commented 1 year ago

Originally posted by @patrickjwright in https://github.com/GEUS-Glaciology-and-Climate/pypromice/issues/92#issuecomment-1398231020

Another general thought. I really think the future of the processing will be to process only newly transmitted observations on each run during operational processing (plus whatever recent period is needed for smoothing functions, perhaps just a few hours or a few days). This should significantly reduce the processing time and the required compute resources.

Then, if the processing code is changed significantly (or new content added to flag files for historical data!), we could manually run the processing with an optional arg to reprocess the entire historical dataset for every station.
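A minimal sketch of this idea, for illustration only and not pypromice code. The function name `select_records`, the `lookback` window, and the `reprocess_all` switch are all hypothetical; the 6-hour default is an arbitrary placeholder for "whatever the smoothing functions need".

```python
from datetime import datetime, timedelta


def select_records(records, last_processed, lookback=timedelta(hours=6),
                   reprocess_all=False):
    """Return the records to process on this run.

    records: list of (timestamp, value) tuples.
    last_processed: timestamp of the newest record handled by the
        previous run, or None if nothing has been processed yet.
    lookback: extra history to include so smoothing has context.
    reprocess_all: when True, return the whole historical record.
    """
    if reprocess_all or last_processed is None:
        return list(records)
    cutoff = last_processed - lookback
    return [r for r in records if r[0] >= cutoff]


# 48 hourly observations; the previous run ended at hour 40.
start = datetime(2023, 1, 1)
records = [(start + timedelta(hours=h), float(h)) for h in range(48)]

recent = select_records(records, last_processed=start + timedelta(hours=40))
everything = select_records(records, last_processed=start + timedelta(hours=40),
                            reprocess_all=True)
```

Here `recent` holds only the observations from hour 34 onwards (the new ones plus the lookback window), while `everything` is the full record, as a manual full reprocessing run would request.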

Just something to keep in mind. Perhaps the flag file functions shouldn't be hard-wired to run with every processing run? Or, if we are only processing a few hours of recent data, then it will be OK since the specified flag periods can't be found and we will just move on?
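To illustrate the second possibility, here is a hedged sketch (not the pypromice flag implementation) in which a flag whose period does not overlap the data simply matches nothing. The `apply_flags` function, the `(start, end, variable)` flag layout, and the variable name `t_u` are assumptions made for the example.

```python
from datetime import datetime


def apply_flags(records, flags):
    """Set flagged values to None and count how many were flagged.

    records: list of dicts with a "time" key plus one key per variable.
    flags: list of (start, end, variable) tuples, a hypothetical
        stand-in for the rows of a flag file.
    """
    cleaned = [dict(r) for r in records]  # do not modify the input
    n_flagged = 0
    for start, end, variable in flags:
        for row in cleaned:
            in_period = start <= row["time"] <= end
            if in_period and row.get(variable) is not None:
                row[variable] = None
                n_flagged += 1
    return cleaned, n_flagged


# Three hours of recent data.
recent = [{"time": datetime(2023, 1, 20, h), "t_u": -10.0 - h} for h in range(3)]

# A flag on historical data finds no matching rows and changes nothing.
historical_flag = [(datetime(2021, 6, 1), datetime(2021, 6, 30), "t_u")]
unchanged, n_hist = apply_flags(recent, historical_flag)

# A flag that overlaps the recent window is applied as usual.
overlapping_flag = [(datetime(2023, 1, 20, 1), datetime(2023, 1, 20, 2), "t_u")]
flagged, n_recent = apply_flags(recent, overlapping_flag)
```

With this kind of period matching, running the flag step on every processing run costs little when only recent data is loaded, because out-of-range flags are no-ops.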

I just wanted to mention this, to make these future potential changes easier to implement....

PennyHow commented 1 year ago

We already only re-process the L0 raw data if a change is detected in the raw/config toml file. It could be that we have a static L3 tx dataset that new L3 tx data lines are appended to.
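One common way to detect such a change is to compare content hashes between runs. The sketch below shows that general technique only; it is not necessarily how pypromice detects changes, and `changed_inputs`, the JSON state file, and the example file names are hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's content."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def changed_inputs(paths, state_file):
    """Return the paths whose content differs from the previous run.

    Digests from the previous run are read from state_file (JSON),
    then state_file is overwritten with the current digests.
    """
    state_path = Path(state_file)
    previous = json.loads(state_path.read_text()) if state_path.exists() else {}
    current = {str(p): file_digest(p) for p in paths}
    changed = [p for p in current if previous.get(p) != current[p]]
    state_path.write_text(json.dumps(current, indent=2))
    return changed


with tempfile.TemporaryDirectory() as tmp:
    config = Path(tmp) / "station.toml"
    state = Path(tmp) / "digests.json"

    config.write_text('station_id = "XXX"\n')
    first_run = changed_inputs([config], state)   # never seen before
    second_run = changed_inputs([config], state)  # nothing changed

    config.write_text('station_id = "XXX"\nnew_key = 1\n')
    third_run = changed_inputs([config], state)   # config was edited
```

Only the first and third runs report the config file as changed, so only those runs would need to trigger a re-processing of the raw data.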

Another option is to scrap the separate processing of tx and raw data, and merge them under one config toml file - then append new transmissions and only re-process the entire dataset when a flag file or the config toml file changes (as stated previously). We would just need to make sure that instantaneous measurements are retained in the final L3 dataset, but perhaps not distributed with the publicly available dataset.
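The append step could look roughly like the sketch below, which assumes a CSV file with a `time` column of ISO 8601 strings in one consistent format (so that string comparison orders them chronologically). The function `append_new_rows`, the file name, and the `t_u` column are hypothetical and not part of pypromice.

```python
import csv
import tempfile
from pathlib import Path


def append_new_rows(static_file, rows, fieldnames):
    """Append rows newer than the last timestamp already in static_file.

    Returns the number of rows appended. Rows at or before the newest
    stored timestamp are skipped, so re-sent transmissions are ignored.
    """
    path = Path(static_file)
    last_time = ""
    if path.exists():
        with path.open(newline="") as f:
            for row in csv.DictReader(f):
                last_time = max(last_time, row["time"])

    new_rows = [r for r in rows if r["time"] > last_time]
    write_header = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if write_header:
            writer.writeheader()
        writer.writerows(new_rows)
    return len(new_rows)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "station_l3_tx.csv"
    batch_1 = [{"time": "2023-01-20T00:00:00", "t_u": "-10.0"},
               {"time": "2023-01-20T01:00:00", "t_u": "-11.0"}]
    # The second batch repeats one old row and adds one new row.
    batch_2 = batch_1[1:] + [{"time": "2023-01-20T02:00:00", "t_u": "-12.0"}]

    first = append_new_rows(target, batch_1, ["time", "t_u"])
    second = append_new_rows(target, batch_2, ["time", "t_u"])
    with target.open(newline="") as f:
        total = len(list(csv.DictReader(f)))
```

A full re-processing would then replace the static file rather than append to it, and would only be needed when a flag file or the config toml file changes.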

BaptisteVandecrux commented 1 year ago

I think it is a long-term goal but not a priority at all. I just did not want Patrick's comment in the PR to get lost after the merge.

> We already only re-process the L0 raw data if a change is detected in the raw/config toml file. It could be that we have a static L3 tx dataset that new L3 tx data lines are appended to.

Since we are about to update a lot of flags and adjustments, and since we keep finding L0 files that were not being processed, I'd suggest that the entire processing chain be run each time, at least for now. That way we are sure that the latest raw files and flag/adjustment files are used on every run.