GEUS-Glaciology-and-Climate / pypromice

Process AWS data from L0 (raw logger) through Lx (end user)
https://pypromice.readthedocs.io
GNU General Public License v2.0

Flagging and adjusting data #92

Closed BaptisteVandecrux closed 1 year ago

BaptisteVandecrux commented 1 year ago

In this PR, I have implemented the flagging and fixing of data based on user-defined CSV files. The CSV files are currently hosted on https://github.com/GEUS-Glaciology-and-Climate/PROMICE-AWS-data-issues and are downloaded locally wherever pypromice is run.

It addresses issues #19 and #18. Fixes #96

Flag files

They are located in PROMICE-AWS-data-issues/flags and are named after the station they should be applied to.

They are comma-separated and have the following format:

| t0 | t1 | variable | flag | comment | URL_graphic |
| --- | --- | --- | --- | --- | --- |
| 2017-05-23T10:00:00+00:00 | 2017-06-10T11:00:00+00:00 | rh_u | NAN | manually flagged by bav | https://github.com/GEUS-Glaciology-and-Climate/PROMICE-AWS-data-issues/issues/24 |
| ... | ... | ... | ... | ... | ... |

with

| field | meaning |
| --- | --- |
| t0 | ISO date of the beginning of the flagged period (if omitted, the first available timestamp is used) |
| t1 | ISO date of the end of the flagged period (if omitted, the last available timestamp is used) |
| variable | name of the variable to be flagged [to do: `*` for all variables] |
| flag | short flag abbreviation: `CHECKME`, `UNKNOWN`, `NAN`, `OOL`, `VISIT` |
| comment | description of the issue |
| URL_graphic | URL to an illustration or GitHub issue thread |
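Applied in practice, a `NAN` flag row amounts to blanking the named variable over `[t0, t1]`. Below is a minimal pandas sketch of that idea, not the actual pypromice implementation; the flag row and hourly data are made up for illustration:

```python
from io import StringIO

import numpy as np
import pandas as pd

# A toy flag file in the format described above (hypothetical content)
flag_csv = StringIO(
    "t0,t1,variable,flag,comment,URL_graphic\n"
    "2017-05-23T10:00:00+00:00,2017-06-10T11:00:00+00:00,rh_u,NAN,manually flagged,\n"
)
flags = pd.read_csv(flag_csv, parse_dates=["t0", "t1"])

# A toy hourly series for the flagged variable
index = pd.date_range("2017-05-01", "2017-07-01", freq="h", tz="UTC")
df = pd.DataFrame({"rh_u": 80.0}, index=index)

# Apply every NAN flag: blank the variable over [t0, t1], falling back to the
# first/last available timestamp when t0/t1 are omitted
for _, row in flags[flags["flag"] == "NAN"].iterrows():
    t0 = row["t0"] if pd.notna(row["t0"]) else df.index[0]
    t1 = row["t1"] if pd.notna(row["t1"]) else df.index[-1]
    df.loc[t0:t1, row["variable"]] = np.nan

print(int(df["rh_u"].isna().sum()))  # number of hourly samples blanked by the flag
```

The `.loc[t0:t1]` slice is inclusive at both ends, so the flagged period covers both boundary timestamps.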

Adjustment files

They are located in PROMICE-AWS-data-issues/adjustments and are named after the station they should be applied to.

They are comma-separated and have the following format:

| t0 | t1 | variable | adjust_function | adjust_value | comment | URL_graphic |
| --- | --- | --- | --- | --- | --- | --- |
| 2017-05-23T10:00:00+00:00 | 2017-06-10T11:00:00+00:00 | * | time_shift | -2 | manually adjusted by bav | https://github.com/GEUS-Glaciology-and-Climate/PROMICE-AWS-data-issues/issues/22 |
| ... | ... | ... | ... | ... | ... | ... |

with

| field | meaning |
| --- | --- |
| t0 | ISO date of the beginning of the adjustment period |
| t1 | ISO date of the end of the adjustment period |
| variable | name of the variable to be adjusted [to do: `*` for all variables] |
| adjust_function | function applied over the given period: `add`, `min_filter`, `max_filter`, `rotate`, `smooth`, `multiply` |
| adjust_value | input value to the adjustment function |
| comment | description of the issue |
| URL_graphic | URL to an illustration or GitHub issue thread |
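For illustration, the adjustment functions could be dispatched roughly as below. This is a sketch only, covering a subset of the listed functions plus the `time_shift` used in the example row; the `time_shift` value is assumed to be in hours, and the real pypromice code may use different names and conventions:

```python
import pandas as pd

def apply_adjustment(series, func, value, t0, t1):
    """Sketch of applying one adjustment row over [t0, t1] (hypothetical helper)."""
    sel = series.loc[t0:t1]
    if func == "add":
        series.loc[t0:t1] = sel + value
    elif func == "multiply":
        series.loc[t0:t1] = sel * value
    elif func == "min_filter":
        series.loc[t0:t1] = sel.where(sel >= value)  # drop values below the threshold
    elif func == "max_filter":
        series.loc[t0:t1] = sel.where(sel <= value)  # drop values above the threshold
    elif func == "time_shift":
        # value assumed to be in hours; shifted samples whose new timestamps fall
        # outside the original index are silently dropped by Series.update
        shifted = sel.copy()
        shifted.index = shifted.index + pd.Timedelta(hours=value)
        series.loc[t0:t1] = float("nan")
        series.update(shifted)
    return series

s = pd.Series([1.0, 2.0, 3.0, 4.0],
              index=pd.date_range("2017-05-23", periods=4, freq="h", tz="UTC"))
s = apply_adjustment(s, "add", 10, s.index[0], s.index[1])
s = apply_adjustment(s, "multiply", 2, s.index[2], s.index[3])
print(s.tolist())  # [11.0, 12.0, 6.0, 8.0]
```

The filters set rejected samples to NaN rather than deleting them, which keeps the time index intact for later resampling.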
patrickjwright commented 1 year ago

Another general thought: I really think the future of the processing will be to process only newly transmitted observations on each operational run (or whatever recent window is needed for smoothing functions, perhaps just a few hours or days). This should significantly reduce the processing time and required compute resources.

Then, if the processing code is changed significantly (or new content added to flag files for historical data!), we could manually run the processing with an optional arg to reprocess the entire historical dataset for every station.

Just something to keep in mind. Perhaps the flag file functions shouldn't be hard-wired to run with every processing run? Or, if we are only processing a few hours of recent data, then it will be OK, since flag periods outside that window simply won't be found and we will just move on?
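The "just move on" behaviour amounts to a cheap interval-overlap check before applying anything. A sketch with made-up flag rows and a made-up processing window, not pypromice code:

```python
import pandas as pd

# Two hypothetical flag rows: one historical, one recent
flags = pd.DataFrame({
    "t0": pd.to_datetime(["2017-05-23T10:00:00+00:00", "2023-01-01T00:00:00+00:00"]),
    "t1": pd.to_datetime(["2017-06-10T11:00:00+00:00", "2023-01-02T00:00:00+00:00"]),
    "variable": ["rh_u", "p_u"],
})

# The few hours of recent data currently being processed
window_start = pd.Timestamp("2023-01-01T12:00:00+00:00")
window_end = pd.Timestamp("2023-01-01T18:00:00+00:00")

# Two intervals overlap iff each starts before the other ends;
# rows entirely outside the window are simply skipped
relevant = flags[(flags["t0"] <= window_end) & (flags["t1"] >= window_start)]
print(list(relevant["variable"]))  # only the 2023 flag applies: ['p_u']
```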

I just wanted to mention this, to make these future potential changes easier to implement....

PennyHow commented 1 year ago

I've just pushed the changes to `aws` whereby initialisation of the object does not automatically start the processing. Now, processing can either commence with the `aws.process()` method, or each processing level can be executed step-by-step with `aws.getL1()`, `aws.getL2()` and `aws.getL3()`.

I've also made some changes to the L0toL1 flagging and adjustment steps, mainly how the CSV flag and adjustment files are fetched. This is now a separate function. I tried to follow the advice of @patrickjwright and added a specific exception.

https://github.com/GEUS-Glaciology-and-Climate/pypromice/blob/3c7ca0d30068d5ea1b269397a2dfb325a694a72a/src/pypromice/L0toL1.py#L127-L166