Closed mankoff closed 10 months ago
Here are some examples of the flag-fix DB (check out the ReadMe): https://github.com/GEUS-PROMICE/AWS-data/tree/main/flag-fix Any suggestions on what the files should be called, how they should be structured, or how they should be used?
Below is the function that uses the database, for now with only the "add" function. I am planning to add cases for "multiply", "rotate", "smooth" and "custom_function_1".
    import os

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd


    def adjust_data(df, site):
        df_out = df.copy()
        if not os.path.isfile('metadata/flag-fix/'+site+'.csv'):
            print('No erroneous data listed for '+site)
            return df_out

        adj_info = pd.read_csv('metadata/flag-fix/'+site+'.csv')
        adj_info = adj_info.sort_values(by=['variable', 't0'])
        adj_info.set_index(['variable', 't0'], drop=False, inplace=True)

        for var in np.unique(adj_info.variable):
            if var not in df.columns:
                print(var+' not in datafile')
                continue
            print('Adjusting '+var)

            for t0, t1, func, val in zip(adj_info.loc[var].t0,
                                         adj_info.loc[var].t1,
                                         adj_info.loc[var].adjust_function,
                                         adj_info.loc[var].adjust_value):
                print(t0, func, val)
                # An empty t1 means the adjustment runs to the end of the record.
                # pd.isnull also accepts string t1 values, for which np.isnan
                # would raise a TypeError.
                if pd.isnull(t1):
                    # .iloc makes the "last row" lookup explicitly positional
                    t1 = df_out.time.iloc[-1].isoformat()
                if func == 'add':
                    df_out.loc[t0:t1, var] = df_out.loc[t0:t1, var].values + val

            # One before/after figure per adjusted variable
            fig = plt.figure()
            df[var].plot(label='before adjustment')
            df_out[var].plot(label='after adjustment')
            plt.xlabel('Time')
            plt.ylabel(var)
            plt.legend()
            plt.tight_layout()
            fig.savefig('figures/'+site+'_adj_'+var+'.jpeg')
            plt.close(fig)  # free the figure once it has been saved
        return df_out
In KAN_L.csv
t0,t1,variable,adjust_function,adjust_value,comment,URL_graphic
2016-07-27T00:00:00+00:00,,DepthPressureTransducer_Cor(m),add,-6.297000000000001,manually adjusted by bav,https://github.com/GEUS-PROMICE/AWS-data/blob/main/flags/graphics/KPC_L_dpt_1.png
2016-07-29T00:00:00+00:00,,DepthPressureTransducer_Cor(m),add,-0.1,manually adjusted by bav,https://github.com/GEUS-PROMICE/AWS-data/blob/main/flags/graphics/KPC_L_dpt_1.png
2019-07-11T00:00:00+00:00,,DepthPressureTransducer_Cor(m),add,-4.478,manually adjusted by bav,https://github.com/GEUS-PROMICE/AWS-data/blob/main/flags/graphics/KPC_L_dpt_1.png
Result:
I would like to get some feedback about the data adjustment files:
Current state of https://github.com/GEUS-PROMICE/PROMICE-AWS-toolbox :
[to do: add more adjustment functions (rotation, smoothing... etc)]
Illustration:
This function reads the station-specific adjustment files metadata/flag-fix/\
These error files have the following structure:
t0 | t1 | variable | adjust_function | adjust_value | comment | URL_graphic |
---|---|---|---|---|---|---|
2017-05-23 10:00:00 | 2017-06-10 11:00:00 | DepthPressureTransducer_Cor | add | -2 | manually adjusted by bav | https://raw.githubusercontent.com/GEUS-PROMICE/PROMICE-AWS-toolbox/master/figures/UPE_L_adj_DepthPressureTransducer_Cor(m).jpeg |
... | ... | ... | ... | ... | ... | ... |
with
field | meaning |
---|---|
t0 | ISO date of the beginning of the flagged period |
t1 | ISO date of the end of the flagged period |
variable | name of the variable to be flagged. [to do: '*' for all variables] |
adjust_function | function to apply over the given period: add, filter_min, filter_max, rotate or smooth |
adjust_value | input value to the adjustment function |
comment | Description of the issue |
URL_graphic | URL to illustration or Github issue thread |
The file is comma-separated:
t0,t1,variable,adjust_function,adjust_value,comment,URL_graphic
2015-03-01T00:00:00+00:00,,DepthPressureTransducer_Cor(m),add,2.3,manually adjusted by bav,https://github.com/GEUS-PROMICE/PROMICE-AWS-toolbox/blob/master/Report_toc.md#s15-2-1
...
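For reference, a file in this layout could be loaded roughly as follows. This is only a sketch: the two rows are illustrative, `data_end` is a hypothetical stand-in for the last timestamp of the station record, and treating an empty `t1` as "until the end of the record" follows the behaviour of the function posted earlier in this thread.

```python
import io

import pandas as pd

# Two illustrative rows in the same column layout as the adjustment files.
csv_text = """t0,t1,variable,adjust_function,adjust_value,comment,URL_graphic
2015-03-01T00:00:00+00:00,,DepthPressureTransducer_Cor(m),add,2.3,manually adjusted,
2017-05-23T10:00:00+00:00,2017-06-10T11:00:00+00:00,DepthPressureTransducer_Cor(m),add,-2,manually adjusted,
"""

adj_info = pd.read_csv(io.StringIO(csv_text))

# Parse both bounds as timezone-aware UTC timestamps; an empty t1 becomes NaT.
adj_info["t0"] = pd.to_datetime(adj_info["t0"], utc=True)
adj_info["t1"] = pd.to_datetime(adj_info["t1"], utc=True)

# Hypothetical end of the station record, used to close open-ended periods.
data_end = pd.Timestamp("2020-01-01", tz="UTC")
adj_info["t1"] = adj_info["t1"].fillna(data_end)

print(adj_info[["t0", "t1", "adjust_function", "adjust_value"]])
```

Parsing the bounds once up front avoids mixing strings and NaN values when the periods are later used to slice the data.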
The function adjust_data then applies the given function to the given variable in the dataframe. The adjusted variable is named \
This all looks good and I suggest you keep using it for now if it works for you. Given that this is early in the development phase, I'm assuming this will all be re-written at some point later.
I will need to use this function for fixing flagged data early in the processing pipeline. Many of these fixes (station rotation, temperature, etc.) feed into some of the first equations that derive dsr, dlr, usr, ulr, etc., so the fix needs to be applied at the beginning of the L0 to L1 processing step.
I can't comment in detail on the format and script implementation until I've spent some time using it.
First implementation submitted as PR.
Remaining points that will need to be addressed:
Do we provide a quality field for the variable being adjusted that would indicate whether a value has been adjusted or not?
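One possible shape for such a quality field, as a minimal sketch: a boolean companion column that is set wherever an adjustment was applied. The column name `depth_adj_flag`, the short variable name `depth` and the sample values are hypothetical; nothing like this exists in the toolbox yet.

```python
import pandas as pd

# Small illustrative record with a daily, timezone-aware time index.
time = pd.date_range("2016-07-26", periods=5, freq="D", tz="UTC")
df = pd.DataFrame({"depth": [1.0, 1.1, 7.4, 7.5, 7.6]}, index=time)

# Open-ended adjustment period starting at t0.
t0 = pd.Timestamp("2016-07-28", tz="UTC")
mask = df.index >= t0

# Hypothetical quality field: True where the value has been adjusted.
df["depth_adj_flag"] = False
df.loc[mask, "depth"] += -6.297
df.loc[mask, "depth_adj_flag"] = True

print(df)
```

A per-variable boolean is the simplest option; an integer field could instead record which kind of adjustment was applied.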
We need a plotting routine to be run after the processing (so it doesn't slow down NRT data upload), that describes how the data has been manipulated.
Flags and adjustment CSVs are working quite well. I'm closing this one now.
The flagged data can be plotted using scripts like https://github.com/GEUS-Glaciology-and-Climate/PROMICE-AWS-diagnostic or visualized on https://github.com/GEUS-Glaciology-and-Climate/PROMICE-AWS-diagnostic/blob/main/plot_compilations/flags_toc.md
https://github.com/GEUS-Glaciology-and-Climate/PROMICE-AWS-diagnostic could be run automatically if needed.
Right now, suspicious data is simply removed from the level 3 data, but the level 0 data remains intact and the flagging procedure is reproducible.
This is closely related to flagging data #18
In addition to flagging, we want to FIX data.
One example is when we see that the station was pointing in the wrong direction.
This could be flagged (no flag for this is currently defined, but perhaps we add a ROTATION flag).
A simple fix is to rotate it so the mean wind direction matches the historical mean. Or rotate it based on the reported direction minus 180 °. Or rotate it based on the measured direction at the next visit, etc.
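The first two options could be sketched as below. The helper names `rotate_direction` and `circular_mean_deg` are hypothetical, and the sample directions are made up. The one non-obvious point is that the mean of a direction has to be a circular mean, since a plain average of 350° and 10° would give 180° instead of 0°.

```python
import numpy as np


def rotate_direction(direction_deg, offset_deg):
    """Add an offset to directions in degrees and wrap the result into [0, 360)."""
    return np.mod(np.asarray(direction_deg, dtype=float) + offset_deg, 360.0)


def circular_mean_deg(direction_deg):
    """Mean direction in degrees, computed from the averaged unit vectors."""
    rad = np.deg2rad(np.asarray(direction_deg, dtype=float))
    mean_rad = np.arctan2(np.sin(rad).mean(), np.cos(rad).mean())
    return np.mod(np.rad2deg(mean_rad), 360.0)


# Option 1: rotate so the observed mean matches a (made-up) historical mean.
observed = np.array([170.0, 190.0])
historical_mean = 0.0
offset = historical_mean - circular_mean_deg(observed)
fixed_to_history = rotate_direction(observed, offset)

# Option 2: fixed correction of 180 degrees.
fixed_by_180 = rotate_direction(np.array([10.0, 200.0, 350.0]), 180.0)

print(fixed_to_history, fixed_by_180)
```

In both cases the fix reduces to a single additive offset, which is what would be stored as the adjustment value.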
It is probably difficult to meta-program fixes via database entries (e.g. ROTATION flag, FIX value) because the mathematical operations needed to apply fixes are diverse. Some are additive (add the rotation offset), while others may be multiplicative, benefit from linear interpolation, or require complex functions of other variables to estimate a bad value, etc.
The solution here will need to include a database and code. Perhaps the database is similar to (or part of) the flags DB: What sensor(s), time(s) and station(s) have which problem(s). Perhaps a new field is a function name (possibly with one or more values defined in the fix DB). Those functions, which must be implemented in the code, are then called with the correct arguments.
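As a rough sketch of that idea, the function name stored in the database can be looked up in a registry of functions implemented in the code. Everything here is an assumption for illustration: the registry `ADJUST_FUNCTIONS`, the helper `apply_adjustments`, the short variable names and the sample values. The column names follow the adjustment files discussed in this thread.

```python
import numpy as np
import pandas as pd

# Hypothetical registry: maps the function name stored in the fix database
# to the code that implements it. Each entry takes (values, adjust_value).
ADJUST_FUNCTIONS = {
    "add": lambda values, val: values + val,
    "multiply": lambda values, val: values * val,
    "rotate": lambda values, val: np.mod(values + val, 360.0),
}


def apply_adjustments(df, adj_info):
    """Apply every row of adj_info to a copy of df and return the copy."""
    df_out = df.copy()
    for row in adj_info.itertuples(index=False):
        if row.variable not in df_out.columns:
            continue  # variable not present in this data file
        func = ADJUST_FUNCTIONS.get(row.adjust_function)
        if func is None:
            raise ValueError(f"Unknown adjust_function: {row.adjust_function}")
        # An empty t1 means "until the end of the record".
        t1 = df_out.index[-1] if pd.isnull(row.t1) else row.t1
        period = slice(row.t0, t1)
        values = df_out.loc[period, row.variable].to_numpy()
        df_out.loc[period, row.variable] = func(values, row.adjust_value)
    return df_out


# Illustrative data and fix entries.
time = pd.date_range("2019-07-10", periods=4, freq="D")
df = pd.DataFrame({"depth": [5.0, 5.0, 5.0, 5.0],
                   "wdir": [10.0, 350.0, 200.0, 90.0]}, index=time)
adj_info = pd.DataFrame({
    "t0": ["2019-07-11", "2019-07-10"],
    "t1": [None, "2019-07-11"],
    "variable": ["depth", "wdir"],
    "adjust_function": ["add", "rotate"],
    "adjust_value": [-4.478, 180.0],
})

out = apply_adjustments(df, adj_info)
print(out)
```

Adding a new kind of fix then means adding one entry to the registry, while the database only needs to store the function name and its argument. Fixes that depend on other variables would need a richer signature than the `(values, adjust_value)` pair assumed here.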