Open bpbond opened 1 year ago
I have played with RSQLite and it seems straightforward. Much better than writing our own code!
How do flags get removed from the database? Does L2 remove them? Do they ever get removed?
Basically: is the flags database persistent between `driver()` invocations?
`L1.qmd` will create/overwrite the database (using the OOB column); `L2_algorithmic.qmd` and Shiny app `Human_QAQC` (neither of which exists yet) will add to it; and L2 will read it.

As the name suggests, data quality flags are used to let the user know the quality of a particular data variable or factors that impact on the quality of a variable. In this standard we use an integer value in the range 0 to n:
- 0 is reserved for future use and is not used.
- 1 is always good data.

The values of n, what they represent, and how data with each flag value should be interpreted are incorporated into files by means of a variable that is structured as follows:

- A file containing just one data quality flag will contain the variable `qc_flag`.
- Where a file contains more than one data quality flag variable, each data quality flag variable is named `qc_flag_<name>`.
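As a rough illustration of the naming and value convention quoted above, here is a minimal Python sketch. Only the meanings of 0 and 1 come from the quoted text; the meanings given for 2 and 3, and the helper names `qc_variable_name` and `is_good`, are invented for the example.

```python
from typing import Optional

# Hypothetical flag table for one variable. Only 0 and 1 are fixed by the
# quoted standard; the meanings for 2 and 3 are assumptions for illustration.
FLAG_MEANINGS = {
    0: "not_used",
    1: "good_data",
    2: "suspect_data_out_of_range",   # assumption
    3: "bad_data_instrument_fault",   # assumption
}

def qc_variable_name(name: Optional[str] = None) -> str:
    """Return 'qc_flag' for a single flag variable, else 'qc_flag_<name>'."""
    return "qc_flag" if name is None else f"qc_flag_{name}"

def is_good(flag: int) -> bool:
    """Only the value 1 denotes good data; 0 is reserved and never assigned."""
    if flag == 0 or flag not in FLAG_MEANINGS:
        raise ValueError(f"invalid flag value: {flag}")
    return flag == 1
```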
Ameriflux https://ameriflux.lbl.gov/data/flux-data-products/data-qaqc/physical-range-module/
Pastorello, G., et al. (2020), The FLUXNET2015 dataset and the ONEFlux processing pipeline for eddy covariance data, Scientific Data, 7(1), 225, DOI:10.1038/s41597-020-0534-3
Chu, H., Christianson, D. S., Cheah, Y.-W., Pastorello, G., O’Brien, F., Geden, J., Ngo, S.-T., Hollowgrass, R., Leibowitz, K., Beekwilder, N. F., Sandesh, M., Dengel, S., Chan, S. W., Santos, A., Delwiche, K., Yi, K., Buechner, C., Baldocchi, D., Papale, D., Keenan, T. F., Biraud, S. C., Agarwal, D. A., and Torn, M. S.: AmeriFlux BASE data pipeline to support network growth and data sharing, Sci Data, 10, 614, 2023.
UGA LTER https://gce-lter.marsci.uga.edu/gce_toolbox/wiki/QAQC.htm
QA/QC flag codes should be documented in the metadata (i.e. 'Data' category, 'Codes' field) using the following format: "Q = questionable value, I = invalid value, M = missing", etc. This ensures that the flag codes are properly displayed in standard and XML metadata, and also allows column values codes to be automatically generated when flags are optionally converted to encoded integer columns during ASCII or MATLAB export operations or manually in the structure editor. A GUI flag definition editor is provided with the GCE Data Toolbox, which can be opened using the 'View/Edit Q/C Flag Definitions' option on the 'Edit > Q/C Flag Functions' menu.
Suggested flag codes are listed below:
- I = invalid value (out of range) -- use for out-of-range/impossible values (e.g. negative mass)
- Q = questionable value -- use for values outside of expected range (e.g. below detection limit, well outside of historical value range, pattern indicating data contamination)
- E = estimated value -- use for values that were estimated by interpolation or other means
- S = spike/noise -- use for sharp discontinuities/spikes indicating data contamination
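To make the metadata format above concrete, the following Python sketch parses a definition string of the form "Q = questionable value, I = invalid value, M = missing" and converts letter flags to an integer column. The integer scheme (0 for unflagged, then 1..n in definition order) and both function names are assumptions for illustration; the GCE Data Toolbox's actual encoding is not described in the quoted text.

```python
def parse_flag_codes(definition: str) -> dict:
    """Parse 'Q = questionable value, I = invalid value' into {code: meaning}."""
    codes = {}
    for item in definition.split(","):
        code, meaning = item.split("=", 1)
        codes[code.strip()] = meaning.strip()
    return codes

def encode_flags(flags, codes):
    """Map letter flags to integers: 0 for unflagged, 1..n in definition order.

    The numbering is an assumption made for this sketch.
    """
    lookup = {code: i for i, code in enumerate(codes, start=1)}
    return [lookup[f] if f else 0 for f in flags]

codes = parse_flag_codes("Q = questionable value, I = invalid value, M = missing")
encoded = encode_flags(["", "Q", "M", "I"], codes)
```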
We want a flag database
General
- L1 -> Flags applied to data -> Filtering -> Computation -> L2
- L2 filters out data and assigns a quality column
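A toy version of this flow, written in Python purely as a sketch: rows flagged with severity "Drop" are filtered out, and anything else is carried into a quality column. The data values, the dictionary layout, and the "OK" label for unflagged rows are all assumptions; the computation step is omitted.

```python
# Hypothetical L1 observations: (observation ID, value)
l1_rows = [(1, 20.1), (2, 999.0), (3, 20.4), (4, 20.3)]

# Hypothetical flags: observation ID -> severity (Drop/Warning/Note)
flags = {2: "Drop", 3: "Warning"}

def l1_to_l2(rows, flags):
    """Drop rows flagged 'Drop'; record any other severity in a quality column."""
    l2 = []
    for obs_id, value in rows:
        severity = flags.get(obs_id)
        if severity == "Drop":
            continue  # filtered out of L2 entirely
        # "OK" for unflagged rows is an assumption made for this sketch
        l2.append({"id": obs_id, "value": value, "quality": severity or "OK"})
    return l2

l2_rows = l1_to_l2(l1_rows, flags)
```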
An add-a-flag API would look something like:

| Timestamp | Observation ID | Severity | Flag type | Author | Remark |
|---|---|---|---|---|---|
| ? Not necessary given ID but useful for quick separation by e.g. year/month | Necessary | Drop/Warning/Note | Out_of_bounds/Timestep_outlier/Expert_opinion/... | Author (human or algorithmic) | Remark |

Other things we will want
- Get flags for a year/month
- Get flags for a timestamp/ID
- Get flags for an ID
- Clear flag(s)
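The fields and operations listed above could map onto a single SQLite table. The sketch below uses Python's built-in `sqlite3` module rather than RSQLite, so treat it as an outline only; the SQL itself should carry over. The table name, column names, and function names are all hypothetical, and timestamps are assumed to be stored as ISO-8601 text.

```python
import sqlite3

def open_flag_db(path=":memory:"):
    """Open (or create) the flags database; table layout is an assumption."""
    con = sqlite3.connect(path)
    con.execute(
        """CREATE TABLE IF NOT EXISTS flags (
               timestamp      TEXT,
               observation_id INTEGER NOT NULL,
               severity       TEXT NOT NULL
                              CHECK (severity IN ('Drop', 'Warning', 'Note')),
               flag_type      TEXT NOT NULL,
               author         TEXT,
               remark         TEXT
           )"""
    )
    return con

def add_flag(con, timestamp, observation_id, severity, flag_type, author, remark=""):
    """Insert one flag record."""
    con.execute(
        "INSERT INTO flags VALUES (?, ?, ?, ?, ?, ?)",
        (timestamp, observation_id, severity, flag_type, author, remark),
    )
    con.commit()

def get_flags(con, year_month=None, timestamp=None, observation_id=None):
    """Return flags filtered by 'YYYY-MM', exact timestamp, and/or observation ID."""
    sql, params = "SELECT * FROM flags WHERE 1=1", []
    if year_month is not None:
        sql += " AND strftime('%Y-%m', timestamp) = ?"
        params.append(year_month)
    if timestamp is not None:
        sql += " AND timestamp = ?"
        params.append(timestamp)
    if observation_id is not None:
        sql += " AND observation_id = ?"
        params.append(observation_id)
    return con.execute(sql, params).fetchall()

def clear_flags(con, observation_id):
    """Delete all flags for one observation ID; return how many were removed."""
    cur = con.execute("DELETE FROM flags WHERE observation_id = ?", (observation_id,))
    con.commit()
    return cur.rowcount
```

Passing a file path instead of `":memory:"` would keep the flags on disk between runs, which is one way to get the persistence asked about earlier in the thread.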
Implementation
- ~Data frames saved to disk (RData? something quick), probably in year/month folders for efficiency~
- Genuine database: less code, higher performance! https://cran.r-project.org/web/packages/RSQLite/vignettes/RSQLite.html
Flags come from
- OOB column -> converted
- Outlier or other statistical analysis
- Relationship or other analysis
- Human QAQC (e.g., Shiny app)
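For the first source in this list, converting the OOB column might look roughly like the Python sketch below. The row layout (`TIMESTAMP`, `ID`, `OOB` keys), the choice of "Warning" as the severity, and the function name are assumptions; only the OOB column and the `Out_of_bounds` flag type come from this issue.

```python
def oob_to_flags(rows, author="L1.qmd"):
    """Turn every row with a truthy OOB value into a flag record."""
    return [
        {
            "timestamp": r["TIMESTAMP"],
            "observation_id": r["ID"],
            "severity": "Warning",          # assumption: severity is undecided
            "flag_type": "Out_of_bounds",
            "author": author,               # algorithmic author
            "remark": "converted from OOB column",
        }
        for r in rows
        if r["OOB"]
    ]

# Hypothetical L1 rows
l1_rows = [
    {"TIMESTAMP": "2024-03-15 12:00:00", "ID": 101, "value": 999.0, "OOB": 1},
    {"TIMESTAMP": "2024-03-15 12:15:00", "ID": 102, "value": 20.3, "OOB": 0},
]
oob_flags = oob_to_flags(l1_rows)
```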
Flags get written out as CSVs with
- L1_flag intermediate data product?
- L2 data?
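Whichever product the flags end up accompanying, the CSV export itself could be as small as the Python sketch below. The column order and the function name are assumptions, and the example writes to an in-memory buffer; a real run would open a file with `newline=""` instead.

```python
import csv
import io

# Assumed column order, following the flag fields listed in this issue
FLAG_FIELDS = ["timestamp", "observation_id", "severity", "flag_type", "author", "remark"]

def write_flags_csv(flags, fh):
    """Write flag records (dicts keyed by FLAG_FIELDS) to an open text handle."""
    writer = csv.DictWriter(fh, fieldnames=FLAG_FIELDS)
    writer.writeheader()
    writer.writerows(flags)

# Hypothetical flag record
example_flags = [
    {
        "timestamp": "2024-03-15 12:00:00",
        "observation_id": 101,
        "severity": "Drop",
        "flag_type": "Out_of_bounds",
        "author": "L1.qmd",
        "remark": "converted from OOB column",
    }
]
buf = io.StringIO()
write_flags_csv(example_flags, buf)
```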
> What about "hierarchical flags"? Discuss this with @roylrich @selinalcheng @stephpenn1

YES, but I am not sure how?