COMPASS-DOE / sensor-data-pipeline

Sensor data workflows and processing scripts
MIT License

Data quality flags - planning #55

Open bpbond opened 1 year ago

bpbond commented 1 year ago

We want a flag database

General

The "add a flag" API is something like:

Other things we will want

Implementation

Flags come from

Flags get written out as CSVs with

What about "hierarchical flags"? Discuss this with @roylrich @selinalcheng @stephpenn1

bpbond commented 1 year ago

I have played with RSQLite and it seems straightforward. Much better than writing our own code!

bpbond commented 1 year ago

How do flags get removed from the database? Does L2 remove them? Do they ever get removed?

bpbond commented 1 year ago

Basically: is the flags database persistent between driver() invocations?
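One way to frame the persistence question: with SQLite, whether flags survive between runs depends on whether the database is file-backed or in-memory. The sketch below uses Python's standard `sqlite3` module as a stand-in for RSQLite; the `flags` table, its columns, and the file name are hypothetical, and each call to `add_and_count()` stands in for one `driver()` invocation.

```python
import os
import sqlite3
import tempfile

def add_and_count(db_path: str) -> int:
    """Open the database, add one flag row, and return the total row count."""
    con = sqlite3.connect(db_path)
    try:
        con.execute("CREATE TABLE IF NOT EXISTS flags (obs_id TEXT, remark TEXT)")
        con.execute("INSERT INTO flags VALUES (?, ?)", ("obs-1", "example"))
        con.commit()
        return con.execute("SELECT COUNT(*) FROM flags").fetchone()[0]
    finally:
        con.close()

# File-backed: rows written by one "invocation" are still there for the next.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "flags.sqlite")  # hypothetical file name
    file_counts = [add_and_count(path), add_and_count(path)]

# In-memory: every new connection starts from an empty database.
memory_counts = [add_and_count(":memory:"), add_and_count(":memory:")]
```

Under these assumptions the file-backed counts grow across calls while the in-memory counts do not, so a persistent flags database would also need an explicit way to clear flags.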

bpbond commented 12 months ago

NCAS data https://sites.google.com/ncas.ac.uk/ncasobservations/home/data-project/ncas-data-standards/ncas-amof/data-quality-flags

As the name suggests, data quality flags are used to let the user know the quality of a particular data variable or factors that impact on the quality of a variable. In this standard we use an integer value in the range 0 to n:
  • 0 is reserved for future use and is not used.
  • 1 is always good data.
The values of n, what they represent, and how data with that flag value should be interpreted are incorporated into files by means of a variable that is structured as follows.
  • A file containing just one data quality flag will contain the variable qc_flag.
  • Where a file contains more than one data quality flag variable, each flag variable is named: qc_flag_<name>
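A small Python sketch of the convention quoted above, for comparison with our own design. The function names are hypothetical and this is not NCAS code; it only encodes the rules as stated (0 reserved, 1 good, `qc_flag` versus `qc_flag_<name>`).

```python
from typing import Optional

def flag_variable_name(name: Optional[str] = None) -> str:
    """Return 'qc_flag' for a single flag variable, else 'qc_flag_<name>'."""
    return "qc_flag" if name is None else f"qc_flag_{name}"

def is_good(flag: int, n: int) -> bool:
    """True only for flag value 1.

    Raises ValueError for 0 (reserved, not used) and for values above n.
    """
    if not 1 <= flag <= n:
        raise ValueError(f"flag {flag} is outside the valid range 1..{n}")
    return flag == 1
```

What values 2..n mean is defined per file in that standard, so the sketch deliberately does not interpret them.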

bpbond commented 12 months ago

Ameriflux https://ameriflux.lbl.gov/data/flux-data-products/data-qaqc/physical-range-module/

Pastorello, G., et al. (2020), The FLUXNET2015 dataset and the ONEFlux processing pipeline for eddy covariance data, Scientific Data, 7(1), 225, DOI:10.1038/s41597-020-0534-3

Chu, H., Christianson, D. S., Cheah, Y.-W., Pastorello, G., O’Brien, F., Geden, J., Ngo, S.-T., Hollowgrass, R., Leibowitz, K., Beekwilder, N. F., Sandesh, M., Dengel, S., Chan, S. W., Santos, A., Delwiche, K., Yi, K., Buechner, C., Baldocchi, D., Papale, D., Keenan, T. F., Biraud, S. C., Agarwal, D. A., and Torn, M. S.: AmeriFlux BASE data pipeline to support network growth and data sharing, Sci Data, 10, 614, 2023.

bpbond commented 12 months ago

UGA LTER https://gce-lter.marsci.uga.edu/gce_toolbox/wiki/QAQC.htm

QA/QC flag codes should be documented in the metadata (i.e. 'Data' category, 'Codes' field) using the following format: "Q = questionable value, I = invalid value, M = missing", etc. This ensures that the flag codes are properly displayed in standard and XML metadata, and also allows column values codes to be automatically generated when flags are optionally converted to encoded integer columns during ASCII or MATLAB export operations or manually in the structure editor. A GUI flag definition editor is provided with the GCE Data Toolbox, which can be opened using the 'View/Edit Q/C Flag Definitions' option on the 'Edit > Q/C Flag Functions' menu.

Suggested flag codes are listed below:

   I = invalid value (out of range) -- use for out-of-range/impossible values (e.g. negative mass)
   Q = questionable value -- use for values outside of expected range (e.g. below detection limit,
       well outside of historical value range, pattern indicating data contamination)
   E = estimated value -- use for values that were estimated by interpolation or other means
   S = spike/noise -- use for sharp discontinuities/spikes indicating data contamination
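To make the letter-code scheme concrete, here is a minimal Python sketch that formats the definitions in the quoted metadata style and converts letter flags to an encoded integer column. The integer numbering (0 for unflagged, then 1..n in definition order) is an assumption for illustration, not the GCE Data Toolbox's actual encoding.

```python
# Suggested codes from the list above.
FLAG_CODES = {
    "I": "invalid value (out of range)",
    "Q": "questionable value",
    "E": "estimated value",
    "S": "spike/noise",
}

def codes_metadata(codes: dict) -> str:
    """Format definitions as 'I = invalid value (out of range), Q = ...'."""
    return ", ".join(f"{code} = {meaning}" for code, meaning in codes.items())

def encode_flags(flags: list, codes: dict) -> list:
    """Map letter flags to integers (assumed numbering).

    An empty string means "no flag" and maps to 0; defined codes map to
    1..n in the order they appear in `codes`.
    """
    lookup = {code: i for i, code in enumerate(codes, start=1)}
    lookup[""] = 0
    return [lookup[flag] for flag in flags]
```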

roylrich commented 12 months ago

We want a flag database

General

  • L1 -> Flags applied to data -> Filtering -> Computation -> L2
  • L2 filters out data and assigns a quality column
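A rough Python sketch of the filtering step, assuming the Drop/Warning/Note severities proposed in this comment. The row and flag layouts, the "worst severity wins" rule, and the `"OK"` quality value for unflagged rows are all assumptions for discussion, not the pipeline's actual behavior.

```python
SEVERITY_RANK = {"Note": 1, "Warning": 2, "Drop": 3}

def apply_flags(l1_rows, flags):
    """Filter L1 rows by flag severity and attach a quality column.

    l1_rows: list of dicts with at least an 'id' key.
    flags:   list of dicts with 'obs_id' and 'severity' keys.
    Rows whose worst flag is 'Drop' are removed; the remaining rows get a
    'quality' value equal to their worst severity, or 'OK' if unflagged.
    """
    worst = {}
    for flag in flags:
        current = worst.get(flag["obs_id"])
        if current is None or SEVERITY_RANK[flag["severity"]] > SEVERITY_RANK[current]:
            worst[flag["obs_id"]] = flag["severity"]

    l2_rows = []
    for row in l1_rows:
        severity = worst.get(row["id"])
        if severity == "Drop":
            continue  # filtered out before any L2 computation
        l2_rows.append({**row, "quality": severity or "OK"})
    return l2_rows
```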

The "add a flag" API is something like:

  • Timestamp (?): not necessary given the ID, but useful for quick separation by e.g. year/month
  • Observation ID: necessary
  • Severity: Drop/Warning/Note
  • Flag type: Out_of_bounds/Timestep_outlier/Expert_opinion/...
  • Author: author (human or algorithmic)
  • Remark: remark
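The fields above could map onto a single table. Here is a minimal sketch in Python's standard `sqlite3` (standing in for RSQLite); the table name, column names, the `flag_id` key, and the `add_flag()` signature are hypothetical, and the timestamp is left optional because the list marks it as not strictly necessary.

```python
import sqlite3

SEVERITIES = ("Drop", "Warning", "Note")

SCHEMA = """
CREATE TABLE IF NOT EXISTS flags (
    flag_id   INTEGER PRIMARY KEY,
    timestamp TEXT,               -- optional; ISO 8601 string
    obs_id    TEXT NOT NULL,      -- observation ID (required)
    severity  TEXT NOT NULL CHECK (severity IN ('Drop', 'Warning', 'Note')),
    flag_type TEXT NOT NULL,      -- e.g. Out_of_bounds, Timestep_outlier
    author    TEXT NOT NULL,      -- human or algorithm name
    remark    TEXT
)
"""

def open_flag_db(path=":memory:"):
    """Open (and if needed create) the flag database."""
    con = sqlite3.connect(path)
    con.execute(SCHEMA)
    return con

def add_flag(con, obs_id, severity, flag_type, author, remark=None, timestamp=None):
    """Insert one flag and return its row ID."""
    if severity not in SEVERITIES:
        raise ValueError(f"severity must be one of {SEVERITIES}")
    cur = con.execute(
        "INSERT INTO flags (timestamp, obs_id, severity, flag_type, author, remark) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (timestamp, obs_id, severity, flag_type, author, remark),
    )
    con.commit()
    return cur.lastrowid
```

For example, `add_flag(con, "obs-42", "Warning", "Out_of_bounds", "oob_check")` would record an algorithmic flag with no remark.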

Other things we will want

  • Get flags for a year/month
  • Get flags for a timestamp/ID
  • Get flags for an ID
  • Clear flag(s)
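The retrieval and clearing operations listed above could look roughly like this in Python's `sqlite3`. The cut-down `flags` table, the function names, and the ISO 8601 timestamp strings (which make the year/month lookup a simple prefix match) are assumptions for illustration.

```python
import sqlite3

def flags_for_month(con, year, month):
    """Get flags for a year/month via a prefix match on ISO timestamps."""
    prefix = f"{year:04d}-{month:02d}-"
    return con.execute(
        "SELECT * FROM flags WHERE timestamp LIKE ? ORDER BY flag_id",
        (prefix + "%",),
    ).fetchall()

def flags_for_id(con, obs_id, timestamp=None):
    """Get flags for an ID, optionally restricted to one timestamp."""
    if timestamp is None:
        query, params = "SELECT * FROM flags WHERE obs_id = ?", (obs_id,)
    else:
        query = "SELECT * FROM flags WHERE obs_id = ? AND timestamp = ?"
        params = (obs_id, timestamp)
    return con.execute(query + " ORDER BY flag_id", params).fetchall()

def clear_flags(con, obs_id, flag_type=None):
    """Delete flags for an ID (optionally one type); return rows removed."""
    if flag_type is None:
        cur = con.execute("DELETE FROM flags WHERE obs_id = ?", (obs_id,))
    else:
        cur = con.execute(
            "DELETE FROM flags WHERE obs_id = ? AND flag_type = ?",
            (obs_id, flag_type),
        )
    con.commit()
    return cur.rowcount

# Small in-memory example with made-up observations.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE flags (flag_id INTEGER PRIMARY KEY, timestamp TEXT, "
    "obs_id TEXT, flag_type TEXT)"
)
con.executemany(
    "INSERT INTO flags (timestamp, obs_id, flag_type) VALUES (?, ?, ?)",
    [
        ("2024-03-01T00:15:00", "obs-1", "Out_of_bounds"),
        ("2024-03-01T00:15:00", "obs-1", "Timestep_outlier"),
        ("2024-04-02T10:00:00", "obs-2", "Expert_opinion"),
    ],
)
con.commit()
```

Storing the timestamp alongside the ID is what makes the year/month lookup cheap here, which matches the note that the timestamp is redundant but useful.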

Implementation

Flags come from

  • OOB column -> converted
  • Outlier or other statistical analysis
  • Relationship or other analysis
  • Human QAQC (e.g., Shiny app)

Flags get written out as CSVs with

  • L1_flag intermediate data product?
  • L2 data?

> What about "hierarchical flags"? Discuss this with @roylrich @selinalcheng @stephpenn1

YES, but I am not sure how.