climsoft / climsoft-web

Climsoft web application

Quality Control Implementation #36

Open Patowhiz opened 3 months ago

Patowhiz commented 3 months ago

Overview

After reviewing the WMO CDMS specifications, I suggest developing the following quality control (QC) submodules to enhance our climate data management system:

  1. Duplicate Data Check: To eliminate duplicate entries during data ingestion, preventing unnecessary redundancy.

  2. Limits Check: To flag values outside the acceptable range for review during data ingestion.

  3. Source Check: To differentiate and validate identical data from various sources, designating the most reliable source as final.

  4. Missing Data Check: To detect data gaps, facilitating informed decisions on handling these absences for subsequent analysis.

  5. Internal Consistency Check: To verify the coherence of related data points within the dataset, for example that dew point does not exceed air temperature. This check will include same-value, jump-value, and inter-element checks.

  6. Temporal Consistency Check: To identify abrupt temporal changes, distinguishing between potential errors and actual environmental shifts.

  7. Spatial Consistency Check: To assess data across various locations, identifying spatial anomalies that may indicate localized discrepancies.

  8. Extreme Value Check: To scrutinize and authenticate any extreme values or statistical outliers beyond the normal range.

  9. Data Homogeneity Check: To correct biases from changes in observational methods or locations, especially vital for long-term climate studies.

  10. Metadata Check: To investigate metadata for additional insights that may elucidate detected anomalies or inconsistencies.
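
To make a few of the simpler checks concrete, here is a minimal sketch of the duplicate, limits, same-value, jump-value and inter-element checks. It is only an illustration: the `Observation` record, the function names and the thresholds are all hypothetical and are not taken from the Climsoft schema or the WMO CDMS specifications.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Observation:
    # Hypothetical, simplified record; not the actual Climsoft observation model.
    station_id: str
    element: str
    obs_time: datetime
    value: Optional[float]  # None represents a missing value


def find_duplicates(observations):
    """Return observations whose (station, element, time) key was already seen."""
    seen = set()
    duplicates = []
    for obs in observations:
        key = (obs.station_id, obs.element, obs.obs_time)
        if key in seen:
            duplicates.append(obs)
        else:
            seen.add(key)
    return duplicates


def fails_limits(obs, lower, upper):
    """True when a non-missing value lies outside the range [lower, upper]."""
    return obs.value is not None and not (lower <= obs.value <= upper)


def same_value_runs(values, min_run):
    """Return (start, end) index pairs where one value repeats at least min_run times."""
    runs = []
    start = 0
    for i in range(1, len(values) + 1):
        # A run ends at the end of the series or when the value changes.
        if i == len(values) or values[i] != values[start]:
            if i - start >= min_run:
                runs.append((start, i - 1))
            start = i
    return runs


def jump_indices(values, max_step):
    """Return indices whose value differs from the previous one by more than max_step."""
    return [
        i for i in range(1, len(values))
        if abs(values[i] - values[i - 1]) > max_step
    ]


def dew_point_inconsistent(temperature, dew_point):
    """Inter-element check: dew point should not exceed air temperature."""
    return dew_point > temperature
```

In this sketch every function only reports suspect values; whether a flagged value is rejected, corrected or accepted would remain a separate decision.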

I recommend constructing a QC workflow that processes these checks in a logical and efficient sequence, starting with simpler tasks and advancing to more complex analyses. While some steps may occur concurrently, the overall process should be iterative, ensuring a comprehensive and nuanced data quality assessment.
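
One possible shape for such a workflow is sketched below, assuming each check is a function that returns `True` when an observation passes. The `QCCheck` type, the `cost` ranking and the example checks are hypothetical placeholders, and the sketch covers only the ordering, not concurrency or iteration.

```python
from typing import Callable, NamedTuple


class QCCheck(NamedTuple):
    name: str
    cost: int                    # hypothetical relative cost; lower values run first
    run: Callable[[dict], bool]  # returns True when the observation passes


def run_pipeline(observation, checks):
    """Run every check from cheapest to most expensive and collect the results."""
    results = []
    for check in sorted(checks, key=lambda c: c.cost):
        results.append((check.name, check.run(observation)))
    return results


# Example checks with made-up thresholds, registered in arbitrary order.
checks = [
    QCCheck("extreme_value", 3, lambda obs: abs(obs["value"] - obs["mean"]) <= 15.0),
    QCCheck("limits", 1, lambda obs: -10.0 <= obs["value"] <= 45.0),
    QCCheck("missing", 0, lambda obs: obs["value"] is not None),
]

report = run_pipeline({"value": 50.0, "mean": 28.0}, checks)
```

Here `report` lists the checks in execution order, so a reviewer can see that the cheap checks ran before the more expensive ones.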

Furthermore, each QC step will be systematically logged in the observation model, which is specifically designed to accommodate these checks, enhancing transparency and traceability in data quality control.
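
As a rough illustration of what logging on the observation could look like, the sketch below appends one entry per QC step to the observation itself. The field names (`qc_log`, `check_name`, `mode`, `checked_at`) are assumptions made for this example and do not describe the actual observation model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class QCLogEntry:
    check_name: str
    passed: bool
    mode: str             # "manual", "automated" or "semi-automated"
    checked_at: datetime


@dataclass
class Observation:
    # Hypothetical, simplified observation; not the real Climsoft model.
    station_id: str
    element: str
    value: float
    qc_log: list = field(default_factory=list)

    def record_qc(self, check_name, passed, mode="automated"):
        """Append a log entry; earlier entries are kept for traceability."""
        entry = QCLogEntry(check_name, passed, mode, datetime.now(timezone.utc))
        self.qc_log.append(entry)

    def failed_checks(self):
        """Names of checks whose most recent run failed."""
        latest = {}
        for entry in self.qc_log:
            latest[entry.check_name] = entry.passed
        return [name for name, ok in latest.items() if not ok]


obs = Observation("S1", "TMAX", 50.0)
obs.record_qc("limits", False)                # automated check flags the value
obs.record_qc("duplicate", True)
obs.record_qc("limits", True, mode="manual")  # a reviewer later accepts the value
```

Keeping the full history rather than a single status flag means the log still shows that the limits check first failed and was later overridden manually.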

Depending on the nature of the check, some of these could be user-driven (manual), system-driven (automated), or semi-automated.

Request for Comments

I invite feedback on this proposal. Your insights and suggestions will be invaluable.