climsoft / climsoft-web

Climsoft web application
MIT License

Data Flow #27

Open Patowhiz opened 3 months ago

Patowhiz commented 3 months ago

Overview:

This proposal aims to standardise the data flow within Climsoft from initial entry to the generation of final products, emphasizing the need for consistent data source identification, unified storage, robust QC checks, and transparent logging for auditability.

Detailed Description:

Data Ingestion: Data is ingested through three source types, each of which defines an ingestion method.

Every entry method must record the source of the data to ensure traceability, and each data source is associated with exactly one source type.

Observations Table:

Quality Control (QC) Protocol:

Logging and Audit Trails:

Final Product Generation:

Proposal for Enhancements:

  1. Streamlined Data Entry:

    • Formalize data entry procedures that require source identification for every data input.
  2. Quality Control Reinforcement:

    • Implement a unified QC system that is both rigorous and standardized across all data types.
  3. Auditability and Transparency:

    • Develop an enhanced logging system for full transparency and accountability of data modifications and QC results.
  4. Finality in Product Creation:

    • Introduce criteria within Climsoft to determine and label data as 'final' for the production of climatological outputs.
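
To make the four enhancements above more concrete, here is a minimal Python sketch of how a single observation could carry its source, QC status, audit log and 'final' flag. Everything in it is hypothetical and for discussion only: the names `Observation`, `LogEntry`, `QCStatus`, `update_value`, `record_qc` and `mark_final` are not existing Climsoft code, and the rule that only QC-passed data may be labelled final is an assumed criterion, not one the proposal has settled on.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import List, Optional


class QCStatus(Enum):
    NONE = "none"      # not yet checked
    PASSED = "passed"
    FAILED = "failed"


@dataclass
class LogEntry:
    """One audit-trail record (hypothetical shape)."""
    timestamp: datetime
    user: str
    action: str
    old_value: Optional[float] = None
    new_value: Optional[float] = None


@dataclass
class Observation:
    station_id: str
    element_id: str
    value: Optional[float]
    source_id: str                      # enhancement 1: source is mandatory
    qc_status: QCStatus = QCStatus.NONE
    final: bool = False
    log: List[LogEntry] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.source_id:
            raise ValueError("every observation must identify its source")

    def _add_log(self, user: str, action: str, old=None, new=None) -> None:
        now = datetime.now(timezone.utc)
        self.log.append(LogEntry(now, user, action, old, new))

    def update_value(self, new_value: Optional[float], user: str) -> None:
        if self.final:
            raise ValueError("final observations cannot be modified")
        # enhancement 3: every modification is logged with old and new value
        self._add_log(user, "update", self.value, new_value)
        self.value = new_value
        self.qc_status = QCStatus.NONE  # a changed value must be re-checked

    def record_qc(self, passed: bool, user: str) -> None:
        # enhancement 2: QC results are stored on the record and logged
        self.qc_status = QCStatus.PASSED if passed else QCStatus.FAILED
        self._add_log(user, "qc_" + self.qc_status.value)

    def mark_final(self, user: str) -> None:
        # enhancement 4 (assumed criterion): only QC-passed data becomes final
        if self.qc_status is not QCStatus.PASSED:
            raise ValueError("only QC-passed observations can be marked final")
        self.final = True
        self._add_log(user, "mark_final")
```

In this sketch the log lives on the record itself to keep the example short; in a real schema it would more likely be a separate table keyed by the observation.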

Rationale:

The integrity of Climsoft's data and the trust in its climatological products hinge on a clear, accountable, and verifiable data management process. This proposal seeks to reinforce these aspects, ensuring Climsoft remains a reliable and authoritative tool for meteorological and hydrological data processing.

Request for Team Feedback:

I request feedback from the development community to refine this proposal. Contributions from the development team are essential to the successful enhancement of Climsoft's data workflow.

Patowhiz commented 1 month ago

After reflecting on this, I think we can assume that there are only two primary source types: Form and Import. Import covers data that comes from a file (retrieved through HTTP, FTP, etc.) or from an API.

This means there is no need for the Machine/Digital data source. Note also that automatic stations record data and save it to their data loggers, and it is from these data loggers that we import the data. So there is no need for a machine-to-machine concept.
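
As a rough illustration of the two-type model, here is a small Python sketch. The names `SourceType`, `Source` and `source_type_for`, and the channel labels, are hypothetical and only meant to show how every ingestion channel, including data-logger files from automatic stations, would collapse into either Form or Import.

```python
from dataclasses import dataclass
from enum import Enum


class SourceType(Enum):
    FORM = "form"      # data keyed in through an entry form
    IMPORT = "import"  # data read from a file or pulled from an API


@dataclass(frozen=True)
class Source:
    """A named data source; each source has exactly one source type."""
    id: int
    name: str
    source_type: SourceType


# Hypothetical channel labels; all non-form channels count as imports.
_IMPORT_CHANNELS = {"file", "http", "ftp", "api", "data_logger"}


def source_type_for(channel: str) -> SourceType:
    """Map an ingestion channel to one of the two source types."""
    channel = channel.strip().lower()
    if channel == "form":
        return SourceType.FORM
    if channel in _IMPORT_CHANNELS:
        return SourceType.IMPORT
    raise ValueError(f"unknown ingestion channel: {channel!r}")
```

For example, `source_type_for("data_logger")` gives `SourceType.IMPORT`, so an automatic station's logger file is handled like any other imported file and no separate machine source type is needed.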