PecanProject / pecan

The Predictive Ecosystem Analyzer (PEcAn) is an integrated ecological bioinformatics toolbox.
www.pecanproject.org

message queue for meteorological data processing #3163

Open mdietze opened 1 year ago

mdietze commented 1 year ago

Description

Currently, input data for the PEcAn workflow are processed sequentially at run time. As a first step toward a more cloud-based workflow that is asynchronous, distributed, and event driven, I propose that we start with met.process as the initial test case.

Proposed Solution

Put either just met.process, or all of do.conversions, in its own container with its own message queue. The message queue would need to pass in the relevant portion of the settings: which met data source, which site (name, lat, lon) or vector of sites, what date range, which model's file format is the target, etc.
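To make the message contents concrete, here is a minimal sketch of what one queue message could carry, serialized as JSON. All class and field names (`MetRequest`, `Site`, `source`, `start_date`, `end_date`, `model`) are hypothetical and chosen for illustration; they are not existing PEcAn settings keys, and the start/end dates assume the range in question is a date range.

```python
import json
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class Site:
    name: str
    lat: float
    lon: float


@dataclass
class MetRequest:
    """Hypothetical message body; field names are illustrative only."""
    source: str        # which met data source
    sites: List[Site]  # a single site or a vector of sites
    start_date: str    # ISO 8601 dates bounding the requested range (assumed)
    end_date: str
    model: str         # model whose file format is the target

    def to_json(self) -> str:
        # asdict() also converts the nested Site objects to plain dicts
        return json.dumps(asdict(self))

    @staticmethod
    def from_json(body: str) -> "MetRequest":
        raw = json.loads(body)
        raw["sites"] = [Site(**s) for s in raw["sites"]]
        return MetRequest(**raw)
```

A producer would publish `request.to_json()` and the met.process container would call `MetRequest.from_json(body)` on receipt, so only this portion of the settings has to cross the queue.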

Some issues to consider:

In general, met.process has the following steps:

  1. download the raw met data
  2. convert this to netCDF CF format
  3. extract [regional] or gapfill [site-level] the data
  4. convert from netCDF CF to model specific format
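The four steps above can be sketched as an ordered plan in which only step 3 varies. This is a rough illustration, not PEcAn code: the stage names and the `regional` flag are assumptions, the stage functions are stand-ins that only record that they ran, and the branch reflects one reading of the list (regional data are extracted, site-level data are gapfilled).

```python
from typing import Callable, Dict, List


def _stage(name: str) -> Callable[[Dict], Dict]:
    """Build a stand-in stage that only records its own name in the state."""
    def run(state: Dict) -> Dict:
        return {**state, "done": state.get("done", []) + [name]}
    return run


def plan(regional: bool) -> List[str]:
    """Order the four steps; step 3 depends on regional vs. site-level data."""
    step3 = "extract" if regional else "gapfill"
    return ["download", "met2cf", step3, "met2model"]


def run_pipeline(regional: bool) -> Dict:
    """Run each planned stage in sequence, threading the state through."""
    state: Dict = {}
    for name in plan(regional):
        state = _stage(name)(state)
    return state
```

In an event-driven version, each stage could publish its result as a message that triggers the next stage rather than being called in a loop.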

Relevant bits of code to look at:

  1. base/workflow/R/do.conversions.R
  2. modules/data.atmosphere/R/met.process.R
  3. base/db/R/convert.input.R
  4. individual modules/data.atmosphere/R/download, met2CF, and extract* functions
  5. individual models/[model]/R/met2model.[model].R functions

ankurdesai commented 1 year ago

met.process is such a Swiss Army knife that I agree it would be useful as a standalone, cloud-compatible tool (a reprise of the browndog functionality?). From that perspective, I would be in favor of separating the database portions from the general steps (download, standardize, extract, gapfill, convert), with a wrapper that receives the updates that need to be made to the inputs and filepath records in bety. That may make it easier to debug too.
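One way to picture that separation is a database-free processing function that only reports which files it produced, plus a thin wrapper that forwards those reports to whatever updates the database. This is a minimal sketch under stated assumptions: `InputRecord`, `process`, `with_registration`, and the `register` callback are all hypothetical names, and the file paths are placeholders rather than real PEcAn outputs.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class InputRecord:
    """Hypothetical description of one file that a step produced."""
    step: str
    path: str


def process(steps: List[str], workdir: str) -> List[InputRecord]:
    """Database-free part: run the steps and report what was written."""
    # Placeholder: a real step would do its work here before reporting.
    return [InputRecord(step, f"{workdir}/{step}.out") for step in steps]


def with_registration(
    steps: List[str],
    workdir: str,
    register: Callable[[InputRecord], None],
) -> List[InputRecord]:
    """Wrapper: the only place that touches the database, via `register`."""
    records = process(steps, workdir)
    for rec in records:
        register(rec)
    return records
```

Because `process` never sees the database, it can be tested or run in a container by passing `register` a function that simply appends to a list.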

computate commented 1 year ago

@mdietze Would you be able to share "steps to reproduce" for configuring PEcAn to launch a workflow that could be made more distributed, along with the "acceptance criteria" for a first deliverable on this issue?

github-actions[bot] commented 1 month ago

This issue is stale because it has been open 365 days with no activity.