PecanProject / pecan

The Predictive Ecosystem Analyzer (PEcAn) is an integrated ecological bioinformatics toolbox.
www.pecanproject.org

Implementing multi-run workflows #869

Open ghost opened 8 years ago

ghost commented 8 years ago

For the work I'm doing, we need to automate similar PEcAn runs at many different sites. As I understand it, there's a general interest in building this capability into PEcAn: essentially, a new master workflow that triggers multiple related workflows in an intelligent way. @mdietze and I have discussed this offline a bit, and I wanted to elicit additional feedback here. I left out a lot of details and this still grew into a pretty big post, so apologies in advance for that...

Settings

I'm planning that we'd still have a single settings XML file for a multi-run workflow. First we'll need to separate out some non-run-specific settings from the run block, per GH-212. Then we can encapsulate multiple run blocks under a new runs (or runlist?) tag.
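To make the proposed layout concrete, here is a minimal sketch, written in Python purely for illustration rather than in PEcAn's R. The `runs` wrapper, the `name` attribute, and the element contents are all hypothetical and are not PEcAn's actual settings schema. The sketch splits one multi-run settings document into single-run settings by pairing the shared elements with each `run` block.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical multi-run settings file; tag names are illustrative only.
SETTINGS_XML = """
<pecan>
  <outdir>/tmp/pecan_multi</outdir>
  <runs>
    <run name="site_a"><site><id>1</id></site></run>
    <run name="site_b"><site><id>2</id></site></run>
  </runs>
</pecan>
"""

def split_settings(xml_text):
    """Return one single-run settings tree per <run> found under <runs>."""
    root = ET.fromstring(xml_text)
    runs = root.find("runs")
    if runs is None:
        raise ValueError("expected a <runs> element")
    # Everything outside <runs> is treated as shared, non-run-specific settings.
    shared = [child for child in root if child is not runs]
    per_run = []
    for run in runs.findall("run"):
        single = ET.Element(root.tag)
        single.extend([copy.deepcopy(el) for el in shared])
        single.append(copy.deepcopy(run))
        per_run.append(single)
    return per_run

run_settings = split_settings(SETTINGS_XML)
for tree in run_settings:
    print(tree.find("run").get("name"), tree.find("outdir").text)
```

Each resulting tree looks like a conventional single-run settings document, which would let the existing run-specific code stay largely unaware of the multi-run wrapper.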

Changes to the workflow

There are some parts of the existing workflow.R that only need to be run once, even when initiating a multi-run workflow, including:

I think the rest (CONFIG, MODEL, OUTPUT, ENSEMBLE, SENSITIVITY, PDA) is conceptually run-specific.

So, the idea is that the master workflow would get set up, read in the settings file, and do the meta-analysis in about the same way it currently does. Then it would prepare settings objects for each run, and loop over these to perform the run-specific steps.
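The control flow described above could be sketched roughly as follows. This is Python for illustration only, not PEcAn code: the function `run_master_workflow`, the dictionary-based settings, and the placeholder steps are assumptions, and the step labels simply reuse the module names mentioned in this post.

```python
def run_master_workflow(settings, run_once_steps, per_run_steps):
    """Run shared steps once, then the run-specific steps for every run.

    Returns a log of (scope, step_name) tuples in execution order.
    """
    log = []
    shared_state = {}
    # Steps that only need to happen once, e.g. the meta-analysis.
    for name, step in run_once_steps:
        shared_state[name] = step(settings)
        log.append(("shared", name))
    # Run-specific steps, repeated for each run's settings.
    for run in settings["runs"]:
        run_settings = {**settings["shared"], **run}
        for name, step in per_run_steps:
            step(run_settings, shared_state)
            log.append((run["name"], name))
    return log

# Placeholder steps standing in for the real modules.
def meta_analysis(settings):
    return {"posterior": "placeholder"}

def noop(run_settings, shared_state):
    return None

settings = {
    "shared": {"pft": "temperate.deciduous"},
    "runs": [{"name": "site_a"}, {"name": "site_b"}],
}
log = run_master_workflow(
    settings,
    run_once_steps=[("META", meta_analysis)],
    per_run_steps=[("CONFIG", noop), ("MODEL", noop), ("OUTPUT", noop)],
)
print(log)
```

Passing the shared results into each per-run step is one way to let every run reuse the single meta-analysis instead of repeating it.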

I'm not currently thinking about any grand meta-results-collection/analysis scheme. I just want the workflow to trigger all the runs. But obviously there are cool options for analyzing/displaying results from e.g. multiple sites run in a single workflow.

Directory structure

I'm assuming the master settings object will specify a main output directory, and then individual workflows will send all results to subdirectories. So I would propose replacing the "run" and "out" directories specified under host with a single "workflow.directory" entry. Then we can add a run.name tag under run to be used for naming a run-specific subdirectory within the workflow dir. "run" and "out" directories can go in there.

An alternative is to have master run/out directories (still specified under host), and put run-specific subdirs in each. This strikes me as somewhat less future-proof, but it would preserve the ability to keep "out" and "run" on separate drives (which I assume is why there are two directories under host rather than just one?).
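As a rough illustration of the first layout (a single workflow directory with one subdirectory per run, each holding its own "run" and "out"), here is a small Python sketch. The helper `make_run_dirs` and the run names are hypothetical, and a temporary directory stands in for the configured workflow directory.

```python
import tempfile
from pathlib import Path

def make_run_dirs(workflow_dir, run_names):
    """Create <workflow_dir>/<run_name>/{run,out} for every run name."""
    layout = {}
    for name in run_names:
        run_root = Path(workflow_dir) / name
        dirs = {"run": run_root / "run", "out": run_root / "out"}
        for path in dirs.values():
            path.mkdir(parents=True, exist_ok=True)
        layout[name] = dirs
    return layout

# A temporary directory stands in for the workflow directory here.
workflow_dir = tempfile.mkdtemp(prefix="pecan_workflow_")
layout = make_run_dirs(workflow_dir, ["site_a", "site_b"])
for name, dirs in layout.items():
    print(name, dirs["run"], dirs["out"])
```

The alternative layout would swap the nesting order, with master "run" and "out" directories each containing one subdirectory per run name.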

Turn workflow.R into workflow(...)

I was thinking this and other tasks would be made easier by converting the current workflow script into a function. Initially it could simply take a settings object or path to an xml file as an argument, basically like the script does now. But we could also do things like add boolean arguments to turn on/off modules (overriding what's in the settings object—I thought this might be handy for testing). In offline discussions Mike and I had some of the same ideas about ultimately wanting the workflow to be very modular. I thought functionalizing it was a good first step (and had some specific details in mind that would be useful to me for the current work), though he wasn't so sure. Perhaps this belongs in a separate issue, but thought I'd mention it here.
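A minimal sketch of what the function form with boolean overrides might look like, in Python for illustration only. The `workflow` signature, the `modules` key, and the module names are assumptions made for this example, not an existing PEcAn interface.

```python
def workflow(settings, **module_overrides):
    """Return the modules that would run, in order.

    settings["modules"] maps module name -> enabled flag (hypothetical layout).
    Keyword arguments override those flags, which can be handy for testing.
    """
    enabled = dict(settings.get("modules", {}))
    unknown = set(module_overrides) - set(enabled)
    if unknown:
        raise ValueError(f"unknown modules: {sorted(unknown)}")
    enabled.update(module_overrides)
    executed = []
    for name, is_on in enabled.items():
        if is_on:
            # A real implementation would call the module here.
            executed.append(name)
    return executed

settings = {"modules": {"meta_analysis": True, "ensemble": True, "sensitivity": True}}
executed = workflow(settings, sensitivity=False)
print(executed)  # ['meta_analysis', 'ensemble']
```

Rejecting unknown override names is a design choice in this sketch: a typo in a module name fails loudly rather than being silently ignored.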

Job management

Finally, since the meta-workflow is going to call potentially many individual workflows and each of those could require potentially many model runs, some thought needs to go into how to manage the jobs. Obviously we don't want to just run the individual workflows sequentially. On the other hand, running them completely in parallel is probably a bad idea too—even if all the model runs are getting farmed out to a cluster where they're handled by a queue, you'd still have a process for each workflow running on the main machine setting up jobs, waiting for them, etc.
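One possible middle ground between fully sequential and fully parallel is a bounded worker pool, so that only a fixed number of workflow processes are active on the main machine at any time. The Python sketch below illustrates the idea with the standard library's `ThreadPoolExecutor`; the function names, the limit of two, and the fake workflow body are all assumptions for illustration, not a proposal for how PEcAn should implement it.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

def run_workflows(run_names, run_one, max_concurrent=2):
    """Run run_one(name) for each name, at most max_concurrent at a time."""
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        # pool.map preserves input order, so results line up with run_names.
        return dict(zip(run_names, pool.map(run_one, run_names)))

# Track how many fake workflows are active at once to show the cap holds.
stats = {"active": 0, "peak": 0}
lock = threading.Lock()

def fake_workflow(name):
    with lock:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
    time.sleep(0.01)  # stands in for setting up jobs and waiting on them
    with lock:
        stats["active"] -= 1
    return f"{name}: done"

names = ["site_a", "site_b", "site_c", "site_d", "site_e"]
results = run_workflows(names, fake_workflow, max_concurrent=2)
print(results)
print("peak concurrent workflows:", stats["peak"])
```

The cap limits only the number of workflow-level processes; the model runs each workflow submits could still be handled by the cluster queue as before.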

Again, maybe a separate issue (apparently related to some tricks of @robkooper's for batching SA/EA runs on geo?), but getting the ball rolling here...

dlebauer commented 8 years ago

Brainstorming a few of the use cases:

rykelly commented 7 years ago

I have a rough version of multi-site PEcAn working here. A couple of notes are below. Comments welcome but this is really just a brain dump / progress report. Feel free to wait until I get things cleaned up a bit.

github-actions[bot] commented 4 years ago

This issue is stale because it has been open 365 days with no activity.