AgPipeline / issues-and-projects

Repository for issues and projects
BSD 3-Clause "New" or "Revised" License

Makeflow pipeline steps should not call out to external databases #29

Closed julianpistorius closed 4 years ago

julianpistorius commented 4 years ago

The current behavior or issue

Currently the agpipeline/cleanmetadata and agpipeline/canopycover steps call out to an external database (BETYdb at Illinois).

This will cause reliability, scalability and reproducibility problems.

Expected behavior

Every run of a pipeline (same code and input) should be deterministic and idempotent.

There can be a 'stateful' wrapper around a deterministic core which talks to external systems.
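A minimal sketch of that split, assuming a hypothetical metadata-cleaning step: `clean_metadata`, `run_step`, and the field names `timestamp`, `name`, `start_date`, and `end_date` are illustrative only and are not taken from the agpipeline code. The core function depends only on its arguments, while the wrapper is the single place allowed to reach an external system.

```python
import hashlib
import json
from typing import Callable


def clean_metadata(raw: dict, experiments: list) -> dict:
    """Deterministic core: the output depends only on the arguments."""
    timestamp = raw["timestamp"]
    day = timestamp[:10]  # ISO 8601 dates compare correctly as strings
    matches = [
        exp["name"]
        for exp in experiments
        if exp["start_date"] <= day <= exp["end_date"]
    ]
    return {"timestamp": timestamp, "experiments": sorted(matches)}


def run_step(raw: dict, fetch_experiments: Callable[[], list]) -> dict:
    """Stateful wrapper: the only code that talks to an external system.

    fetch_experiments is a hypothetical callable; it could query a
    database or read a prefetched file.
    """
    experiments = fetch_experiments()
    snapshot = json.dumps(experiments, sort_keys=True).encode("utf-8")
    result = clean_metadata(raw, experiments)
    # Record which snapshot of the external data produced this result.
    result["experiments_sha256"] = hashlib.sha256(snapshot).hexdigest()
    return result
```

With this shape, the core can be tested and re-run from a saved snapshot without network access, and the recorded hash makes it visible when two runs used different external data.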

Completion criteria

(DRAFT SOLUTION)

See https://github.com/terraref/workflow-pilot for inspiration.

julianpistorius commented 4 years ago

From a thread in ACIC Slack, by @Chris-Schnaufer:

On the BETYdb front, I have 4 approaches that could be taken (listed below). I believe the first approach is the best. I haven't tested it with a running container yet, but it appears others have.

Option 1: Call terrautils.betydb.dump_experiments() to write "bety_experiments.json" (specifying the terra ref BETYdb instance with BETYDB_URL, BETYDB_KEY), with the local environment variable "BETYDB_LOCAL_CACHE_FOLDER" set to a folder. Then set the environment variable "BETYDB_LOCAL_CACHE_FOLDER" in the Docker containers to the location of the cache file "bety_experiments.json".

Option 2: Call terrautils.betydb.dump_experiments() to write "bety_experiments.json". Have a web server serve up the contents when experiment requests are made.

Option 3: Have a proxy that caches the requests and calls terra ref BETYdb when a query result isn't available

Option 4: Stand up a BETYdb instance, populate it, clone it, and run as many local instances as you want

Options 1 & 2 prefetch the data. Options 2 & 3 require standing up one or more web servers. Option 4 could be the slowest solution of all to implement.
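The container side of Option 1 could look roughly like the sketch below. It uses only the Python standard library rather than terrautils, and `load_cached_experiments` is a hypothetical helper; only the environment variable name and the cache file name come from the thread. The layout of the JSON is not assumed, so the function returns whatever the file parses to.

```python
import json
import os
from pathlib import Path

CACHE_FILE_NAME = "bety_experiments.json"  # file name mentioned in the thread


def load_cached_experiments(env=os.environ):
    """Read the prefetched experiments file; never touches the network.

    Hypothetical helper: fails loudly instead of falling back to a
    live database query, so a run cannot silently depend on BETYdb.
    """
    folder = env.get("BETYDB_LOCAL_CACHE_FOLDER")
    if not folder:
        raise RuntimeError("BETYDB_LOCAL_CACHE_FOLDER is not set")
    cache_path = Path(folder) / CACHE_FILE_NAME
    if not cache_path.is_file():
        raise FileNotFoundError(f"missing cache file: {cache_path}")
    with cache_path.open(encoding="utf-8") as handle:
        return json.load(handle)
```

In a container, the cache folder could be bind-mounted read-only and the variable passed with `docker run -e`, so every pipeline run against the same cache file sees the same experiment data.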

dlebauer commented 4 years ago

has this been implemented yet?

Chris-Schnaufer commented 4 years ago

@dlebauer The description of this issue is way out of date. It can be closed as Done. The code that makes these external calls is in apps, not part of the workflow: https://github.com/AgPipeline/drone-makeflow/blob/7e744944c0be223b7610296ee37d0477065f48eb/scif_app_recipes/ndcctools_v7.1.2_ubuntu16.04.scif#L92