julianpistorius closed this issue 4 years ago
From a thread in ACIC Slack, by @Chris-Schnaufer:
On the BETYdb front, I have 4 approaches that could be taken (listed below). I believe the first approach is the best; I haven't tested it with a running container yet, but it appears others have.
Option 1: Call terrautils.betydb.dump_experiments() to write "bety_experiments.json", specifying the TERRA REF BETYdb instance with BETYDB_URL and BETYDB_KEY and setting the local environment variable BETYDB_LOCAL_CACHE_FOLDER to a folder. Then set BETYDB_LOCAL_CACHE_FOLDER in the Docker containers to the location of the cached "bety_experiments.json" file (see the sketch after this list).
Option 2: Call terrautils.betydb.dump_experiments() to write "bety_experiments.json", then have a web server serve up the contents when experiment requests are made.
Option 3: Have a proxy that caches the requests and calls the TERRA REF BETYdb when a query result isn't available.
Option 4: Stand up a BETYdb instance, populate it, clone it, and run as many local instances as you want.
Options 1 & 2 prefetch the data. Options 2 & 3 require standing up one or more web servers. Option 4 could be the slowest solution of all to implement.
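A minimal sketch of the Option 1 prefetch step, assuming dump_experiments() honors the environment variables named above; the exact function signature and the TERRA REF URL shown are assumptions, not verified here:

```python
"""Prefetch BETYdb experiment data for Option 1 (illustrative sketch only)."""
import os

# Point terrautils at the TERRA REF BETYdb instance (URL assumed for illustration).
os.environ["BETYDB_URL"] = "https://terraref.ncsa.illinois.edu/bety/"
os.environ["BETYDB_KEY"] = "<your BETYdb API key>"

# Folder where the cached "bety_experiments.json" should be written.
os.environ["BETYDB_LOCAL_CACHE_FOLDER"] = "/data/betydb_cache"

from terrautils.betydb import dump_experiments

# Expected to write bety_experiments.json into BETYDB_LOCAL_CACHE_FOLDER.
dump_experiments()
```

At run time the pipeline containers would then be started with BETYDB_LOCAL_CACHE_FOLDER pointing at that folder (for example by passing -e BETYDB_LOCAL_CACHE_FOLDER=/cache and mounting /data/betydb_cache at /cache), so the steps read the cached file instead of calling the live database.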
Has this been implemented yet?
@dlebauer This issue is way out of date in its description. This can be closed as Done. The code that makes these external calls lives in apps and is not part of the workflow: https://github.com/AgPipeline/drone-makeflow/blob/7e744944c0be223b7610296ee37d0477065f48eb/scif_app_recipes/ndcctools_v7.1.2_ubuntu16.04.scif#L92
The current behavior or issue
Currently the `agpipeline/cleanmetadata` and `agpipeline/canopycover` steps call out to an external database (BETYdb at Illinois). This will cause reliability, scalability, and reproducibility problems.
Expected behavior
Every run of a pipeline (same code and input) should be deterministic and idempotent.
A 'stateful' wrapper that talks to external systems can surround the deterministic core.
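An illustrative sketch of that split, with all names (compute_canopy_cover, run_step, the cache file layout) being hypothetical rather than taken from the pipeline code:

```python
"""Sketch of a stateful wrapper around a deterministic core (names hypothetical)."""
import json
from pathlib import Path


def compute_canopy_cover(experiments: list, image_path: Path) -> dict:
    """Deterministic core: same inputs always yield the same output.

    It receives the experiment metadata as plain data and performs no
    network calls, so a pipeline run is reproducible and idempotent.
    """
    # ... pure computation over `experiments` and the image would go here ...
    return {"experiment_count": len(experiments), "image": str(image_path)}


def run_step(cache_file: Path, image_path: Path) -> dict:
    """Stateful wrapper: handles I/O and any talk with external systems.

    Here it only reads a prefetched cache file; fetching from BETYdb (or any
    other external service) would also live at this layer, never in the core.
    """
    experiments = json.loads(cache_file.read_text())
    return compute_canopy_cover(experiments, image_path)
```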
Completion criteria
(DRAFT SOLUTION)
See https://github.com/terraref/workflow-pilot for inspiration.