The function cr_build_targets() helps set up some boilerplate code to download targets metadata from the specified GCS bucket, run the pipeline and upload the artifacts back to the same bucket. Need some tests to check it respects the right targets skips etc.
cr_build_targets(path = tempfile())

# adding custom environment args and secrets to the build
cr_build_targets(
  task_image = "gcr.io/my-project/my-targets-pipeline",
  options = list(env = c("ENV1=1234",
                         "ENV_USER=Dave")),
  availableSecrets = cr_build_yaml_secrets("MY_PW", "my-pw"),
  task_args = list(secretEnv = "MY_PW")
)
Resulting in this build:
==cloudRunnerYaml==
steps:
- name: gcr.io/google.com/cloudsdktool/cloud-sdk:alpine
  entrypoint: bash
  args:
  - -c
  - gsutil -m cp -r ${_TARGET_BUCKET}/* /workspace/_targets || exit 0
  id: get previous _targets metadata
- name: ubuntu
  args:
  - bash
  - -c
  - ls -lR
  id: debug file list
- name: gcr.io/my-project/my-targets-pipeline
  args:
  - Rscript
  - -e
  - targets::tar_make()
  id: target pipeline
  secretEnv:
  - MY_PW
timeout: 3600s
options:
  env:
  - ENV1=1234
  - ENV_USER=Dave
substitutions:
  _TARGET_BUCKET: gs://mark-edmondson-public-files/googleCloudRunner/_targets
availableSecrets:
  secretManager:
  - versionName: projects/mark-edmondson-gde/secrets/my-pw/versions/latest
    env: MY_PW
artifacts:
  objects:
    location: gs://mark-edmondson-public-files/googleCloudRunner/_targets/meta
    paths:
    - /workspace/_targets/meta/**
Tests are working now which confirm a targets build can reuse a previous build's artifacts, and also rerun if the source is updated: https://github.com/MarkEdmondson1234/googleCloudRunner/pull/159/files
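A rough sketch of the shape of those tests (the real ones are in the PR above; this assumes cr_build_targets() returns a build object that cr_build() accepts):

library(testthat)
library(googleCloudRunner)

test_that("a targets build reuses previous artifacts", {
  skip_on_cran() # needs live GCP auth and a configured bucket

  bd <- cr_build_targets(path = tempfile())

  # first run populates ${_TARGET_BUCKET} with _targets/meta
  b1 <- cr_build_wait(cr_build(bd))
  expect_equal(b1$status, "SUCCESS")

  # second run downloads that meta first, so unchanged targets should
  # be skipped - the build logs show targets' "skip" messages
  b2 <- cr_build_wait(cr_build(bd))
  expect_equal(b2$status, "SUCCESS")
})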
Need two modes(?): one where all target files go through the upcoming GCS integration in targets, which will download artifacts as needed; and one where the data is loaded from other sources (files etc.) kept in a normal GCS bucket.
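For the first mode, the pipeline side might look like this (a sketch assuming the upcoming targets GCS API, tar_resources_gcp() and repository = "gcs"; names may change). The second mode is what cr_build_targets() above already does: sync a normal GCS bucket wholesale.

# _targets.R - targets stores each target's artifacts in GCS itself
library(targets)

tar_option_set(
  repository = "gcs", # assumed name for the upcoming GCS integration
  resources = tar_resources(
    gcp = tar_resources_gcp(
      bucket = "my-targets-bucket",         # hypothetical bucket
      prefix = "googleCloudRunner/_targets" # hypothetical prefix
    )
  )
)

list(
  tar_target(data_file, "data.csv", format = "file"),
  tar_target(data, read.csv(data_file)),
  tar_target(model, summary(data))
)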
Added cr_buildstep_targets() to prep for sending up individual build steps: cr_buildstep_targets_setup() downloads the meta folder, and cr_buildstep_targets_teardown() uploads the changed targets files to the bucket.
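A sketch of how those steps could compose into a build (the argument names here are guesses from the descriptions above, not confirmed signatures):

library(googleCloudRunner)

bucket <- "gs://my-bucket/_targets" # hypothetical bucket folder

bs <- c(
  cr_buildstep_targets_setup(bucket),   # download previous _targets/meta
  cr_buildstep_targets(                 # run targets::tar_make()
    task_image = "gcr.io/my-project/my-targets-pipeline"
  ),
  cr_buildstep_targets_teardown(bucket) # upload changed targets files
)

cr_build_yaml(steps = bs) |> cr_build()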
Getting some feedback here https://github.com/ropensci/targets/issues/720
GCP already available via:
But I think there is an opportunity to move this in a more serverless direction, as the cloud build steps seem to map seamlessly to tar_target() if a way of communicating between the steps can be found. As an example, an equivalent googleCloudRunner-to-targets minimal example is sketched below. Normally I would put all the R steps in one buildstep sourced from a file, but have added readRDS() %>% blah() %>% saveRDS() to illustrate the functionality that I think targets could take care of.
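Something like this (my reconstruction of the kind of example meant, not the original code, using cr_buildstep_r() to run inline R):

library(googleCloudRunner)

# /workspace persists between buildsteps in the same build, so RDS
# files act as the state handoff that targets could manage instead
get_data <- cr_buildstep_r(
  "d <- data.frame(x = rnorm(10)); saveRDS(d, '/workspace/raw.rds')",
  id = "get data"
)
summarise_data <- cr_buildstep_r(
  "readRDS('/workspace/raw.rds') |> summary() |> saveRDS('/workspace/summary.rds')",
  id = "summarise data"
)

the_build <- cr_build_yaml(steps = c(get_data, summarise_data))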
This makes a yaml object that I think maps closely to targets: (more build args here)
Do the build on GCP via the_build |> cr_build(). And/or each buildstep could be its own dedicated cr_build(), with the build's artefacts uploaded/downloaded after each run. This holds several advantages: I see it as a tool that is better than Airflow for visualising DAGs, taking care of state management on whether each node needs to be run, but with a lot of scale by building each step in a cloud environment.
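To illustrate the per-step variant (hypothetical orchestration; cr_build_yaml_artifact() is assumed here for uploading each node's outputs, check the current API):

library(googleCloudRunner)

# run one DAG node as its own build, uploading its RDS outputs as
# build artifacts for downstream nodes to fetch
run_node <- function(r_code, id, bucket) {
  node_build <- cr_build_yaml(
    steps = cr_buildstep_r(r_code, id = id),
    artifacts = cr_build_yaml_artifact("/workspace/*.rds", bucket = bucket)
  )
  cr_build_wait(cr_build(node_build))
}

run_node("saveRDS(rnorm(10), '/workspace/raw.rds')",
         id = "raw", bucket = "my-artifact-bucket")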