DataBiosphere / dsub

Open-source command-line tool to run batch computing tasks and workflows on backend services such as Google Cloud.
Apache License 2.0

gcsfuse docker missing, --mount option now broken #298

Open ConorMesser opened 2 days ago

ConorMesser commented 2 days ago

I believe the gcsfuse Docker image that dsub references is now missing or deleted. google_v2_base still uses it in the pipeline to mount a bucket: gcr.io/cloud-genomics-pipelines/gcsfuse:latest.

When running dsub on a script that worked last week, I now get this error (for dsub v0.4.13 or v0.5.0):

Execution failed: generic::unknown: pulling image: docker pull: running ["docker" "pull" "gcr.io/cloud-genomics-pipelines/gcsfuse:latest"]: exit status 1 (standard error: "Error response from daemon: Head \"https://gcr.io/v2/cloud-genomics-pipelines/gcsfuse/manifests/latest\": denied: Permission \"artifactregistry.repositories.downloadArtifacts\" denied on resource \"projects/cloud-genomics-pipelines/locations/us/repositories/gcr.io\" (or it may not exist)\n")

Can the referenced image be updated to an existing public gcsfuse image, or can a new Docker image be built from the Dockerfile and hosted?

Code snippet to reproduce the error:

Script:

vi ./scripts/test.hello_world.sh
%%%%
echo "Hello World"
echo "${INPUT_TEXT}"
%%%%%

DSUB Call:

DSUB_USER_NAME="$(echo "${OWNER_EMAIL}" | cut -d@ -f1)"
BASH_SCRIPT="./scripts/test.hello_world.sh"

dsub \
  --provider google-cls-v2 \
  --user-project "${GOOGLE_PROJECT}" \
  --project "${GOOGLE_PROJECT}" \
  --network "network" \
  --subnetwork "subnetwork" \
  --service-account "$(gcloud config get-value account)" \
  --user "${DSUB_USER_NAME}" \
  --regions us-central1 \
  --logging "${WORKSPACE_BUCKET}/test.hello_world.log" \
  --machine-type "n2-standard-2" \
  --disk-size "10" \
  --name "test_mount" \
  --script "${BASH_SCRIPT}" \
  --image "conormesser/splash:v2.6.2" \
  --mount DRIVE="${WORKSPACE_BUCKET}" \
  --env INPUT_TEXT="This is a test"

mbookman commented 1 day ago

Thank you for reporting, @ConorMesser!

dsub is picking up a Docker image that had been maintained by the Google HCLS team (who also supported the Life Sciences API). The Dockerfile for this can be found here:

https://github.com/googlegenomics/pipelines-tools

I believe that the Google Batch team had inherited support for these tools. I've filed a ticket for this with that team. In the meantime, you could build and copy an image into Artifact Registry and update your local version of the code to pick it up:

https://cloud.google.com/artifact-registry/docs/docker/pushing-and-pulling
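As a rough sketch of that workaround (not an official fix), the script below builds the gcsfuse image, pushes it to Artifact Registry, and then searches a dsub checkout for the old image reference. The project ID, location, repository name, and build directory are all placeholders assumed for illustration. In particular, `BUILD_CONTEXT` must point at whichever directory of your pipelines-tools checkout holds the gcsfuse Dockerfile. The script only prints the commands by default, so they can be reviewed first.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder values (assumptions, not taken from the issue); substitute your own.
PROJECT="my-project"        # hypothetical Google Cloud project ID
LOCATION="us-central1"      # Artifact Registry location
REPOSITORY="dsub-tools"     # hypothetical Docker repository name
BUILD_CONTEXT="."           # directory containing the gcsfuse Dockerfile
DRY_RUN="${DRY_RUN:-1}"     # 1 = only print commands, 0 = execute them

# Artifact Registry Docker image paths have the form
# LOCATION-docker.pkg.dev/PROJECT/REPOSITORY/IMAGE:TAG
REGISTRY_HOST="${LOCATION}-docker.pkg.dev"
IMAGE_URI="${REGISTRY_HOST}/${PROJECT}/${REPOSITORY}/gcsfuse:latest"

# Print the command in dry-run mode, otherwise run it.
run() {
  if [ "${DRY_RUN}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Create a Docker-format repository (fails if it already exists).
run gcloud artifacts repositories create "${REPOSITORY}" \
  --project="${PROJECT}" --repository-format=docker --location="${LOCATION}"

# Let the local docker client authenticate against the registry host.
run gcloud auth configure-docker "${REGISTRY_HOST}"

# Build the image from the Dockerfile and push it.
run docker build -t "${IMAGE_URI}" "${BUILD_CONTEXT}"
run docker push "${IMAGE_URI}"

# Locate the old image reference in a local dsub checkout so it can be
# edited by hand to point at ${IMAGE_URI}.
run grep -rn "cloud-genomics-pipelines/gcsfuse" dsub/ || true
```

Running with `DRY_RUN=0` executes the commands instead of printing them. The service account that runs the job will likely also need read access to the new repository, which this sketch does not configure.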

ConorMesser commented 1 day ago

I don’t have time to try this today but will attempt it next week.

If you hear anything from the Google Batch team, let me know!