DataBiosphere / dsub

Open-source command-line tool to run batch computing tasks and workflows on backend services such as Google Cloud.
Apache License 2.0

Using Google Cloud commands within Docker when testing on the `local` provider #272

Open carbocation opened 1 year ago

carbocation commented 1 year ago

I have a dsub-executed script that uses environment variables to determine which files, out of a large list, to localize from storage using gsutil. (I do this instead of mounting because of the low reliability of gcsfuse.)
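As a rough sketch of that pattern (the variable name and paths here are illustrative, not the actual script), the in-container command looks something like:

  #!/bin/bash
  # FILE_LIST (illustrative name) is passed via --env and holds a
  # space-separated subset of gs:// paths chosen from the larger list.
  set -euo pipefail
  mkdir -p /mnt/data/work
  for f in ${FILE_LIST}; do
    # Localize only the selected objects.
    gsutil cp "${f}" /mnt/data/work/
  done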

When running on the `google` provider, this approach works great even though I haven't explicitly stored any Google Cloud credentials within the Docker image.

But when running with the `local` provider, the lack of credentials is a problem when I try to run gsutil within the Docker image:

  ServiceException: 401 Anonymous caller does not have storage.objects.get access to the Google Cloud Storage object. Permission 'storage.objects.get' denied on resource (or it may not exist).
  CommandException: 1 file/object could not be transferred.

Is there a pointer to how we might get gcloud-in-Docker to work for easier local troubleshooting? I tried passing `--credentials-file`, but it didn't seem to have any effect with the `local` provider.

mbookman commented 1 year ago

Hi @carbocation !

A workaround for this is to make your `~/.config/gcloud` directory available inside the container. This can be done with:

  --input-recursive CLOUDSDK_CONFIG_INPUT=${HOME}/.config/gcloud \
  --env CLOUDSDK_CONFIG=/mnt/data/input/file${HOME}/.config/gcloud \
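
For reference, a complete `local`-provider invocation using this workaround might look like the following (the image, logging path, and bucket/object names are placeholders):

  dsub \
    --provider local \
    --logging /tmp/dsub-logs \
    --image google/cloud-sdk:slim \
    --input-recursive CLOUDSDK_CONFIG_INPUT=${HOME}/.config/gcloud \
    --env CLOUDSDK_CONFIG=/mnt/data/input/file${HOME}/.config/gcloud \
    --command 'gsutil cat gs://my-bucket/my-object' \
    --wait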

Note that it should be sufficient to use:

  --input-recursive CLOUDSDK_CONFIG=${HOME}/.config/gcloud \

but the `local` provider sets the environment variables too early: it should set them only for the user command, yet it ends up setting them before the localization command as well.
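
Once the configuration directory is available inside the container this way, a quick sanity check (bucket name is a placeholder) is to have the user command report the active account before transferring anything:

  --command 'gcloud auth list && gsutil ls gs://my-bucket/' \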