Closed: letercarr closed this issue 4 years ago
Thanks for the report @letercarr !
It may be that some additional configuration is required in your GCP project.
Looking at the Pipelines v2 documentation for the parameter:

> If set to true, do not attach a public IP address to the VM. Note that without a public IP address, additional configuration is required to allow the VM to access Google services. See https://cloud.google.com/vpc/docs/configure-private-google-access for more information.
Have you used private addresses for other VMs?
To get more detail about where the `dsub` job is failing, check the `dstat --full` output. The first thing to look at would be the `events`; there may be something revealed there. Next would be to grab the `internal-id` and check the output of:

```
gcloud alpha genomics operations describe <id>
```

There may be additional event details there.
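The debugging steps above can be sketched as a command sequence. This is a sketch, not output from an actual run; `YOUR-PROJECT`, `JOB-ID`, and `OPERATION-ID` are placeholders you would fill in from your own job:

```
# 1. Dump full status for the job, including the events list:
dstat --provider google-v2 --project YOUR-PROJECT --jobs 'JOB-ID' --status '*' --full

# 2. Find the internal-id in that output, then describe the
#    underlying operation for additional event details:
gcloud alpha genomics operations describe OPERATION-ID
```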
Hi @letercarr,
We have tested this out by following the GCE docs:
https://cloud.google.com/vpc/docs/configure-private-google-access
and our dsub jobs ran successfully, though with one caveat:
When your VM has no public IP address it can't access Docker Hub, so any Docker images used for those tasks need to be in Google Container Registry. If you are using images from Docker Hub, push a copy to your Cloud project's container registry and then update your `--image` to use the new `gcr.io/` path.
For example:

```
$ docker pull python:2.7-slim
2.7-slim: Pulling from library/python
...
35944cd3271f: Pull complete
Digest: sha256:a17cb64cdd52190f9fe6c13680ccb7801b2abcb7a2cefbc936004550590e992f
Status: Downloaded newer image for python:2.7-slim

$ docker tag python:2.7-slim gcr.io/YOUR-PROJECT/python:2.7-slim

$ docker push gcr.io/YOUR-PROJECT/python:2.7-slim
The push refers to repository [gcr.io/YOUR-PROJECT/python]
...
5dacd731af1b: Layer already exists
2.7-slim: digest: sha256:a17cb64cdd52190f9fe6c13680ccb7801b2abcb7a2cefbc936004550590e992f size: 1163
```
Then use `--image gcr.io/YOUR-PROJECT/python:2.7-slim` in your `dsub` command line.
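The retag-and-push pattern above is mechanical, so it can be wrapped in a small helper. `to_gcr` below is a hypothetical function (not part of dsub or gcloud) that rewrites a Docker Hub image reference into the corresponding path in your project's Container Registry; it assumes simple `name:tag` references and does not handle `@sha256:` digest references:

```shell
# Hypothetical helper: map a Docker Hub image reference to a gcr.io path.
# Assumes a plain name:tag reference (no @digest).
to_gcr() {
  local image="$1" project="$2"
  # Strip any registry or namespace prefix; keep only the final name:tag.
  local name_tag="${image##*/}"
  echo "gcr.io/${project}/${name_tag}"
}

to_gcr "python:2.7-slim" "YOUR-PROJECT"          # gcr.io/YOUR-PROJECT/python:2.7-slim
to_gcr "library/python:2.7-slim" "YOUR-PROJECT"  # gcr.io/YOUR-PROJECT/python:2.7-slim
```

You would then `docker tag` and `docker push` to the printed path, and pass it to `--image`.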
I can also confirm that running dsub with private buckets hangs if your permissions aren't configured correctly, and runs once they are. (I just worked through this over the weekend.)
Specifically (since I just ran into this again and hadn't documented it well), my issue was as follows:
I am using dsub with a Docker image stored on gcr.io. I am using private IPs only.
If my project isn't configured so that my "VPC Networks" have "Private Google Access", then the container image will never be fetched. In that case, the GCP instance will sit idle forever, giving no warning that the Docker image could not be fetched.
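For anyone hitting the same silent hang: you can check whether Private Google Access is enabled on the subnet your VMs use, and enable it if not. A sketch, with `SUBNET` and `REGION` as placeholders:

```
# Check the subnet's Private Google Access setting (prints True or False):
gcloud compute networks subnets describe SUBNET \
  --region REGION \
  --format="value(privateIpGoogleAccess)"

# Enable it if it was False:
gcloud compute networks subnets update SUBNET \
  --region REGION \
  --enable-private-ip-google-access
```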
The documentation now includes a section on configuring `dsub` VMs to have no public IP address:
https://github.com/DataBiosphere/dsub/blob/master/docs/compute_resources.md#public-ip-addresses
Including a section:

> It is highly recommended that you test your job carefully, checking `dstat ... --full` events and your `--logging` files to ensure that your job makes progress and runs to completion. A misconfigured job can hang indefinitely or until the infrastructure terminates the task. The Google providers' default `--timeout` is 7 days.
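Putting the pieces together, a private-IP job would look something like the sketch below. The project, bucket, and image paths are placeholders; the image must live in `gcr.io` (not Docker Hub), and a shorter `--timeout` bounds a hang instead of the 7-day default:

```
dsub \
  --provider google-v2 \
  --project YOUR-PROJECT \
  --zones "us-central1-*" \
  --logging gs://YOUR-BUCKET/logs \
  --image gcr.io/YOUR-PROJECT/python:2.7-slim \
  --use-private-address \
  --timeout 1h \
  --command 'echo "hello world"' \
  --wait
```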
When I add `--use-private-address` to a simple job (which usually takes a few minutes), the job seems to submit normally but hangs.
Using google-v2 as a provider and dsub version: 0.2.1.