DataBiosphere / dsub

Open-source command-line tool to run batch computing tasks and workflows on backend services such as Google Cloud.
Apache License 2.0

dsub with a shared VPC #137

Closed slagelwa closed 5 years ago

slagelwa commented 5 years ago

Now that we can use dsub with a VPC (related issue #111), can we use dsub with a shared VPC?

mbookman commented 5 years ago

Hi Joe,

Can you explain more about the use case? What is it that you are trying to accomplish? Is there something that you have tested that fails?

Thanks.

slagelwa commented 5 years ago

Sure can. We've recently moved to an infrastructure where we have a hosting GCloud project with a VPC and a VPN tunnel to our organization. We then share this VPC out to other projects; this way we don't have to set up a VPN tunnel for every new project, and it can be centrally managed. While it's true that our dsub VMs would almost never need to reach back to our organization's network, so you would think we could just use a project-only VPC for dsub, for reasons beyond our control (security requirements, central management, etc.) we're asked to use a specific shared subnet in a shared VPC for these types of jobs.

From what little I understand, shared networks/subnets are listed and specified differently than a normal project's VPCs, as you have to specify the host project, e.g. to create an instance:

gcloud compute instances create [INSTANCE_NAME] \
    --project [SERVICE_PROJECT_ID] \
    --subnet projects/[HOST_PROJECT_ID]/regions/[REGION]/subnetworks/[SUBNET] \
    --zone [ZONE]
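
For reference, the shared subnets that a service project is permitted to use can usually be listed from that project with something along these lines (the project placeholder is illustrative):

gcloud compute networks subnets list-usable --project [SERVICE_PROJECT_ID]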

mbookman commented 5 years ago

Thanks for the details.

Have you manually created an instance in the target project?

If that is successful, have you tried passing the full projects/[HOST_PROJECT_ID]/regions/[REGION]/subnetworks/[SUBNET] to dsub with the --subnetwork flag? If so, what happened?

slagelwa commented 5 years ago

Working on it. I tried the long subnetwork flag and it didn't work, but I want to confirm that I can create an instance via the command line with the long flag first, and then I'll try again.

slagelwa commented 5 years ago

Ok, so I confirmed that the full projects/[HOST_PROJECT_ID]/regions/[REGION]/subnetworks/[SUBNET] works for Google Genomics, but it doesn't seem to work with dsub version 0.2.3. You get back a job id suggesting the job started, but then no VMs start up and dstat doesn't show anything running. The command I ran was:

dsub \
    --project XXXX  \
    --zone "us-east1-b" \
    --provider google-v2 \
    --logging gs://XXXX/logs \
    --command 'echo hello' \
    --network projects/XXXX/regions/us-east1/YYYY \
    --subnetwork projects/XXXX/regions/us-east1/subnetworks/mgmt01

mbookman commented 5 years ago

Thanks Joe.

Can you get the operation id with:

dstat ... --full | grep internal-id

and then pass the internal-id value to:

gcloud alpha genomics operations describe <id>

and I'd be curious whether the events section of

dstat ... --full

reflects what the underlying operation indicates is occurring.
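
For example, using the job id from your run above, something like this (substitute your project and the internal-id value that dstat reports):

dstat \
    --provider google-v2 \
    --project XXXX \
    --jobs 'echo--joe--181205-174312-38' \
    --full | grep internal-id

gcloud alpha genomics operations describe [OPERATION_ID]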

slagelwa commented 5 years ago

Sure thing. But when I run dstat I get:

2018-12-05 17:44:07.054968: Exception HttpError: <HttpError 503 when requesting https://genomics.googleapis.com/v2alpha1/projects/XXXXX/operations?filter=%28labels.%22user-id%22+%3D+%22joe%22%29+AND+%28labels.%22job-id%22+%3D+%22echo--joe--181205-174312-38%22%29&alt=json&pageSize=128 returned "The service is currently unavailable.">
Retrying...

However listing the operations revealed the problem (which was on my part):

gcloud alpha genomics operations list

"Execution failed: creating instance: inserting instance: Invalid value\
    \ for field 'resource.networkInterfaces[0].network': 'projects/XXXXX/regions/us-east1/YYYYY'.\
    \ The URL is malformed."

Fixing the network/subnetwork flags as follows, I'm happy to report, works:

    --network projects/XXXXX/global/networks/YYYYY  \
    --subnetwork projects/XXXXX/regions/us-east1/subnetworks/ZZZZZZ
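
So, combining the original command with the corrected flags, the full invocation that works looks roughly like this (same placeholder IDs as above):

dsub \
    --project XXXX \
    --zone "us-east1-b" \
    --provider google-v2 \
    --logging gs://XXXX/logs \
    --command 'echo hello' \
    --network projects/XXXXX/global/networks/YYYYY \
    --subnetwork projects/XXXXX/regions/us-east1/subnetworks/ZZZZZZ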

However the dstat command is still reporting the service as unavailable.

mbookman commented 5 years ago

Hi Joe,

Does the dstat error persist? Are you able to run dstat for other tasks? Other projects? I'll file a bug with the Pipelines API if operations.list is returning a 503 for a specific request. dstat is presently working for me.

Thanks,

-Matt

slagelwa commented 5 years ago

The dstat error does persist in my project with the shared VPC. I just tried it in a scratch project, with a fresh VM and fresh installation of dsub against default network/subnetwork and everything worked as expected.

mbookman commented 5 years ago

Thanks, Joe, for confirming that dsub works with a shared VPC.

Going to close this. The remaining issue that you are running into is a Pipelines API issue that the Cloud Health team is working to fix.