broadinstitute / cromwell

Scientific workflow engine designed for simplicity & scalability. Trivially transition between one off use cases to massive scale production environments
http://cromwell.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Cromwell should have the ability to launch nodes into GCP Subnets #4070

Open davidbernick opened 6 years ago

davidbernick commented 6 years ago

Edit (by @cjllanwarne) in light of #4806:

Following #4806 we will be able to read Google project metadata to specify a VPC network and subnet.

What will remain for this ticket, therefore, is making the same functionality available on a per-workflow basis, e.g. the ability to supply the same network/subnetwork information via workflow options.


Original issue text:

https://cloud.google.com/vpc/docs/vpc -- for a primer on GCP Subnets.

Users should be able to tell Cromwell to launch nodes into a subnet.

For environments like FireCloud, we should have some mechanism (maybe SAM) to make sure the user actually has the right to use a particular subnet.

The main reason to do this is https://cloud.google.com/vpc/docs/using-flow-logs -- we want to be able to monitor traffic in and out of the network for more heavily audited environments. So the driver is ultimately "compliance", but it's probably a good idea anyhow.

After this is done, please work with FC team to make sure they can take advantage of this. I'm not sure who to tag to make sure this cross-team work is done.

cjllanwarne commented 6 years ago

A few immediate thoughts:

ruchim commented 6 years ago

@dvoet would you be the right person to coordinate with when this functionality is ready to be handed off to someone in FireCloud?

geoffjentry commented 6 years ago

Note for the implementer: #4005 was closed as a dupe of this, but double-check that ticket before starting.

davidbernick commented 6 years ago

It looks like there’s a subnetwork field in the PAPI request: https://cloud.google.com/genomics/reference/rest/Shared.Types/Metadata?hl=RU#network (thanks @geoffjentry)
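
For illustration, here is a minimal sketch of how per-workflow options might be mapped onto that `network` block of a PAPI request. The option keys `network` and `subnetwork` and the helper `build_network` are hypothetical, not existing Cromwell workflow options or code; the output field names `name` and `subnetwork` are assumed from the linked PAPI reference.

```python
from typing import Any, Dict, Optional


def build_network(options: Dict[str, Any]) -> Optional[Dict[str, str]]:
    """Build a PAPI-style network block from workflow options.

    The option keys "network" and "subnetwork" are hypothetical names
    chosen for this sketch. Returns None when neither is supplied, so the
    caller can leave the request on the default network.
    """
    network = options.get("network")
    subnetwork = options.get("subnetwork")

    if network is None and subnetwork is None:
        return None

    block: Dict[str, str] = {}
    if network is not None:
        # Field name assumed from the PAPI reference linked above.
        block["name"] = str(network)
    if subnetwork is not None:
        block["subnetwork"] = str(subnetwork)
    return block
```

A workflow that supplies neither key gets `None` back and would run on the default network. Checking whether the user is actually allowed to use the subnet (the SAM question above) is out of scope for this sketch.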

davidbernick commented 6 years ago

What can we do to make sure this gets done? We need this for FC users (specifically) right now: we want to be able to capture VPC traffic, so FC and Compliance are the drivers. Since this is cross-team, who owns the work? Who can get it on the schedule? The main issue is that FC PAPI nodes should be launched in a subnet.

davidbernick commented 6 years ago

https://broadinstitute.atlassian.net/plugins/servlet/mobile#issue/GAWB-4001 for Workbench tracking.

ruchim commented 5 years ago

The goal is to start this ticket mid-Feb

danbills commented 5 years ago

Emerald is hitting this as part of the Joint Genotyping effort, which scatters 100K wide. Not high urgency, as the next batch is not expected until next quarter or even two.

cjllanwarne commented 5 years ago

One mechanism to support this (using labels on Google projects) will be provided by #4806, per Terra requirements. Other mechanisms (workflow options, ???) might follow.