broadinstitute / cromwell

Scientific workflow engine designed for simplicity & scalability. Trivially transition from one-off use cases to massive-scale production environments.
http://cromwell.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Running and publishing workflows in Cromwell on GCP with GCR images exposes developers to unexpected egress costs #6442

Open cwhelan opened 3 years ago

cwhelan commented 3 years ago

As discussed in https://github.com/broadinstitute/cromwell/issues/6235, developers of workflows for GCP who store their images in Google Container Registry (GCR) can be exposed to large GCS egress charges when users run those workflows in other continental regions, resulting in many trans-continental container pulls. There currently does not seem to be a satisfactory way to guard against this.

Some possible ideas were suggested by @freeseek in https://github.com/broadinstitute/cromwell/issues/6235.

mcovarr commented 3 years ago
cwhelan commented 3 years ago

Thanks for pointing out the Cromwell/PAPI v2 beta support for docker image caches; I wasn't aware of this feature.

After reading the documentation at https://cromwell.readthedocs.io/en/develop/backends/Google/#docker-image-cache-support, I have a few additional questions about how this could help with the problem described in this issue.

It looks like docker image caching needs to be configured at the Cromwell server level, with a manifest file indicating the location of the images. Are the images listed in the manifest truly global (as their paths suggest, i.e. projects/broad-dsde-cromwell-dev/global/images/..), so that using them to create VMs in other regions won't incur egress? Can they be made public for use by users in other projects?

For an external user who already runs their own Cromwell instance in a different region and wants to run our pipeline, we'd need to publish an image-caching configuration stanza and manifest file along with our workflow, and the administrator of that Cromwell server would need to modify their server configuration to point to our manifest file and restart the server -- is that correct? If their Cromwell server was already configured with a different image cache manifest, is there a way to add a second manifest, or would they have to edit their own cache manifest to include the entries from ours?

If my understanding is right (please correct me if it isn't), this solution could help, but it is quite cumbersome and requires very savvy external users who are willing to take extra steps to avoid saddling us with egress charges. It would be great if a cache manifest could be configured at the workflow level -- perhaps in the WDL or the workflow options.
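For reference, my reading of the linked docs suggests the current server-level stanza looks roughly like the sketch below -- the exact key name and manifest format are unverified on my end, and the paths are placeholders:

  # Rough sketch only; verify key names against the docker-image-cache docs
  backend.providers.PAPIv2.config {
    # JSON manifest mapping docker images to prebuilt global disk images,
    # e.g. {"ubuntu:latest": "projects/my-project/global/images/ubuntu-cache"}
    docker-image-cache-manifest-file = "gs://our-bucket/image-cache-manifest.json"
  }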

freeseek commented 3 years ago

I feel like WDL developers overall need some way to specify, within the WDL, that docker images should be cached. Tasks that are not scattered are probably not relevant here, as a single download is unlikely to incur large egress charges. For scattered tasks, on the other hand, there should be a way for the WDL developer to require caching rather than relying on the user to do the right thing. Ideally this would all be handled by Google, with container images cached for a predetermined amount of time once downloaded. There is something called mirror.gcr.io, but I did not fully understand how it works or whether it could be part of the solution here.
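From what I can tell, mirror.gcr.io is Google's pull-through cache of Docker Hub, and a VM's Docker daemon can be pointed at it via /etc/docker/daemon.json:

  {
    "registry-mirrors": ["https://mirror.gcr.io"]
  }

But as far as I understand it only caches frequently pulled public Docker Hub images, so it may not help with images hosted on GCR.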

wnojopra commented 3 years ago

I've been working on this issue in parallel. My general recommendation would be not to use GCR for public images, as there is a perpetual risk of egress charges to the image owner. GCR is great for keeping more of your workflow infrastructure inside Google Cloud, but egress charges are billed to you as the owner of the underlying storage bucket. Artifact Registry appears to have its own set of egress charges, similar to GCS egress charges. Alternatively, services like quay.io offer unlimited storage and serving of public repos.

The consequence of not using GCR is that docker images are then hosted outside of Google Cloud, meaning workflows must make an external network call to download the image. That external call requires VMs to have an external IP address, so large parallel workflows will need quota for many external IP addresses and may run into quota limits.
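As a rough check, a region's in-use external IP address quota can be inspected with gcloud (us-central1 below is just an example region):

  # Quotas are listed as limit/metric/usage triples in the YAML output
  gcloud compute regions describe us-central1 | grep -B1 -A1 IN_USE_ADDRESSES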

To alleviate this, workflow runners could be instructed to copy the image into their own GCR, which has the added advantage that the runner can use a repository in the same location as their VMs. Workflow publishers should include instructions on how to upload the image to a private GCR, along the lines of the sketch below.
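For example, mirroring a public quay.io image into a private GCR is just a few standard docker commands (all org, project, and image names below are placeholders):

  # One-time setup: let docker authenticate to GCR using gcloud credentials
  gcloud auth configure-docker

  # Pull the public image, retag it into your own registry, and push
  docker pull quay.io/some-org/some-image:1.0
  docker tag quay.io/some-org/some-image:1.0 us.gcr.io/my-project/some-image:1.0
  docker push us.gcr.io/my-project/some-image:1.0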

How does that sound as a set of guidelines for the community?

freeseek commented 3 years ago

It seems very cumbersome to me to ask users to think this through. Currently my WDLs have a variable:

  String docker_registry = "us.gcr.io"

But I have Docker images uploaded to each of us.gcr.io, eu.gcr.io, and asia.gcr.io. The main issue is that the user has to take the initiative to fill in that variable with the correct GCR. It would be great if there were some sort of environment variable that could let the WDL know which docker registry is closest (and which one incurs no egress charges).
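For context, that variable gets threaded into each task's runtime section along these lines (workflow, project, and image names are made up):

  version 1.0

  workflow MyPipeline {
    input {
      String docker_registry = "us.gcr.io"
    }
    call MyTask { input: docker_registry = docker_registry }
  }

  task MyTask {
    input {
      String docker_registry
    }
    command <<<
      echo "hello"
    >>>
    runtime {
      # The registry prefix determines which regional GCR serves the pull
      docker: docker_registry + "/my-project/my-image:1.0"
    }
  }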

wnojopra commented 3 years ago

I've been looking into a solution that uses VPC Service Controls (VPC SC) to restrict bucket egress to specific locations. The gist of it is that the owner of the docker image puts the container registry in a project inside a VPC SC perimeter, and configures the perimeter so that only VMs from specific ipRanges can access the bucket backing the registry. I've put the details and instructions in a doc that I've currently shared with the Broad.
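Roughly, the setup looks like the following (policy IDs, names, and IP ranges are placeholders, and the exact flags should be checked against the gcloud reference):

  # Access level admitting only traffic from the allowed IP ranges;
  # ip-ranges.yaml contains a basic level spec such as:
  #   - ipSubnetworks:
  #     - 203.0.113.0/24
  gcloud access-context-manager levels create gcr_pullers \
    --policy=POLICY_ID --title="GCR pullers" --basic-level-spec=ip-ranges.yaml

  # Perimeter around the project hosting the registry bucket,
  # restricting GCS access to that access level
  gcloud access-context-manager perimeters create gcr_perimeter \
    --policy=POLICY_ID --title="GCR perimeter" \
    --resources=projects/PROJECT_NUMBER \
    --restricted-services=storage.googleapis.com \
    --access-levels=gcr_pullers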

freeseek commented 2 years ago

@mcovarr I also echo @cwhelan in finding the solution provided quite cumbersome. What exactly would be the complexity in devising a solution where you could instead simply define a variable analogous to backend.providers.<name>.config.root (say, backend.providers.<name>.config.cache) that indicates where docker images should be cached, plus maybe an option specifying whether downloading the image into the cache directory should happen for all tasks or only for scattered ones? I don't see why the user should be left to perform the caching manually. This would be more similar to what was hacked together for the shared filesystem backend in #5063, and a more general solution not specific to the PAPIv2 backend might eliminate the need for such a hack. The problem still remains that developers would have to rely on users to configure Cromwell appropriately. If I had it my way, docker images would always be cached within scattered tasks (or at least that would be the default behavior, with an option to change it).
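To make that concrete, the proposal amounts to something like the following (entirely hypothetical keys; nothing like this exists in Cromwell today):

  backend.providers.<name>.config {
    root = "gs://my-bucket/cromwell-executions"
    # Hypothetical: where pulled docker images should be cached
    cache = "gs://my-bucket/docker-image-cache"
    # Hypothetical: cache for all tasks, or only scattered ones
    cache-scope = "scattered-only"
  }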