broadinstitute / cromwell

Scientific workflow engine designed for simplicity & scalability. Trivially transition between one off use cases to massive scale production environments
http://cromwell.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Cromwell GCP error - The referenced network resource cannot be found #6477

Open hsimran13 opened 3 years ago

hsimran13 commented 3 years ago

Hello,

I am new to Cromwell and trying to run a test workflow on GCP. I am using the PAPIv2 backend and here is my config:

$ cat genomics.conf | grep -v '#' | sed '/^$/d'
include required(classpath("application"))
google {
    application-name = "cromwell"
    auths = [
        {
            name = "application-default"
            scheme = "application_default"
        }
    ]
}
engine {
    filesystems {
        gcs {
            auth = "application-default"
            project = "xxxxx"
        }
    }
}
backend {
    default = PAPIv2
    providers {
        PAPIv2 {
            actor-factory = "cromwell.backend.google.pipelines.v2alpha1.PipelinesApiLifecycleActorFactory"
            config {
                project = "xxxxx"
                root = "gs://xxxx/cromwell_execution"
                virtual-private-cloud {
                    network-label-key = "xxx"
                    subnetwork-label-key = "xxx"
                    auth = "application-default"
                }
                name-for-call-caching-purposes: PAPI
                slow-job-warning-time: "24 hours"
                genomics-api-queries-per-100-seconds = 1000
                maximum-polling-interval = 600
                request-workers = 3
                genomics {
                    auth = "application-default"
                    endpoint-url = "https://genomics.googleapis.com/"
                    location = "us-west1"
                    restrict-metadata-access = false
                    localization-attempts = 3
                    parallel-composite-upload-threshold="150M"
                }
                filesystems {
                    gcs {
                        auth = "application-default"
                        project = "xxxx"
                        caching {
                            duplication-strategy = "copy"
                        }
                    }
                    http { }
                }
                default-runtime-attributes {
                    cpu: 1
                    failOnStderr: false
                    continueOnReturnCode: 0
                    memory: "2048 MB"
                    bootDiskSizeGb: 10
                    disks: "local-disk 10 SSD"
                    noAddress: false
                    preemptible: 0
                    zones: ["us-west1-a", "us-west1-b"]
                }
                include "papi_v2_reference_image_manifest.conf"
            }
        }
    }
}

When I run with the above config using:

java -Dconfig.file=genomics.conf -jar cromwell-66.jar run cumulus.wdl -i cumulus_inputs.json

I am getting the following error message:

[2021-08-24 22:05:33,60] [info] WorkflowManagerActor: Workflow 6cc303b4-295d-49fa-a996-b5cf7ec9beea failed (during ExecutingWorkflowState): java.lang.Exception: Task cumulus.cluster:NA:1 failed. The job was stopped before the command finished. PAPI error code 3. Execution failed: allocating: creating instance: inserting instance: Invalid value for field 'resource.networkInterfaces[0].network': ''. The referenced network resource cannot be found.

I have tried passing the vpc and subnet id using the following config:

                virtual-private-cloud {
                    network-label-key = "xxx"
                    subnetwork-label-key = "xxx"
                    auth = "application-default"
                }

The above values are my actual VPC and subnet id/name. However, it is still giving me that error message. Is there something I am missing from a configuration perspective? Any help would be greatly appreciated. Our VPC networks are not created in auto mode, and unfortunately that is not something we have control over.

Thanks, -Simran

hsimran13 commented 3 years ago

I have created a new labels file and am using that to pass the VPC/subnet info, but I still get the same error:

$ grep -i label genomics.conf
                    network-label-key = "my-private-network"
                    subnetwork-label-key = "my-private-subnetwork"
$ cat labels.json
{
  "my-private-network":  "xxxx",
  "my-private-subnetwork": "xxxx"
}

and updated my cromwell command to the following:

java -Dconfig.file=genomics.conf -jar cromwell-66.jar run cumulus.wdl -i cumulus_inputs.json -l labels.json

I still get the same error though. Is this even possible or am I missing something?

Thanks.

mcovarr commented 3 years ago

In Cromwell versions 67 and earlier, the virtual-private-cloud configuration exclusively specifies Google project label keys, not literal values. The actual values are specified in labels on the Google project. For example, with a VPC config like:

                virtual-private-cloud {
                    network-label-key = "my-network-label-key"
                    subnetwork-label-key = "my-subnetwork-label-key"
                    auth = "application-default"
                }

As seen in the labels page in the GCP console, there should be project labels with key/value pairs of my-network-label-key/my-private-network and my-subnetwork-label-key/my-private-subnetwork.
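To make the indirection concrete, here is a minimal Python sketch of the lookup described above. It is not Cromwell's implementation: the function name `resolve_vpc` and the dictionary shapes are hypothetical, and they only model the idea that the config holds label keys while the labels on the Google project hold the network and subnetwork names.

```python
from typing import Dict, Optional, Tuple


def resolve_vpc(
    vpc_config: Dict[str, str], project_labels: Dict[str, str]
) -> Tuple[Optional[str], Optional[str]]:
    """Illustrative model only (hypothetical helper, not a Cromwell API).

    vpc_config mirrors the virtual-private-cloud stanza, which holds label KEYS.
    project_labels stands in for the labels set on the Google project.
    """
    network_key = vpc_config["network-label-key"]
    subnetwork_key = vpc_config.get("subnetwork-label-key")

    # The network name is the VALUE of the project label whose key matches.
    network = project_labels.get(network_key)
    subnetwork = project_labels.get(subnetwork_key) if subnetwork_key else None
    return network, subnetwork


vpc_config = {
    "network-label-key": "my-network-label-key",
    "subnetwork-label-key": "my-subnetwork-label-key",
}
project_labels = {
    "my-network-label-key": "my-private-network",
    "my-subnetwork-label-key": "my-private-subnetwork",
}

print(resolve_vpc(vpc_config, project_labels))
# → ('my-private-network', 'my-private-subnetwork')
```

In this model, putting the network name itself into `network-label-key` makes the lookup return `None`, because no project label has that name as its key.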

hsimran13 commented 3 years ago

Thanks @mcovarr for your response. I realized that after my initial post and created a labels.json with the following contents:

{
  "google_labels": {
    "my-private-network":  "xxx",
    "my-private-subnetwork": "yyy"
  }
}

where xxx and yyy are my actual VPC network and subnet names in GCP. Then I added the "-l labels.json" option to the cromwell run command, but that still gives me the same error. Am I missing something here? Apologies, but this is what I understand from the posts/docs needs to happen, yet it doesn't work when I try it. Am I supposed to create some label in the actual GCP account as well?

Thanks.

hsimran13 commented 3 years ago

Ahh I think I see what you mean. I don't need the "-l labels.json" but need to create an actual Label in the GCP account that has the following key/value:

my-private-network: xxx
my-private-subnetwork: yyy

I don't have access to create the labels but will have someone do this and try again. Let me know if I am still missing something.

Thanks.

hsimran13 commented 3 years ago

@mcovarr which resource does the label need to be created in? Your link took me to the IAM & Admin Labels section for GCP. Is that where I should create this label, or on the GCP instance resource that I am running this command from?

Thanks.

mcovarr commented 3 years ago

The labels should be created on the GCP project (not on the GCE instance), so the link should be going to the correct location.

hsimran13 commented 3 years ago

Thanks @mcovarr, that seems to have worked. The job has gone into a Running state. Really appreciate your quick response and assistance.

-Simran

hsimran13 commented 3 years ago

@mcovarr, looks like I got past the initial issue, but I am now getting the following error:

[2021-08-25 01:11:31,83] [info] WorkflowManagerActor: Workflow 2a7b8039-a555-4f58-86b0-dc4a6fa21dff failed (during ExecutingWorkflowState): java.lang.Exception: Task cumulus.cluster:NA:1 failed. The job was stopped before the command finished. PAPI error code 9. generic::failed_precondition: Constraint constraints/compute.trustedImageProjects violated for project gred-cumulus-sb-01-991a49c4. Use of images from project cloud-lifesciences is prohibited.

Looks like our GCP accounts don't allow non-standard images. Which image is this workflow trying to use? Is there a way to provide our own image to this pipeline instead?

Thanks

mcovarr commented 3 years ago

I am not familiar with that error message. From a bit of Googling it looks like this may be relevant. Assuming cloud-lifesciences is Google's project hosting the image that Cloud Life Sciences is trying to use to spin up the worker VM, you may need to add projects/cloud-lifesciences to your organization's trusted image projects.