broadinstitute / cromwell

Scientific workflow engine designed for simplicity & scalability. Trivially transition between one off use cases to massive scale production environments
http://cromwell.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

AWS backend zone hardcode? #4974

Open vortexing opened 5 years ago

vortexing commented 5 years ago

We are on the west coast and when using AWS as the backend for Version 40 Cromwell, our metadata for any workflow we run comes back with zone of us-east-1 and that is not our zone!!

Not sure if this has been addressed in all the other Cromwell + AWS work, but if not, might be good to check into.

dtenenba commented 5 years ago

More info on this--there is no region in the cromwell config file, and us-west-2 is specified in ~/.aws/config. Also us-east does not occur in any of the WDLs or json files.

So I believe cromwell is supposed to use whatever's set in ~/.aws/config.

Here's an example of where the metadata says the region is us-east-1:

        "runtimeAttributes": {
          "failOnStderr": "false",
          "queueArn": "arn:aws:batch:us-west-2:xxx:job-queue/cromwell-1999",
          "disks": "local-disk /cromwell_root",
          "continueOnReturnCode": "0",
          "docker": "quay.io/fhcrc-microbiome/picard:2.20.1",
          "maxRetries": "1",
          "cpu": "4",
          "cpuMin": "1",
          "noAddress": "false",
          "zones": "us-east-1a",
          "memoryMin": "2 GB",
          "memory": "4 GB"
        },

geoffjentry commented 5 years ago

So if I understood correctly, I believe the issue is that you need to specify zones properly in your runtime block. The default value is us-east-1a, which tracks with what you're seeing. Cromwell does not (AFAICT) look at ~/.aws/config for anything.
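The mismatch reported above can be checked mechanically. The following is a minimal Python sketch, not anything Cromwell provides: the helper names are hypothetical, and it assumes that `zones` is a comma- or space-separated string and that an availability zone name is the region name plus one trailing letter.

```python
def region_of_arn(arn: str) -> str:
    # ARN layout: arn:partition:service:region:account-id:resource
    return arn.split(":")[3]


def region_of_zone(zone: str) -> str:
    # Assumption: a zone name is the region plus one trailing letter (us-east-1a).
    return zone[:-1] if zone and zone[-1].isalpha() else zone


def zone_mismatches(runtime_attributes: dict) -> list:
    """Return the zones whose region differs from the region in queueArn."""
    queue_region = region_of_arn(runtime_attributes["queueArn"])
    zones = runtime_attributes.get("zones", "").replace(",", " ").split()
    return [z for z in zones if region_of_zone(z) != queue_region]


# Values copied from the metadata excerpt in this thread.
attrs = {
    "queueArn": "arn:aws:batch:us-west-2:xxx:job-queue/cromwell-1999",
    "zones": "us-east-1a",
}
print(zone_mismatches(attrs))
```

For the excerpt above, the queue ARN points at us-west-2 while the reported zone belongs to us-east-1, so the zone is flagged.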

dtenenba commented 5 years ago

I got my first aws.conf from somewhere, don't remember exactly, but I think it was from the AWS team, possibly from some version of their cloudformation template, and it had this in it:

 // diff 1:
  # region = "us-west-2" // uses region from ~/.aws/config set by aws configure command,
  #                    // or us-east-1 by default

If that's correct, it means that Cromwell should be looking in ~/.aws/config, but maybe it's not correct. Or it is correct, but Cromwell is not picking up the region somehow.

So zones is supposed to go in the WDL? Something like this?

  runtime {
    docker: "ubuntu:latest"
    zones: "us-west-2"
  }

?

Is there an example documented somewhere?

geoffjentry commented 5 years ago

@dtenenba It's possible the cloudformation templates do some magic to pull the value (@wleepang ?)

There are a couple of places you can specify this:

You can see an example here

dtenenba commented 5 years ago

Thanks. That seems kind of googly, not sure if it maps to AWS concepts. Would I put a region in there (like us-west-2) or an availability zone, such as us-west-2a? I thought that AZs were governed by the VPC used by the compute environment and thus could not be influenced by anything in aws.conf, a WDL, or workflow options....

geoffjentry commented 5 years ago

Actually the more I'm digging into this I take it all back. For now.

The zones field in the Cromwell code doesn't seem to be actually used anywhere except for tests.

Thinking about it now, I have a recollection that this was part of the CloudFormation setup for the Batch configuration. I'll need to dig into this unless @wleepang swoops in with some wizardly knowledge.

BTW, it could be (and would make sense) that the ~/.aws/config file is getting picked up via one of the Amazon libraries Cromwell is using. But I see no evidence that it's being used directly by Cromwell itself.

dtenenba commented 5 years ago

I am pretty sure at least the ~/.aws/credentials file is picked up by some Amazon library, otherwise no AWS calls would work. Typically AWS libraries pick up the ~/.aws/config file too. Here's how you find out what the region is in Python; no idea how to do it in Scala.

import boto3
session = boto3.session.Session()
print(session.region_name)

This ends up matching what's in ~/.aws/config.
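To see which value such a lookup would find without installing boto3, the file can be read directly. This is a simplified sketch using only the Python standard library. The function name is hypothetical, the sample text stands in for a real ~/.aws/config, and the actual SDK resolution order is more involved (for boto3, environment variables such as AWS_DEFAULT_REGION are consulted first).

```python
import configparser


def region_from_aws_config(config_text: str, profile: str = "default"):
    """Return the region set for a profile in ~/.aws/config-style text, or None."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # Non-default profiles live in sections named "profile <name>".
    section = "default" if profile == "default" else f"profile {profile}"
    if not parser.has_section(section):
        return None
    return parser.get(section, "region", fallback=None)


# Illustrative file contents, not taken from the thread.
SAMPLE = """
[default]
region = us-west-2
output = json

[profile east]
region = us-east-1
"""

print(region_from_aws_config(SAMPLE))
print(region_from_aws_config(SAMPLE, profile="east"))
```

To check a real setup, pass in the text of the file, for example `pathlib.Path.home().joinpath(".aws", "config").read_text()`.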

geoffjentry commented 5 years ago

Aha ... I think I found what you need (NB I'm not in a position to actually test these theories right now, YMMV and all that)

In your Cromwell config, look at the field aws.region

dtenenba commented 5 years ago

The only occurrence of region in the conf file is the pasted, commented-out excerpt above.

So I can just do

aws {
    region = "us-west-2"
    # other aws stuff goes here....
}

?

geoffjentry commented 5 years ago

Yeah, that's what I'm thinking. Sorry, I should have picked up on that when you posted the config block earlier, but I was getting confused between the various config file types. Also, my mind still tries to think in terms of zone and not region in terms of Cromwell settings :)
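A quick way to tell whether a config file actually sets the region, as opposed to only mentioning it in a comment, is a line-based scan. The sketch below is a rough heuristic and not a HOCON parser: the function name is hypothetical, it does not verify that `region` sits inside the `aws` block, and it only skips whole-line comments starting with `#` or `//`.

```python
import re

# Matches `region = "us-west-2"`, `region: us-west-2`, or `aws.region = "us-west-2"`.
_REGION_LINE = re.compile(r'^\s*(?:aws\.)?region\s*[=:]\s*"?([A-Za-z0-9-]+)"?')


def configured_regions(conf_text: str) -> list:
    """Return region values set on uncommented lines of a Cromwell-style config."""
    regions = []
    for line in conf_text.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("#") or stripped.startswith("//"):
            continue  # whole-line comment, so the setting is not active
        match = _REGION_LINE.match(line)
        if match:
            regions.append(match.group(1))
    return regions


commented_out = '  # region = "us-west-2" // uses region from ~/.aws/config\n'
explicit = 'aws {\n    region = "us-west-2"\n}\n'

print(configured_regions(commented_out))
print(configured_regions(explicit))
```

An empty result for the first snippet corresponds to the situation in this thread, where the only occurrence of `region` was commented out.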

wleepang commented 5 years ago

@dtenenba, @geoffjentry - the aws.region in the cromwell.conf file needs to be set. Ideally, it should use settings from ~/.aws/config for "default", but that is not the case. It will pick up the default credentials though. From a CloudFormation standpoint, when creating a Cromwell server, the region is set in the config using a pseudo parameter. See this line. This will be whatever region you launched the template in.

For AZs, those are effectively defined when the Batch Compute Environment is created (they are the subnets you specify, which should match up to the AZs you created with the associated VPC).
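To confirm which AZs a compute environment can actually land in, the subnets can be looked up and mapped to their zones. This is a sketch under assumptions: the function name and the environment name in the usage note are hypothetical, and the clients are passed in so the logic can be exercised without AWS access. The calls used are the boto3 Batch `describe_compute_environments` and EC2 `describe_subnets` operations.

```python
def compute_environment_zones(batch_client, ec2_client, compute_environment: str) -> list:
    """Return the sorted availability zones of the subnets used by a Batch compute environment."""
    envs = batch_client.describe_compute_environments(
        computeEnvironments=[compute_environment]
    )["computeEnvironments"]
    # Each compute environment lists its subnets under computeResources.
    subnet_ids = [
        subnet_id
        for env in envs
        for subnet_id in env.get("computeResources", {}).get("subnets", [])
    ]
    if not subnet_ids:
        return []
    subnets = ec2_client.describe_subnets(SubnetIds=subnet_ids)["Subnets"]
    return sorted({subnet["AvailabilityZone"] for subnet in subnets})


# Usage with real clients (requires boto3 and AWS credentials):
#   import boto3
#   batch = boto3.client("batch", region_name="us-west-2")
#   ec2 = boto3.client("ec2", region_name="us-west-2")
#   print(compute_environment_zones(batch, ec2, "my-compute-environment"))
```

Both clients should be created for the region where the compute environment lives, since neither call looks across regions.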