Here is the config file that was used (the issue details follow below):
```hocon
include required(classpath("application"))

"workflow_failure_mode": "ContinueWhilePossible"

webservice {
  port = 2525
}

system {
  file-hash-cache = true
  job-rate-control {
    jobs = 1
    per = 2 seconds
  }
}

call-caching {
  enabled = true
  invalidate-bad-cache-results = true
}

database {
  profile = "slick.jdbc.MySQLProfile$"
  db {
    # driver = "com.mysql.jdbc.Driver"
    driver = "com.mysql.cj.jdbc.Driver"
    url = "jdbc:mysql://xxxxxx:xxxx/xxx?rewriteBatchedStatements=true&useSSL=false"
    user = "xxx"
    password = "xxx"
    connectionTimeout = 120000
  }
}

aws {
  application-name = "cromwell"
  auths = [
    {
      name = "default"
      scheme = "default"
    }
    {
      name = "assume-role-based-on-another"
      scheme = "assume_role"
      base-auth = "default"
      role-arn = "arn:aws:iam::xx:role/xxx"
    }
  ]
  // diff 1:
  # region = "us-west-2" // uses region from ~/.aws/config set by the `aws configure` command,
  #                      // or us-east-1 by default
}

engine {
  filesystems {
    s3 {
      auth = "assume-role-based-on-another"
    }
  }
}

backend {
  default = "AWSBATCH"
  providers {
    AWSBATCH {
      actor-factory = "cromwell.backend.impl.aws.AwsBatchBackendLifecycleActorFactory"
      config {
        // Base bucket for workflow executions
        root = "s3://xxx/cromwell-output"
        // A reference to an auth defined in the `aws` stanza at the top. This auth is used to
        // create Jobs and manipulate auth JSONs.
        auth = "default"
        // diff 2:
        numSubmitAttempts = 1
        // diff 3:
        numCreateDefinitionAttempts = 1
        default-runtime-attributes {
          queueArn: "arn:aws:batch:us-west-2:xxx:job-queue/xxx"
        }
        filesystems {
          s3 {
            // A reference to a potentially different auth for manipulating files via engine functions.
            auth = "default"
          }
        }
      }
    }
  }
}
```
It was noted by @dtenenba that this is likely caused by the need to use multipart upload when copying large files: S3's single-request CopyObject supports source objects only up to 5 GB, so larger cached outputs require a multipart copy.
Closing this issue as #4828 covers this situation.
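
For illustration, here is a minimal sketch of a copy that works past the 5 GB limit, using boto3. The bucket and key names are placeholders, and this only shows the general technique; it is not necessarily how Cromwell itself performs the copy (Cromwell runs on the JVM and uses the AWS SDK):

```python
import boto3
from boto3.s3.transfer import TransferConfig

# Placeholder bucket/key names -- not from the issue.
SRC = {"Bucket": "source-bucket", "Key": "cromwell-output/large.bam"}
DST_BUCKET = "dest-bucket"
DST_KEY = "cromwell-output/copied/large.bam"

s3 = boto3.client("s3")

# A plain single-request copy fails for objects larger than 5 GB:
#   s3.copy_object(CopySource=SRC, Bucket=DST_BUCKET, Key=DST_KEY)
# raises InvalidRequest for such objects.

# The managed transfer (client.copy) switches to multipart
# UploadPartCopy automatically above multipart_threshold.
config = TransferConfig(
    multipart_threshold=5 * 1024**3,   # switch to multipart at 5 GiB
    multipart_chunksize=256 * 1024**2, # 256 MiB parts
)
s3.copy(CopySource=SRC, Bucket=DST_BUCKET, Key=DST_KEY, Config=config)
```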
Backend: AWS
Workflow: https://github.com/FredHutch/reproducible-workflows/blob/master/WDL/unpaired-panel-consensus-variants-human/broad-containers-workflow.wdl
First input JSON: https://github.com/FredHutch/reproducible-workflows/blob/master/WDL/unpaired-panel-consensus-variants-human/broad-containers-parameters.json
Second input JSON is like this one, but refers to a batch of 100 input datasets: https://github.com/FredHutch/reproducible-workflows/blob/master/WDL/unpaired-panel-consensus-variants-human/broad-containers-batchofOne.json
Config: see the config file posted above.
Installed the Cromwell version from PR #4790.
Error:
This version of Cromwell does successfully access and copy a cached file from a previous workflow, at least for the first task in a shard. The workflow is essentially a batch: each row of a batch file is passed to a shard, the tasks then run independently on each input dataset, and they never gather. However, once the files get larger than the single test dataset, Cromwell appears unable to retrieve the previous file in order to determine whether there is a cache hit.
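
If the multipart-copy diagnosis is right, the failures should correlate with object size. Here is a minimal sketch (boto3; the bucket name and prefix are placeholders) for listing cached outputs over the 5 GB single-request copy limit:

```python
import boto3

# Placeholder bucket/prefix -- substitute the workflow's output location.
BUCKET = "xxx"
PREFIX = "cromwell-output/"
LIMIT = 5 * 1024**3  # max size for a single-request S3 copy

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

# Print every cached output too large to copy in one CopyObject call.
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        if obj["Size"] > LIMIT:
            print(f'{obj["Size"]:>14,}  s3://{BUCKET}/{obj["Key"]}')
```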