PMCC-BioinformaticsCore / janis-core

Core python modules for Janis Pipeline workflow assistant
GNU General Public License v3.0

Cromwell configuration #34

Open drtconway opened 4 years ago

drtconway commented 4 years ago

Hi Janis,

I'm running Janis with Cromwell+Singularity on my Ubuntu box, and I want to scatter in my workflow, but when I do, it tries to run a large number of big jobs all at once. Since my machine has nowhere near enough resources to run (e.g.) 10 BWA jobs concurrently, I tried changing the Cromwell configuration in my janis.config but it has no effect.

My config is

engine: cromwell
notifications:
  email: null
template:
  container_dir: /singularity
  id: singularity
cromwell:
  max_concurrent_workflows: 1
  concurrent_job_limit: 1

I assume this is not correct, because nothing in the Cromwell configuration in my run directory reflects the two cromwell options:

include required(classpath("application"))

akka: {
  "actor.default-dispatcher.fork-join-executor": {
    "parallelism-max": 3
  }
}
system: {
  "job-shell": "/bin/sh",
  "cromwell_id": "cromwell-b96e80",
  "cromwell_id_random_suffix": false
}
database: {
  "db": {
    "driver": "org.hsqldb.jdbcDriver",
    "url": "jdbc:hsqldb:file:/data/work/her2-sra/one/janis/database/cromwelldb;\nshutdown=false;\nhsqldb.default_table_type=cached;\nhsqldb.tx=mvcc;\nhsqldb.result_max_memory_rows=2500;\nhsqldb.large_data=true;\nhsqldb.applog=1;\nhsqldb.lob_compressed=true;\nhsqldb.script_format=3\n",
    "connectionTimeout": 300000,
    "num_threads": 1
  },
  "profile": "slick.jdbc.HsqldbProfile$"
}
backend: {
  "default": "Local",
  "providers": {
    "Local": {
      "actor-factory": "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory",
      "config": {
        "root": "/data/work/her2-sra/one/janis/execution",
        "filesystems": {
          "local": {
            "localization": [
              "hard-link",
              "cached-copy"
            ],
            "enabled": true,
            "caching": {
              "duplication-strategy": [
                "hard-link",
                "cached-copy",
                "copy",
                "soft-link"
              ],
              "hashing-strategy": "file"
            }
          }
        },
        "runtime-attributes": "String? docker",
        "submit-docker": "\n                    None\n\n                    docker_subbed=$(sed -e 's/[^A-Za-z0-9._-]/_/g' <<< ${docker})\n                    image=/singularity/$docker_subbed.sif\n                    lock_path=/singularity/$docker_subbed.lock\n\n                    singularity pull $image docker://${docker}\n\n                    singularity exec --bind ${cwd}:${docker_cwd} $image ${job_shell} ${docker_script}\n                    ",
        "run-in-background": true
      }
    }
  }
}
call-caching: {
  "enabled": true
}
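
For reference, the two settings the janis config is trying to set correspond to these keys in Cromwell's own HOCON configuration (a sketch based on Cromwell's documentation; this is what would need to appear in the generated file above, not something janis emits today):

```hocon
system {
  # cap on how many workflows Cromwell will run at once
  max-concurrent-workflows = 1
}

backend.providers.Local.config {
  # cap on how many jobs the Local backend will dispatch concurrently
  concurrent-job-limit = 1
}
```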
drtconway commented 4 years ago

I note, that once I've worked out how to get just one job at a time running, I'll probably make it the default for my docker image.

illusional commented 4 years ago

I don't have any mechanism for arbitrarily providing Cromwell configuration params - only a way to provide a complete cromwell configuration. I can work with you to build a PR, but there are a few ways:

Or, you can add the respective keys to the generic CromwellConfiguration and change the Singularity template to respect those new keys - I think this is the least desirable option, though.

drtconway commented 4 years ago

The System class already has one of them, apparently (line 180, and related lines in that class):

"max_concurrent_workflows": "max-concurrent-workflows",

but I didn't see it show up in the generated config.

I agree though, making a template is probably a good way to go. It fits the model.
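
For anyone following along, the mapping style quoted above suggests a serializer that translates snake_case Python attributes into Cromwell's kebab-case HOCON keys. A hypothetical sketch of that idea (not janis-core's actual classes; SystemConfig and key_map are made-up names):

```python
# Sketch: a config object whose attributes are snake_case, serialized to
# Cromwell's kebab-case keys via an explicit attribute -> key mapping,
# mirroring the style of the mapping quoted from the real class.

class SystemConfig:
    # attribute name -> Cromwell HOCON key
    key_map = {
        "max_concurrent_workflows": "max-concurrent-workflows",
        "job_shell": "job-shell",
    }

    def __init__(self, max_concurrent_workflows=None, job_shell="/bin/sh"):
        self.max_concurrent_workflows = max_concurrent_workflows
        self.job_shell = job_shell

    def to_dict(self):
        # only emit keys that were actually set, so unset options
        # fall back to Cromwell's defaults
        out = {}
        for attr, key in self.key_map.items():
            value = getattr(self, attr)
            if value is not None:
                out[key] = value
        return out


print(SystemConfig(max_concurrent_workflows=1).to_dict())
# -> {'max-concurrent-workflows': 1, 'job-shell': '/bin/sh'}
```

The point of the explicit map (rather than a generic snake-to-kebab transform) is that only the whitelisted keys reach the generated config, which matches the observed behaviour: options absent from the map are silently dropped.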

drtconway commented 4 years ago

It looks straightforward. After the cluster users' seminar this morning, I'll have a crack at it.