PMCC-BioinformaticsCore / janis-core

Core python modules for Janis Pipeline workflow assistant
GNU General Public License v3.0

Cromwell configuration #34

Open drtconway opened 4 years ago

drtconway commented 4 years ago

Hi Janis,

I'm running Janis with Cromwell+Singularity on my Ubuntu box, and I want to scatter in my workflow, but when I do, it tries to run a large number of big jobs all at once. Since my machine has nowhere near enough resources to run (e.g.) 10 BWA jobs concurrently, I tried changing the Cromwell configuration in my janis.config but it has no effect.

My config is

engine: cromwell
notifications:
  email: null
template:
  container_dir: /singularity
  id: singularity
cromwell:
  max_concurrent_workflows: 1
  concurrent_job_limit: 1

I assume this is not correct, because nothing in the Cromwell configuration in my run directory reflects the two cromwell options:

include required(classpath("application"))

akka: {
  "actor.default-dispatcher.fork-join-executor": {
    "parallelism-max": 3
  }
}
system: {
  "job-shell": "/bin/sh",
  "cromwell_id": "cromwell-b96e80",
  "cromwell_id_random_suffix": false
}
database: {
  "db": {
    "driver": "org.hsqldb.jdbcDriver",
    "url": "jdbc:hsqldb:file:/data/work/her2-sra/one/janis/database/cromwelldb;\nshutdown=false;\nhsqldb.default_table_type=cached;\nhsqldb.tx=mvcc;\nhsqldb.result_max_memory_rows=2500;\nhsqldb.large_data=true;\nhsqldb.applog=1;\nhsqldb.lob_compressed=true;\nhsqldb.script_format=3\n",
    "connectionTimeout": 300000,
    "num_threads": 1
  },
  "profile": "slick.jdbc.HsqldbProfile$"
}
backend: {
  "default": "Local",
  "providers": {
    "Local": {
      "actor-factory": "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory",
      "config": {
        "root": "/data/work/her2-sra/one/janis/execution",
        "filesystems": {
          "local": {
            "localization": [
              "hard-link",
              "cached-copy"
            ],
            "enabled": true,
            "caching": {
              "duplication-strategy": [
                "hard-link",
                "cached-copy",
                "copy",
                "soft-link"
              ],
              "hashing-strategy": "file"
            }
          }
        },
        "runtime-attributes": "String? docker",
        "submit-docker": "\n                    None\n\n                    docker_subbed=$(sed -e 's/[^A-Za-z0-9._-]/_/g' <<< ${docker})\n                    image=/singularity/$docker_subbed.sif\n                    lock_path=/singularity/$docker_subbed.lock\n\n                    singularity pull $image docker://${docker}\n\n                    singularity exec --bind ${cwd}:${docker_cwd} $image ${job_shell} ${docker_script}\n                    ",
        "run-in-background": true
      }
    }
  }
}
call-caching: {
  "enabled": true
}
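
For reference, the two settings the janis config is trying to set correspond to these keys in Cromwell's own HOCON configuration (a sketch based on Cromwell's documentation; this is what would need to appear in the generated file above, not something janis emits today):

```hocon
system {
  # cap on how many workflows Cromwell will run at once
  max-concurrent-workflows = 1
}

backend.providers.Local.config {
  # cap on how many jobs the Local backend will dispatch concurrently
  concurrent-job-limit = 1
}
```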
drtconway commented 4 years ago

I note, that once I've worked out how to get just one job at a time running, I'll probably make it the default for my docker image.

illusional commented 4 years ago

I don't have any mechanism for arbitrarily providing Cromwell configuration params - only a way to provide a complete cromwell configuration. I can work with you to build a PR, but there are a few ways:

Or, you can add the respective keys to the generic CromwellConfiguration and change the Singularity template to respect those new keys - I think this is the least desirable option, though.

drtconway commented 4 years ago

The System class already has one of them, apparently (line 180, and related lines in that class):

"max_concurrent_workflows": "max-concurrent-workflows",

but I didn't see it show up in the generated config.

I agree though, making a template is probably a good way to go. It fits the model.
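
For anyone following along, the mapping style quoted above suggests a serializer that translates snake_case Python attributes into Cromwell's kebab-case HOCON keys. A hypothetical sketch of that idea (not janis-core's actual classes; SystemConfig and key_map are made-up names):

```python
# Sketch: a config object whose attributes are snake_case, serialized to
# Cromwell's kebab-case keys via an explicit attribute -> key mapping,
# mirroring the style of the mapping quoted from the real class.

class SystemConfig:
    # attribute name -> Cromwell HOCON key
    key_map = {
        "max_concurrent_workflows": "max-concurrent-workflows",
        "job_shell": "job-shell",
    }

    def __init__(self, max_concurrent_workflows=None, job_shell="/bin/sh"):
        self.max_concurrent_workflows = max_concurrent_workflows
        self.job_shell = job_shell

    def to_dict(self):
        # only emit keys that were actually set, so unset options
        # fall back to Cromwell's defaults
        out = {}
        for attr, key in self.key_map.items():
            value = getattr(self, attr)
            if value is not None:
                out[key] = value
        return out


print(SystemConfig(max_concurrent_workflows=1).to_dict())
# -> {'max-concurrent-workflows': 1, 'job-shell': '/bin/sh'}
```

The point of the explicit map (rather than a generic snake-to-kebab transform) is that only the whitelisted keys reach the generated config, which matches the observed behaviour: options absent from the map are silently dropped.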

drtconway commented 4 years ago

It looks straightforward. After the cluster users' seminar this morning, I'll have a crack at it.