derijkp / genomecomb

GNU General Public License v3.0

Providing cluster-specific parameters? #1

Closed nick-youngblut closed 4 months ago

nick-youngblut commented 8 months ago

From the howto:

Processing can be distributed over more than one processing unit: The -d 2 parameter in the example distributes the processing over 2 cores on the local machine. If you have a cluster, you can distribute on the cluster using -d sge (for Sun Grid Engine) or -d slurm.

I can't find any docs on how one can provide cluster-specific parameters (e.g., specifying a particular job queue). Is it possible to provide cluster-specific parameters?

More generally, what job submission system (software) are you using? I'm used to snakemake, nextflow, clustermq, and ray, so I'm trying to understand how your job submission system works relative to those tools.

derijkp commented 4 months ago

You can find these options at https://derijkp.github.io/genomecomb/joboptions.html, or by running

cg help joboptions

For example, the queue can be specified using -dqueue.

The job system is internal: it has been in use for a long time but was never published separately. In functionality it is probably closest to snakemake (it was also partly inspired by make, but follows a different philosophy). You can restart or resume after an interruption by rerunning the same command line. You can also rerun after, e.g., an update or fix of input files, and only the dependent/affected results will be rerun. It is also possible to add new options (e.g. an extra variant caller), and only those will be run. Of course, like all the others, it can run in parallel locally or on a cluster.

The main difference is that in typical job systems you define rules/processes containing code, which are executed and strung together based on, e.g., the requested results (snakemake). In the genomecomb system, job commands are embedded in procedural code; for each embedded block you specify the dependencies and targets of that piece of code. This makes it very flexible and easy to test/debug: you can also run the code procedurally (non-parallel), even step by step in the REPL, and you can run parallel jobs from the REPL as well if you want to.
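To make the "jobs embedded in procedural code" idea concrete, here is a minimal Python sketch. It is not genomecomb's actual code or API: the names job, needs_run and pipeline, the file names, and the modification-time check are all assumptions made for illustration. Each job declares its dependencies and targets, and its code only runs when a target is missing or older than a dependency.

```python
import os
import tempfile


def needs_run(deps, targets):
    """True if any target is missing or older than the newest dependency."""
    if not targets or not all(os.path.exists(t) for t in targets):
        return True
    newest_dep = max((os.path.getmtime(d) for d in deps), default=0.0)
    oldest_target = min(os.path.getmtime(t) for t in targets)
    return newest_dep > oldest_target


def job(name, deps, targets, code, log):
    """Hypothetical helper: run `code` only if its targets are out of date."""
    missing = [d for d in deps if not os.path.exists(d)]
    if missing:
        raise FileNotFoundError(f"job {name}: missing dependencies {missing}")
    if needs_run(deps, targets):
        code()
        log.append((name, "ran"))
    else:
        log.append((name, "skipped"))


def pipeline(workdir, log):
    """Ordinary procedural code with two embedded jobs (illustrative only)."""
    src = os.path.join(workdir, "reads.txt")
    upper = os.path.join(workdir, "upper.txt")
    count = os.path.join(workdir, "count.txt")

    def make_upper():
        with open(src) as f, open(upper, "w") as out:
            out.write(f.read().upper())

    job("upper", [src], [upper], make_upper, log)

    def make_count():
        with open(upper) as f, open(count, "w") as out:
            out.write(str(len(f.read().split())))

    job("count", [upper], [count], make_count, log)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        with open(os.path.join(workdir, "reads.txt"), "w") as f:
            f.write("a c g t")
        for attempt in (1, 2):
            log = []
            pipeline(workdir, log)
            print(attempt, log)
```

Rerunning pipeline with unchanged inputs should skip both jobs, while rewriting reads.txt makes both run again. Because the jobs are plain function calls, the same code can be stepped through interactively. A real system would also submit the jobs to local cores or a cluster scheduler, which this sketch leaves out.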

nick-youngblut commented 4 months ago

Thanks @derijkp for the detailed explanation!