ewels / clusterflow

A pipelining tool to automate and standardise bioinformatics analyses on cluster environments.
https://ewels.github.io/clusterflow/
GNU General Public License v3.0

Change config syntax? #97

Closed ewels closed 8 years ago

ewels commented 8 years ago

The current syntax for Cluster Flow configs / genomes makes me wince every time I look at it. It's custom-written and relies on pretty fragile, hardcoded parsing of things like special @ characters and /* */ comments.

It could be nice to change to something more standard (probably YAML). This would probably be easier to edit and less prone to accidental errors.

eg. Convert this:

/****************/
/* Core Options */
/****************/
/* See the Cluster Flow manual for descriptions */
@email  0
@check_updates  1d
@notification   suspend
@notification   abort
@notification   complete
@split_files    1
/* @priority    -500 */ /* Negative for GRIDEngine, positive for SLURM */
@total_cores    16
@total_mem  128G

to this:

### Core Options
email: 0
check_updates: '1d'
notifications:
  - 'suspend'
  - 'abort'
  - 'complete'
split_files: 1
# priority: -500 # Negative for GRIDEngine, positive for SLURM
total_cores: 16
total_mem: '128G'
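
To make the comparison concrete, here is a minimal Python sketch of what reading the current format involves. It is not Cluster Flow's actual parser (that lives in the Perl code); the function name `parse_legacy_config` and the rule that repeated keys become lists are assumptions made for illustration. All values are kept as strings, and the rename from `notification` to `notifications` in the YAML version is not handled.

```python
import re

# Matches /* ... */ comments, including ones that span several lines.
COMMENT_RE = re.compile(r"/\*.*?\*/", re.DOTALL)

LEGACY_SAMPLE = """\
/****************/
/* Core Options */
/****************/
@email  0
@check_updates  1d
@notification   suspend
@notification   abort
@notification   complete
@split_files    1
/* @priority    -500 */ /* Negative for GRIDEngine, positive for SLURM */
@total_cores    16
@total_mem  128G
"""


def parse_legacy_config(text):
    """Parse '@key value' lines into a dict (hypothetical helper).

    Comments are stripped first, so commented-out options are ignored.
    A key that appears more than once is collected into a list.
    """
    config = {}
    for line in COMMENT_RE.sub("", text).splitlines():
        line = line.strip()
        if not line.startswith("@"):
            continue  # blank lines and anything that is not an option
        parts = line[1:].split(None, 1)
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        if key in config:
            if not isinstance(config[key], list):
                config[key] = [config[key]]
            config[key].append(value)
        else:
            config[key] = value
    return config


parsed = parse_legacy_config(LEGACY_SAMPLE)
print(parsed["notification"])
```

Even this small version has to special-case comments, the @ prefix and repeated keys, which is the kind of hand-rolled logic a standard format would take off our hands.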

This would also let us use new features such as nesting to clean up some of the syntax, eg instead of:

/* If your modules have a different name to those being requested in
   a module, you can create aliases. You can also use aliases to
     specify particular module versions */
@environment_module_alias   fastqc  FastQC/0.11.2
@environment_module_alias   trim_galore TrimGalore
@environment_module_alias   sratoolkit  sratools
@environment_module_alias   tophat  tophat/2.0.12
@environment_module_alias   STAR    star

Could have this:

# If your modules have a different name to those being requested in a module, you
# can create aliases. You can also use aliases to specify particular module versions
environment_module_aliases:
    fastqc: 'FastQC/0.11.2'
    trim_galore: 'TrimGalore'
    sratoolkit: 'sratools'
    tophat: 'tophat/2.0.12'
    STAR: 'star'
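
As a rough illustration of the migration, the sketch below collects the repeated alias lines into one mapping and prints it in the nested style. The helper names `collect_aliases` and `render_mapping` are made up for this example, and the renderer is deliberately naive: it single-quotes every value and does no escaping, so it is not a general YAML emitter.

```python
def collect_aliases(lines, directive="@environment_module_alias"):
    """Gather '<directive> <name> <target>' lines into one mapping."""
    aliases = {}
    for line in lines:
        fields = line.split()
        # Only accept lines with exactly: directive, requested name, real name.
        if len(fields) == 3 and fields[0] == directive:
            aliases[fields[1]] = fields[2]
    return aliases


def render_mapping(name, mapping, indent=4):
    """Render a flat mapping as an indented, single-quoted YAML-style block."""
    rows = [f"{name}:"]
    for key, value in mapping.items():
        rows.append(f"{' ' * indent}{key}: '{value}'")
    return "\n".join(rows)


legacy_lines = [
    "@environment_module_alias   fastqc  FastQC/0.11.2",
    "@environment_module_alias   trim_galore TrimGalore",
    "@environment_module_alias   sratoolkit  sratools",
    "@environment_module_alias   tophat  tophat/2.0.12",
    "@environment_module_alias   STAR    star",
]

aliases = collect_aliases(legacy_lines)
yaml_text = render_mapping("environment_module_aliases", aliases)
print(yaml_text)
```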

This would obviously require some rewriting of Constants.pm and probably a new requirement in the form of a module to parse the YAML (or whatever). I'm assuming that Perl doesn't have a standard package to do this? Worst of all, it would break backwards compatibility. There are a few options here:

  1. Suck it up and release with v0.4. There's a tonne of stuff that breaks backwards compatibility from v0.4 with this release anyway.
  2. Maintain both sets of parsing code and determine which to use based on the file extension.
  3. Stop being OCD, do nothing and try to control my flinches when looking at these files.
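
Option 2 could look something like the following Python sketch. The `.yaml` / `.yml` extensions, the file names and the function names are all assumptions for illustration; nothing here reflects how Cluster Flow names or loads its config files today.

```python
from pathlib import Path

# Hypothetical extensions that would mark a config file as YAML.
YAML_EXTENSIONS = {".yaml", ".yml"}


def config_format(path):
    """Return 'yaml' or 'legacy' based only on the file extension."""
    if Path(path).suffix.lower() in YAML_EXTENSIONS:
        return "yaml"
    return "legacy"


def parse_config(path, text, parsers):
    """Hand the file contents to whichever parser matches the extension.

    `parsers` maps 'yaml' and 'legacy' to callables that take the text.
    """
    return parsers[config_format(path)](text)


# Stand-in parsers, just to show the dispatch.
parsers = {
    "yaml": lambda text: ("yaml", text),
    "legacy": lambda text: ("legacy", text),
}

print(config_format("clusterflow.config"))
print(config_format("clusterflow.config.yaml"))
```

The cost of this route is that both parsers have to be kept in sync for as long as the old format is supported.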

This is very much an open suggestion / question at this point. Any thoughts @s-andrews / @darogan / @stu2 / others?

darogan commented 8 years ago

This is a fairly easy file to configure so a format change and breaking of backwards compatibility isn't so much of an issue (for me at least). So if it helps with your OCD, go for it.

s-andrews commented 8 years ago

Meh.

The old format doesn't really bother me - it seems a little over-complicated with the @ symbols and the like when a simple tab (or even whitespace) delimited format would probably do just as well.

It's going to be trivial to migrate to the new format, but to be honest the YAML version looks easier to get wrong than the previous one.

Do what you will...

ewels commented 8 years ago

Ok, general apathy - I will swallow my OCD and leave it how it is.

I think the @ symbol stuff was there because the config options can be specified in run files / pipeline configs? But in reality I don't think any pipeline files have config options in them any more, so it's not really used.