Arcadia-Science / noveltree

NovelTree is a highly parallelized and computationally efficient phylogenomic workflow that infers gene families, gene family trees, species trees, and gene family evolutionary history.
GNU Affero General Public License v3.0

Parameterizing software inputs #27

Closed austinhpatton closed 1 year ago

austinhpatton commented 1 year ago

Description of feature

We need a clearer, more consistent way to specify both the parameters for certain modules and the command-line flags passed to each piece of software.

For example, when filtering orthogroups for gene family tree inference, it would be good to set explicit numeric values for each of the four filters in the params-file, so that they are easy to change. Similarly, it would be useful to specify which of the 15 possible protein annotations we would like to download from UniProt, rather than downloading all of them by default.
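As a rough sketch of what this could look like, the defaults might live in a params block in nextflow.config and then be overridden by the params-file at run time. Every parameter name and value below is a hypothetical placeholder, not an existing NovelTree parameter:

```groovy
// nextflow.config (sketch only; all names and values are assumptions)
params {
    // Four orthogroup filters for gene family tree inference.
    // The real filters may differ; these names are illustrative.
    og_min_num_seqs     = 4     // minimum sequences per orthogroup
    og_min_num_species  = 4     // minimum species per orthogroup
    og_min_prop_species = 0.5   // minimum proportion of species represented
    og_max_copy_number  = 5     // maximum mean copy number per species

    // Which UniProt annotations to download: 'all', or a
    // comma-separated subset chosen by the user.
    uniprot_annotations = 'all'
}
```

Anything set in the file passed via -params-file would take precedence over these defaults, so changing a filter would not require touching the module code.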

In a few cases, it will be best to specify command-line flags for each piece of software using the "task.ext.args" method, where the flags for each tool are specified in the "conf/modules.config" file of the repository. This looks something like this:

process {
    withName: 'DIAMOND_BLASTP' {
        ext.args = [
            '--ultra-sensitive'
        ].join(' ') 
    }
}
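For context, here is a minimal sketch of how a module could pick up those flags on its side, following the usual nf-core convention of reading task.ext.args in the script block. The inputs, outputs, and file names are assumptions for illustration and are not taken from the actual NovelTree module:

```groovy
// modules/diamond_blastp.nf (sketch; inputs and outputs are assumed)
process DIAMOND_BLASTP {
    input:
    path fasta   // query protein sequences
    path db      // pre-built DIAMOND database

    output:
    path "*.tsv", emit: hits

    script:
    // Use the flags from conf/modules.config, or nothing if none are set
    def args = task.ext.args ?: ''
    """
    diamond blastp \\
        --query ${fasta} \\
        --db ${db} \\
        --out ${fasta.baseName}.tsv \\
        --threads ${task.cpus} \\
        ${args}
    """
}
```

With the config above, ${args} would expand to --ultra-sensitive, and changing the sensitivity would only require editing conf/modules.config.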

This can also include each module's publishDir, and maybe even containers (not actually sure), so it might be a good solution for tidying up some of the module files.
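A sketch of what that might look like in conf/modules.config is below. The output path and params.outdir are assumptions, and the container string is a made-up placeholder. Since container is an ordinary process directive, it should be settable through the same withName selector, but this has not been tested here:

```groovy
// conf/modules.config (sketch; paths and image name are hypothetical)
process {
    withName: 'DIAMOND_BLASTP' {
        ext.args = [
            '--ultra-sensitive'
        ].join(' ')

        // Publish results under an assumed params.outdir
        publishDir = [
            path: { "${params.outdir}/diamond" },
            mode: 'copy'
        ]

        // Placeholder image name; uncomment and replace with a real one
        // container = 'example/diamond:tag'
    }
}
```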


Below is a list of what can be done for each module, first using the params-file, and then using the modules.config/task.ext.args method.

Parameter file:

modules.config task.ext.args method: