NovelTree is a highly parallelized and computationally efficient phylogenomic workflow that infers gene families, gene family trees, species trees, and gene family evolutionary history.
We need a better way to specify both the parameters for certain modules and the command-line flags passed to the underlying software.
For example, when filtering orthogroups for gene family tree inference, it would be good to provide in the params-file actual numeric values for each of the four filters and have them be easy to change. Or, it would be useful to specify which of the 15 possible protein annotations we would like to download from UniProt, rather than downloading all by default.
In a few cases, it will be best to pass command-line flags to each tool using the "task.ext.args" method, where the flags for each tool are specified in the "conf/modules.config" file of the repository. This looks something like this:
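As a minimal sketch of that pattern (not the repository's actual config), the MAFFT flags listed further down could be set like this, assuming the process is simply named `MAFFT`:

```groovy
// conf/modules.config (illustrative sketch)
process {
    // Selector matches the process by name; 'MAFFT' is assumed here
    withName: 'MAFFT' {
        // Passed through to the tool's command line by the module script
        ext.args = '--localpair --maxiterate 1000 --anysymbol'
    }
}
```

On the module side, the script block would pick this up with something like `def args = task.ext.args ?: ''` and interpolate `$args` into the command, so a module with no entry in the config just runs with no extra flags.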
This can also include each module's publishDir, and maybe even its container (not actually sure), so it might be a good way to tidy up some of the module files.
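For the publishDir part, a sketch along the same lines. It assumes an output-directory parameter called `params.outdir` and a made-up subdirectory name, neither of which is taken from the repository. Process directives such as `container` can be set in the same selector block; the image name below is only a placeholder.

```groovy
// conf/modules.config (illustrative sketch)
process {
    withName: 'MAFFT' {
        // Closure so the path is resolved at run time from params.outdir (assumed name)
        publishDir = [
            path: { "${params.outdir}/mafft_alignments" },  // hypothetical subdirectory
            mode: 'copy'
        ]
        // Placeholder image, not a real tag; replace with the module's actual container
        container = 'example.org/your-org/mafft:your-tag'
    }
}
```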
Below is a list of what can be done for each module, first using the params-file, and then using the modules.config/task.ext.args method.
Parameter file:
[x] _FILTERORTHOGROUPS: (these four are currently hard-coded in the workflow)
min_num_spp - Minimum number of species the retained orthogroups must include
min_num_groups - Minimum number of taxonomic groups.....
max_copy_num_filt1 - maximum mean copy number per species for the gene family for species tree inference....
max_copy_num_filt2 - maximum mean copy number per species for any gene tree inference.
[x] _ANNOTATEUNIPROT:
Should specify somehow which annotations (of the 15 possible sets) the user would like to download
Could either be a list of names (e.g. "localization,function,interactions") with simply "all" being one option (to download all), or could be a set of numbers, referencing the index of some list of annotation sets we provide to them (e.g. "1,2,4" corresponding to 1: localization, 2: function, 3: uninteresting, 4: interactions)
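A rough sketch of how these could look as pipeline parameters, written as a `params` block in the Nextflow config. The four filter names come from the list above; the numeric values are placeholders rather than proposed defaults, and `uniprot_annots` is a hypothetical parameter name for the annotation selection.

```groovy
// nextflow.config (illustrative sketch; values are placeholders)
params {
    // FILTERORTHOGROUPS thresholds
    min_num_spp        = 4    // minimum number of species per retained orthogroup
    min_num_groups     = 2    // minimum number of taxonomic groups
    max_copy_num_filt1 = 5    // max mean copy number, species tree inference
    max_copy_num_filt2 = 10   // max mean copy number, any gene tree inference

    // ANNOTATEUNIPROT: 'all', or a comma-separated list of annotation set names
    uniprot_annots     = 'all'   // hypothetical parameter name
}
```

Anything defined this way can be overridden per run, either on the command line (e.g. `--min_num_spp 6`) or in a YAML/JSON file passed with `-params-file`. The comma-separated form could be split inside the workflow with `params.uniprot_annots.tokenize(',')`.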
modules.config task.ext.args method:
[x] _DIAMONDBLASTP:
blast_columns - should just delete this from the module and remove the "[]" specification from the workflow
--ultra-sensitive - currently specified as ext.args in modules.config
[x] MAFFT:
--localpair --maxiterate 1000 --anysymbol - currently hardcoded in - move to modules.config as ext.args?
[x] CLIPKIT:
add in task.ext.args to allow for custom arguments to the software
[x] IQTREE:
Should move the model specification to task.ext.args?
Then, probably should have a PMSF true/false flag, since the approximation is involved....
[x] ASTEROID:
probably would be good to add the task.ext.args flag for this to allow for custom specification of other flags to the software.
for instance, the number of random starting trees used, whether doing bootstraps, etc.
[x] SPECIESRAX:
Multiple flags to the software should probably be moved to modules.config and specified using the task.ext.args method - will make it more generalizable/customizable for people I think?
These flags include: --per-family-rates --rec-model UndatedDTL --prune-species-tree --si-strategy HYBRID --si-quartet-support --si-estimate-bl --strategy SPR
[x] GENERAX:
Similar to SPECIESRAX, but including: --rec-model UndatedDTL --prune-species-tree --per-family-rates --strategy SPR
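Moved into modules.config, the SPECIESRAX and GENERAX flags above might look roughly like the following. This is a sketch only: it assumes the processes are named exactly `SPECIESRAX` and `GENERAX`, and that each module script reads `task.ext.args`. The flags themselves are copied from the two lists above.

```groovy
// conf/modules.config (illustrative sketch)
process {
    withName: 'SPECIESRAX' {
        // One flag per line, joined into a single string for the command line
        ext.args = [
            '--per-family-rates',
            '--rec-model UndatedDTL',
            '--prune-species-tree',
            '--si-strategy HYBRID',
            '--si-quartet-support',
            '--si-estimate-bl',
            '--strategy SPR'
        ].join(' ')
    }

    withName: 'GENERAX' {
        ext.args = [
            '--rec-model UndatedDTL',
            '--prune-species-tree',
            '--per-family-rates',
            '--strategy SPR'
        ].join(' ')
    }
}
```

Keeping one flag per line makes it easier for users to comment out or swap individual options without touching the module files.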