`cazomevolve` ('cazome-evolve') investigates the evolution of CAZomes and identifies CAZy families that co-occur within the genomes of candidate species more frequently than would be expected from lineage alone.
CAZomevolve provides a wrapper for dbCAN that retrieves the output from each protein function prediction tool (HMMER, DIAMOND, eCAMI and dbCAN-sub), as well as the consensus, where at least two tools agree upon the same CAZyme family annotation. CAZomevolve drops all CAZyme subfamily annotations from the incorporated tools as these refer to CAZyme clusters generated by dbCAN and do not directly translate to the subfamilies in CAZy.
However, the definition of the consensus (at least 2 tools) is currently hard-coded. Users may want to define the consensus themselves (e.g. requiring agreement from only 1 tool, or from all 3 tools). This requires minor changes to the CAZomevolve command-line interface (CLI) as well as to code within CAZomevolve.
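The consensus rule described above amounts to counting, for each protein, how many tools predicted each CAZy family and keeping the families that reach a threshold. The sketch below is a minimal illustration under assumed inputs, not CAZomevolve's actual `get_dbcan_consensus()`: the function name `consensus_families` and the input layout (a dict mapping each tool name to the set of family annotations it assigned to one protein) are hypothetical.

```python
from collections import Counter
from typing import Dict, Set


def consensus_families(
    tool_annotations: Dict[str, Set[str]],
    num_of_tools: int = 2,
) -> Set[str]:
    """Return the CAZy families predicted by at least `num_of_tools` tools.

    Hypothetical helper: `tool_annotations` maps a tool name (e.g. "HMMER")
    to the set of CAZy family annotations that tool assigned to ONE protein.
    Subfamily annotations are assumed to have been dropped already.
    """
    if num_of_tools < 1:
        raise ValueError("num_of_tools must be at least 1")
    # Each tool contributes a set, so a tool is counted at most once per family
    counts = Counter(
        family
        for families in tool_annotations.values()
        for family in families
    )
    # Keep only the families that enough tools agree upon
    return {family for family, n in counts.items() if n >= num_of_tools}


protein_annotations = {
    "HMMER": {"GH5", "CBM1"},
    "DIAMOND": {"GH5"},
    "dbCAN-sub": {"GH5", "CBM1"},
}
print(sorted(consensus_families(protein_annotations)))  # ['CBM1', 'GH5']
print(sorted(consensus_families(protein_annotations, num_of_tools=3)))  # ['GH5']
```

With the default of 2 both families pass (GH5 is predicted by three tools, CBM1 by two); raising the threshold to 3 keeps only GH5. A threshold larger than the number of tools simply yields an empty set.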
Changes to be made
Update the CLI: add a new argument (`-N`, `--num_of_tools`) that defines the number of tools that must agree upon a CAZyme family annotation for a given protein for that annotation to be included in the consensus. Defaults to 2.
Update `get_dbcan_consensus()` to take either `args` or a new parameter, `num_of_tools`, carrying the value of the new CLI argument, and use it to customise the dbCAN consensus definition.
Update the unit tests to pass the new parameter to `get_dbcan_consensus()` and to test the new code in this function.
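A minimal sketch of the CLI change, assuming the argument is added with Python's standard `argparse` module. The standalone parser, its `prog` name and the `positive_int` helper are hypothetical and do not reproduce CAZomevolve's real CLI setup. A custom `type` function rejects values below 1, and the default of 2 preserves the current consensus behaviour.

```python
import argparse


def positive_int(value: str) -> int:
    """Hypothetical argparse `type` that only accepts integers >= 1."""
    number = int(value)  # a non-integer raises ValueError, which argparse reports
    if number < 1:
        raise argparse.ArgumentTypeError(f"expected an integer >= 1, got {value!r}")
    return number


def build_parser() -> argparse.ArgumentParser:
    """Build a standalone parser exposing only the proposed -N/--num_of_tools flag."""
    parser = argparse.ArgumentParser(prog="cazomevolve-consensus-sketch")
    parser.add_argument(
        "-N",
        "--num_of_tools",
        type=positive_int,
        default=2,  # matches the current hard-coded consensus (at least 2 tools)
        help=(
            "Number of dbCAN tools that must agree on a CAZy family annotation "
            "for it to be included in the consensus (default: 2)"
        ),
    )
    return parser


args = build_parser().parse_args(["-N", "3"])
print(args.num_of_tools)  # 3
```

The parsed value would then be forwarded to `get_dbcan_consensus()`, either through `args` or as `num_of_tools=args.num_of_tools`, depending on which of the two options listed above is adopted.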