flass / pantagruel

a pipeline for reconciliation of phylogenetic histories within a bacterial pangenome
GNU General Public License v3.0

in -u mode, make the numbering of homogeneous gene families (coding identical proteins) consistent with source db #50

Open flass opened 2 years ago

flass commented 2 years ago

When the $updatefromdb variable is set, i.e. when pantagruel is called with the -u|--update_from option, only the protein families defined during task 01 are currently made to match the protein/gene family labelling of the source database. The same should be done for the so-called "homogeneous" gene families (clusters of CDSs coding identical proteins). These are defined during task 02, when all (possibly redundant) CDSs are mapped to the non-redundant protein set used to define protein families; this takes place during the call to scripts/extract_full_prot_and_cds_family_alignments.py.
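As a rough illustration of the requested behaviour (not the actual code of scripts/extract_full_prot_and_cds_family_alignments.py), the sketch below groups CDSs by the non-redundant protein they code for, reuses the source database label wherever one exists, and numbers new clusters after the highest number already in use. Everything named here is hypothetical: the function `label_identical_cds_families`, the `FAMC` prefix, the zero-padding width, and the assumption that non-redundant protein ids are stable between the source database and the updated one.

```python
from collections import defaultdict

def label_identical_cds_families(cds2nrprot, ref_labels=None, prefix="FAMC", pad=6):
    """Cluster CDSs coding identical proteins and label each cluster.

    cds2nrprot: dict mapping CDS id -> non-redundant protein id.
    ref_labels: optional dict mapping non-redundant protein id -> family label
                taken from the source database (hypothetical input).
    prefix, pad: hypothetical label format, e.g. FAMC000001.
    """
    ref_labels = ref_labels or {}

    # Group all (possibly redundant) CDSs under their non-redundant protein.
    clusters = defaultdict(list)
    for cds, nrprot in cds2nrprot.items():
        clusters[nrprot].append(cds)

    # Find the highest number already used by the source database labels,
    # so that newly created labels never collide with existing ones.
    used_numbers = [
        int(label[len(prefix):])
        for label in ref_labels.values()
        if label.startswith(prefix) and label[len(prefix):].isdigit()
    ]
    next_number = max(used_numbers, default=0) + 1

    labels = {}
    for nrprot in sorted(clusters):
        if nrprot in ref_labels:
            # Keep the label from the source database.
            labels[nrprot] = ref_labels[nrprot]
        else:
            # New cluster: assign the next free number.
            labels[nrprot] = f"{prefix}{next_number:0{pad}d}"
            next_number += 1

    return {labels[nrprot]: sorted(cdss) for nrprot, cdss in clusters.items()}


# Example: P2 is already labelled in the source database; P1 and P3 are new.
families = label_identical_cds_families(
    {"cds1": "P1", "cds2": "P1", "cds3": "P2", "cds4": "P3"},
    ref_labels={"P2": "FAMC000007"},
)
```

In this example the cluster for P2 keeps FAMC000007, while the clusters for P1 and P3 receive the next free numbers. A real implementation would also have to decide what to do when a source-database cluster gains or loses members in the updated dataset.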