eead-csic-compbio / get_homologues

GET_HOMOLOGUES: a versatile software package for pan-genome analysis

ANI coding sequences + noncoding? #115

Closed TommyH-Tran closed 5 months ago

TommyH-Tran commented 7 months ago

From my understanding our ANI calculations are only for CDS, whereas others include all sequences (noncoding and coding). There are sometimes minor discrepancies in results. Is there any way get_homologues could generate ANIs using both the noncoding and coding sequences?

brunocontrerasmoreira commented 7 months ago

By default, get_homologues.pl -A reports ANI computed from BLASTP protein alignments. As proteins are more conserved than nucleotide sequences, this should give you higher identities. In contrast, get_homologues.pl -A -a 'CDS' reports ANI computed from BLASTN nucleotide CDS alignments. You can change that to -a 'tRNA,rRNA' if you wish, but you won't be able to apply it to intergenic regions unless they correspond to features in a GenBank file.
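To illustrate why protein-based and nucleotide-based values can differ, here is a minimal Python sketch that averages the percent-identity column of BLAST tabular output (`-outfmt 6`, where column 3 is `pident`). This is not how get_homologues computes its matrices internally; the function name `average_identity`, the sequence identifiers, and all hit values are made up for the example, and a plain unweighted mean is assumed.

```python
from statistics import mean


def average_identity(blast_tab_lines):
    """Return the unweighted mean of percent identity (column 3)
    over BLAST tabular (-outfmt 6) lines. Hypothetical helper."""
    identities = []
    for line in blast_tab_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        identities.append(float(fields[2]))  # pident
    if not identities:
        raise ValueError("no alignments found")
    return mean(identities)


# Invented hits for the same two gene pairs, aligned as proteins (BLASTP)
# and as nucleotide CDS (BLASTN). Values are illustrative only.
protein_hits = [
    "gA_001\tgB_001\t98.0\t300\t6\t0\t1\t300\t1\t300\t1e-180\t600",
    "gA_002\tgB_002\t96.0\t250\t10\t0\t1\t250\t1\t250\t1e-150\t500",
]
nucleotide_hits = [
    "gA_001\tgB_001\t94.0\t900\t54\t0\t1\t900\t1\t900\t0.0\t1400",
    "gA_002\tgB_002\t92.0\t750\t60\t0\t1\t750\t1\t750\t0.0\t1100",
]

print("protein-based average identity:", average_identity(protein_hits))
print("nucleotide-based average identity:", average_identity(nucleotide_hits))
```

With these invented numbers the protein-based average comes out higher than the nucleotide-based one, which is the kind of gap to expect when comparing the default -A output with ANI from tools that align nucleotide sequences.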