eead-csic-compbio / get_homologues

GET_HOMOLOGUES: a versatile software package for pan-genome analysis

maybe a bug in annotate_cluster.pl #94

Closed carolynzy closed 2 years ago

carolynzy commented 2 years ago

Hi,

I found something strange in the output file of annotate_cluster.pl.

For example, the sequence with ID "ID:9156_01794" has a length of 1785. When I annotate the cluster containing this sequence, I get this line in the output file:

>ID:9156_01794[Genus_species] 2929 0.0 2 + + 100.0% 100.0% 1:1893 1:1785

However, the sequence that follows this line has a length of 1893, which is the length of the longest sequence in this cluster. I think this sequence should be 1785 nt, not 1893 nt. I BLASTed the two sequences and found that the region from 1434 to 1590 in the original sequence aligns to both 1434 to 1589 and 1542 to 1698 in the longest sequence. That might be the cause of this issue.

The cluster file can be found here: 4282_hypothetical_protein.txt

And the outputfile is here: 4282_hypothetical_protein.ann.txt

Please have a look at this. Thank you for your time!

eead-csic-compbio commented 2 years ago

Hi @carolynzy, you are right, this was a bug, which should be fixed after commit https://github.com/eead-csic-compbio/get_homologues/commit/3b17b148813bfbb22495c522be603c32c702cf3d

This was a tricky one, as the ID:9156_01794 sequence contains a perfect repeat that can align to two parts of the target. After the fix, the script annotate_cluster.pl takes only one HSP (high-scoring segment pair) from the BLAST output, which is what you expect in most cases for a local alignment. However, for your sequence this means that a bit of the 3' end is not aligned, since there is a rather long indel (you can check that with a global alignment, e.g. with clustal-omega).
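To make the "one HSP" idea concrete, here is a minimal Python sketch of keeping only the best-scoring HSP per query/subject pair. It is not the code in annotate_cluster.pl (which is Perl); it assumes BLAST tabular output with the default 12 columns of `-outfmt 6`, and the function names, sequence IDs and rows are hypothetical, invented only to mimic a repeat that hits the target twice.

```python
from typing import Dict, Iterable, Iterator, NamedTuple, Tuple


class Hsp(NamedTuple):
    qseqid: str
    sseqid: str
    qstart: int
    qend: int
    sstart: int
    send: int
    bitscore: float


def parse_outfmt6(lines: Iterable[str]) -> Iterator[Hsp]:
    """Parse default 12-column BLAST tabular rows (assumed layout)."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        # Columns: qseqid sseqid pident length mismatch gapopen
        #          qstart qend sstart send evalue bitscore
        yield Hsp(f[0], f[1], int(f[6]), int(f[7]),
                  int(f[8]), int(f[9]), float(f[11]))


def best_hsp_per_pair(hsps: Iterable[Hsp]) -> Dict[Tuple[str, str], Hsp]:
    """Keep only the highest-bitscore HSP for each (query, subject) pair."""
    best: Dict[Tuple[str, str], Hsp] = {}
    for h in hsps:
        key = (h.qseqid, h.sseqid)
        if key not in best or h.bitscore > best[key].bitscore:
            best[key] = h
    return best


# Invented toy rows: the second HSP is a short repeat hit further downstream.
toy_rows = [
    "query1\tref1\t99.0\t1590\t10\t1\t1\t1590\t1\t1589\t0.0\t2900",
    "query1\tref1\t100.0\t157\t0\t0\t1434\t1590\t1542\t1698\t1e-80\t290",
]

best = best_hsp_per_pair(parse_outfmt6(toy_rows))
top = best[("query1", "ref1")]
print(top.qstart, top.qend, top.sstart, top.send)  # 1 1590 1 1589
```

Coordinates are then read from that single HSP rather than combined across several, so a secondary repeat hit cannot stretch the reported range.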

This is a good case to showcase the differences between a local alignment (as produced by annotate_cluster.pl) and an end-to-end global alignment. The first one is probably enough for many applications, but does have some limitations (see the manual). Depending on what you want to do, you might actually need global alignments.
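As a toy illustration of that difference, the sketch below scores the same pair of sequences with a global (Needleman-Wunsch style) and a local (Smith-Waterman style) dynamic program. The scoring scheme (match +1, mismatch -1, linear gap -1) and the sequences are assumptions chosen for readability; BLAST and clustal-omega use more elaborate scoring, so this only shows the principle.

```python
def global_score(a: str, b: str, match: int = 1,
                 mismatch: int = -1, gap: int = -1) -> int:
    """End-to-end alignment score: every residue of a and b is accounted for."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def local_score(a: str, b: str, match: int = 1,
                mismatch: int = -1, gap: int = -1) -> int:
    """Best-scoring sub-alignment: flanking regions may be left unaligned."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


# Toy sequences: the target carries a 4-nt insertion relative to the query.
query = "ACGTACGT"
target = "ACGTTTTTACGT"

print(global_score(query, target))  # 4: all 8 matches, minus 4 gap positions
print(local_score(query, target))   # 5: only "TACGT" is aligned
```

With these toy parameters the local alignment stops at the indel and reports only one flank, while the global alignment pays for the gap and spans both ends, which is the same trade-off as with the 3' end of your sequence.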

Hope this helps, and thanks for your feedback as ever, Bruno

carolynzy commented 2 years ago

Hi @bruno, that explanation is very helpful. Thank you very much for the prompt response!