GDKO / AvP

Automatic evaluation of HGTs
GNU General Public License v3.0

how is the tree made after classification #24

Open lagphase opened 2 weeks ago

lagphase commented 2 weeks ago

Dear @GDKO ,

Thank you for the great tool. I assume the tree I should be looking at is the file with the .nexus extension in the classification output directory. My question is how the tree is made. Is it built from a multiple sequence alignment of the genes identified by BLAST that are included in the HGT calculation? And if different bacterial strains of the same species appear on the tree, is that because their BLAST % identity to the query differs?

GDKO commented 2 weeks ago

Hi @lagphase,

> My question is how the tree is made?

For each HGT candidate, the algorithm takes the hits reported by BLAST (or DIAMOND) and keeps them up to 20 hits past the first ingroup hit (the cutoffextend parameter sets this number). These sequences are then extracted from the database, and groups are created based on the percentage of shared hits. Each group is aligned with MAFFT and a tree is constructed from that alignment. For each HGT candidate, a .nexus file is created for visualisation purposes.
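As a rough illustration of the selection and grouping steps described above, here is a minimal Python sketch. It is not AvP's actual code: the function names, the `(subject_id, is_ingroup)` hit representation, the fallback when no ingroup hit exists, the greedy grouping strategy, and the `min_shared` threshold of 0.7 are all assumptions made for the example. Only the default of 20 extra hits comes from the description above.

```python
from typing import Dict, List, Set, Tuple

# A hit is (subject_id, is_ingroup); lists are ordered best hit first.
Hit = Tuple[str, bool]


def select_hits(hits: List[Hit], cutoff_extend: int = 20) -> List[str]:
    """Keep hits up to the first ingroup hit, plus `cutoff_extend` hits after it."""
    for i, (_, is_ingroup) in enumerate(hits):
        if is_ingroup:
            return [sid for sid, _ in hits[: i + 1 + cutoff_extend]]
    # Assumption: if there is no ingroup hit at all, keep every hit.
    return [sid for sid, _ in hits]


def shared_fraction(a: Set[str], b: Set[str]) -> float:
    """Fraction of the smaller hit set that also occurs in the other set."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def group_candidates(
    selected: Dict[str, List[str]], min_shared: float = 0.7
) -> List[List[str]]:
    """Greedy grouping (hypothetical): a candidate joins the first group
    whose first member shares at least `min_shared` of its hits."""
    groups: List[List[str]] = []
    for cand, hits in selected.items():
        for group in groups:
            seed_hits = set(selected[group[0]])
            if shared_fraction(set(hits), seed_hits) >= min_shared:
                group.append(cand)
                break
        else:
            groups.append([cand])
    return groups
```

For example, with `hits = [("out1", False), ("out2", False), ("in1", True), ("out3", False), ("in2", True)]`, calling `select_hits(hits, cutoff_extend=1)` keeps `out1`, `out2`, `in1` and `out3`: everything up to the first ingroup hit, plus one hit after it. Each resulting group would then be aligned and turned into a tree, which this sketch does not cover.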

> If different bacterial strains of the same species appear on the tree, is that because their BLAST % identity to the query differs?

The % identity plays no role in selecting sequences from the database. DIAMOND and BLAST order hits by e-value.
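To make the point about ordering concrete, here is a small Python sketch that reads hits in the standard 12-column tabular layout (BLAST `-outfmt 6`, where column 2 is the subject ID, column 3 the % identity and column 11 the e-value) and ranks them by e-value alone. The helper names are hypothetical and this is not taken from AvP; it only shows that a hit with lower % identity can rank above one with higher identity.

```python
from typing import List, NamedTuple


class TabHit(NamedTuple):
    sseqid: str
    pident: float
    evalue: float


def parse_tabular(lines: List[str]) -> List[TabHit]:
    """Parse 12-column tabular hits (default BLAST outfmt 6 column order)."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.rstrip("\n").split("\t")
        hits.append(
            TabHit(sseqid=cols[1], pident=float(cols[2]), evalue=float(cols[10]))
        )
    return hits


def rank_by_evalue(hits: List[TabHit]) -> List[TabHit]:
    """Order hits by ascending e-value; % identity is ignored entirely."""
    return sorted(hits, key=lambda h: h.evalue)
```

With two invented rows, one at 62% identity and e-value 1e-80 and another at 95% identity and e-value 1e-20, `rank_by_evalue` places the 62% hit first. Two strains of the same species can therefore both be selected simply because both fall inside the selected window of hits, whatever their identity values are.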

lagphase commented 2 weeks ago

Hi @GDKO.

Thank you for the details. I want to confirm: does that mean every tip of the tree is a possible donor of the gene, and that the branch length reflects the distance of the HGT event?