LegumeFederation / legfed_gene_families

A repository for managing tasks relating to the production of gene families for use by the Legume Federation

alignability assessment #5

Open joelb123 opened 7 years ago

joelb123 commented 7 years ago

Local sequence alignment tools such as BLAST can return high-scoring hits between sequences that cannot be meaningfully aligned in the context of a gene family's multiple sequence alignment. When these scores are used as the basis for defining gene families, the result can be misclassification. Including a misclassified sequence in a family can degrade the output of downstream analysis tools, such as HMM-based alignment or tree calculation.
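As a minimal sketch of one symptom of this problem, the Python below flags local hits that score well but span only a small fraction of the query or subject sequence. It assumes BLAST+ tabular output requested with `-outfmt "6 qseqid sseqid bitscore qstart qend sstart send qlen slen"`. The `Hit` class, the helper names, and the `min_bits`/`min_cov` thresholds are hypothetical and chosen only for illustration; low coverage is a heuristic proxy for "unalignable in the family context", not a definitive test.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One BLAST+ tabular row, in the (assumed) custom column order given above."""
    qseqid: str
    sseqid: str
    bitscore: float
    qstart: int
    qend: int
    sstart: int
    send: int
    qlen: int
    slen: int


def parse_hit(line: str) -> Hit:
    """Parse one tab-separated line in the column order listed above."""
    f = line.rstrip("\n").split("\t")
    return Hit(f[0], f[1], float(f[2]), *(int(x) for x in f[3:9]))


def coverage(start: int, end: int, length: int) -> float:
    """Fraction of a sequence spanned by an aligned segment (1-based, inclusive coordinates)."""
    lo, hi = min(start, end), max(start, end)
    return (hi - lo + 1) / length


def is_suspicious(hit: Hit, min_bits: float = 50.0, min_cov: float = 0.5) -> bool:
    """True if the hit scores at least min_bits but covers less than
    min_cov of either the query or the subject (illustrative thresholds)."""
    qcov = coverage(hit.qstart, hit.qend, hit.qlen)
    scov = coverage(hit.sstart, hit.send, hit.slen)
    return hit.bitscore >= min_bits and min(qcov, scov) < min_cov


# A 251-residue hit inside a 1200-residue query and a 900-residue subject.
hit = parse_hit("geneA\tgeneB\t180.0\t10\t260\t400\t650\t1200\t900")
print(is_suspicious(hit))  # True
```

Requiring coverage on both sides, rather than only on the query, is what catches the case where a short shared region scores well inside two much longer proteins.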

One cause of misclassification is the scoring of highly repetitive sequences that do not arise from simple evolutionary processes. Another is the presence of a highly conserved domain, such as an ATPase domain, in proteins whose other domains are unrelated.
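For the repetitive-sequence case, one simple screen is to measure compositional complexity along each sequence. The sketch below uses a sliding-window Shannon entropy filter; it is in the spirit of low-complexity filters such as SEG but is not SEG itself. The function names, the window size, and the entropy cutoff are hypothetical, illustrative choices. A high masked fraction suggests a sequence whose hits may be driven by repeats rather than by shared ancestry.

```python
import math
from collections import Counter


def shannon_entropy(seq: str) -> float:
    """Shannon entropy, in bits, of the residue composition of seq."""
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in Counter(seq).values())


def low_complexity_mask(seq: str, window: int = 12, max_entropy: float = 2.2) -> list[bool]:
    """Mark every residue covered by at least one window whose entropy is
    below max_entropy. Sequences shorter than `window` are never masked."""
    mask = [False] * len(seq)
    for i in range(len(seq) - window + 1):
        if shannon_entropy(seq[i:i + window]) < max_entropy:
            for j in range(i, i + window):
                mask[j] = True
    return mask


def low_complexity_fraction(seq: str, window: int = 12, max_entropy: float = 2.2) -> float:
    """Fraction of residues masked as low complexity (0.0 for an empty sequence)."""
    mask = low_complexity_mask(seq, window, max_entropy)
    return sum(mask) / len(mask) if mask else 0.0


print(low_complexity_fraction("Q" * 30))                # 1.0
print(low_complexity_fraction("ACDEFGHIKLMNPQRSTVWY"))  # 0.0
```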

We would like a tool to assess the likelihood of misclassification. Ideally, the tool would also detect domain structure in multidomain alignments that contain unalignable domains.
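One possible starting point, sketched here under stated assumptions, scores alignability from an existing multiple sequence alignment. Each sequence's score is the fraction of its residues that fall in columns occupied by at least `min_occ` of the family, and runs of residues in sparsely occupied columns are reported as candidate unaligned segments, a crude stand-in for domain structure. The alignment is assumed to be a dict of equal-length gapped strings with `-` as the gap character; all function names and thresholds are hypothetical. Because it looks only at gap occupancy and ignores residue similarity, it will miss sequences that an aligner has forced into well-occupied columns.

```python
def column_occupancy(msa: dict[str, str]) -> list[float]:
    """Fraction of sequences with a residue (anything but '-') in each column."""
    rows = list(msa.values())
    ncols = len(rows[0])
    if any(len(r) != ncols for r in rows):
        raise ValueError("aligned sequences must all have the same length")
    return [sum(r[c] != "-" for r in rows) / len(rows) for c in range(ncols)]


def alignability(msa: dict[str, str], min_occ: float = 0.5) -> dict[str, float]:
    """Per sequence: fraction of its residues lying in columns with occupancy >= min_occ."""
    occ = column_occupancy(msa)
    scores = {}
    for name, row in msa.items():
        cols = [c for c, ch in enumerate(row) if ch != "-"]
        good = sum(occ[c] >= min_occ for c in cols)
        scores[name] = good / len(cols) if cols else 0.0
    return scores


def unaligned_segments(row: str, occ: list[float], min_occ: float = 0.5,
                       min_len: int = 3) -> list[tuple[int, int]]:
    """Runs of at least min_len residues in columns with occupancy < min_occ,
    as 0-based, half-open coordinates in the ungapped sequence."""
    segments, start, pos = [], None, 0
    for c, ch in enumerate(row):
        if ch == "-":
            continue  # gaps do not advance the residue coordinate
        if occ[c] < min_occ:
            if start is None:
                start = pos
        else:
            if start is not None and pos - start >= min_len:
                segments.append((start, pos))
            start = None
        pos += 1
    if start is not None and pos - start >= min_len:
        segments.append((start, pos))
    return segments


msa = {
    "a": "MKVLLAG-----ST",
    "b": "MKVLIAG-----ST",
    "c": "MRVLLAGWWPQRST",  # carries an insertion no other member shares
}
occ = column_occupancy(msa)
print(alignability(msa))
print(unaligned_segments(msa["c"], occ))  # [(7, 12)]
```

In this toy alignment, `c` carries a five-residue insertion that no other member shares, so its score is 9/14 and the insertion is reported as residues 7 to 11 (`(7, 12)` in half-open coordinates), while `a` and `b` score 1.0.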