Local sequence alignment tools such as BLAST can return high-scoring hits for sequences that nonetheless cannot be aligned in the context of a gene family's multiple sequence alignment. When these scores are used as the basis for gene family definitions, sequences can be misclassified. Inclusion of a misclassified sequence in a family can degrade the results of downstream analysis tools, such as HMM-based alignment or tree calculation.
One cause of misclassification is the scoring of highly repetitive sequences that do not arise from simple evolutionary processes. Another is the presence of a highly conserved domain, such as an ATPase domain, in proteins whose other domains are unrelated.
We would like a tool to assess the likelihood of misclassification. It would also be useful if the tool detected domain structure in multidomain alignments that contain unalignable domains.
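
As a rough illustration of how the repetitive-sequence case might be flagged, the sketch below scores a protein sequence by the fraction of sliding windows with low Shannon entropy. This is only a minimal heuristic under stated assumptions, not the method of any existing tool: the function names, the window length of 12, and the entropy threshold of 2.2 bits are all hypothetical choices made for the example and would need tuning against real families.

```python
import math
from collections import Counter


def shannon_entropy(seq: str) -> float:
    """Shannon entropy (bits per residue) of the residue composition of seq."""
    if not seq:
        return 0.0
    n = len(seq)
    counts = Counter(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_fraction(seq: str, window: int = 12, threshold: float = 2.2) -> float:
    """Fraction of sliding windows whose entropy falls below threshold.

    window and threshold are illustrative assumptions, not calibrated values.
    """
    seq = seq.upper()
    if len(seq) < window:
        # Sequence shorter than one window: score it as a single window.
        return 1.0 if shannon_entropy(seq) < threshold else 0.0
    n_windows = len(seq) - window + 1
    low = sum(
        1
        for i in range(n_windows)
        if shannon_entropy(seq[i:i + window]) < threshold
    )
    return low / n_windows


if __name__ == "__main__":
    # A homopolymer run scores as fully low-complexity; a sequence with
    # every residue distinct scores as not low-complexity at all.
    print(low_complexity_fraction("QQQQQQQQQQQQQQQQQQQQ"))
    print(low_complexity_fraction("ACDEFGHIKLMNPQRSTVWY"))
```

A sequence with a high low-complexity fraction is a candidate for closer inspection before it is admitted to a family. A composition-based score like this says nothing about the second cause, a shared conserved domain among otherwise unrelated proteins, which would need a domain-level comparison instead.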