Open joelb123 opened 7 years ago
Per our discussion, we'll start by using the existing protocol for assigning protein sequences to families based on HMMs that have already been defined by some procedure. That will give us a baseline for understanding how our gene families behave when sequences are assigned to them after the families have been defined. It also sets the stage for assessing different gene family definitions against each other, using assignments made by the same classifier on the same set of input sequences.
The current multistage process of calculating HMM-based classifiers for gene families does not ensure that those classifiers are self-consistent with the original gene classifications.
We would like a tool for assessing self-consistency of gene family classifications and classifiers.
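As a starting point for discussion, here is a minimal sketch of what the self-consistency check could compute. It assumes we already have two mappings in hand: the original family label for each sequence, and the family the HMM classifier assigned to that same sequence. The function name `self_consistency` and the dict-based inputs are hypothetical, not part of any existing code, and how the classifier output gets parsed into `predicted` is left open.

```python
from collections import defaultdict


def self_consistency(original, predicted):
    """Compare original family labels with classifier assignments.

    original:  dict of sequence ID -> family ID used to build the HMMs
    predicted: dict of sequence ID -> family ID assigned by the classifier
               (a missing key or None means no family was assigned)

    Returns (overall fraction consistent, per-family fractions, mismatches).
    """
    totals = defaultdict(int)  # sequences per original family
    agree = defaultdict(int)   # sequences assigned back to their own family
    mismatches = []            # (sequence ID, original family, assigned family)

    for seq_id, family in original.items():
        totals[family] += 1
        assigned = predicted.get(seq_id)
        if assigned == family:
            agree[family] += 1
        else:
            mismatches.append((seq_id, family, assigned))

    per_family = {fam: agree[fam] / totals[fam] for fam in totals}
    overall = sum(agree.values()) / len(original) if original else 0.0
    return overall, per_family, mismatches


# Toy example with made-up IDs: s2 lands in the wrong family, s4 is unassigned.
original = {"s1": "famA", "s2": "famA", "s3": "famB", "s4": "famB"}
predicted = {"s1": "famA", "s2": "famB", "s3": "famB"}
overall, per_family, mismatches = self_consistency(original, predicted)
```

The mismatch list is probably the most useful output, since it points at the specific sequences and families where the classifier disagrees with the definitions it was built from.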
It would also be nice to use this tool to evaluate the performance of classifiers on in-silico mutated copies of proteins from the original set, with both substitutions and indels.
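One rough way to generate the perturbed test proteins, sketched under simplifying assumptions: substitutions are drawn uniformly from the 20 standard amino acids (no substitution matrix), indels are a single residue long, and the rates are arbitrary placeholders. The name `mutate_protein` and its parameters are hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mutate_protein(seq, sub_rate=0.05, indel_rate=0.01, rng=None):
    """Return a copy of seq with random substitutions and 1-residue indels.

    sub_rate:   per-residue probability of a substitution (assumed value)
    indel_rate: per-residue probability of an indel, split evenly between
                deletions and insertions (assumed value)
    rng:        optional random.Random instance for reproducible output
    """
    if rng is None:
        rng = random.Random()
    out = []
    for residue in seq:
        r = rng.random()
        if r < indel_rate / 2:
            continue  # deletion: drop this residue
        if r < indel_rate:
            # insertion: add a random residue before the original one
            out.append(rng.choice(AMINO_ACIDS))
            out.append(residue)
        elif r < indel_rate + sub_rate:
            # substitution: pick any residue other than the original
            out.append(rng.choice([a for a in AMINO_ACIDS if a != residue]))
        else:
            out.append(residue)
    return "".join(out)


# Made-up sequence; a fixed seed makes the mutated copy reproducible.
protein = "MKTAYIAKQRQISFVKSHFSRQ"
mutant = mutate_protein(protein, sub_rate=0.1, indel_rate=0.05,
                        rng=random.Random(42))
```

Running the classifier on such mutants at increasing rates, and checking whether each still lands in its parent's family, would give a rough picture of how robust the assignments are.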