LegumeFederation / legfed_gene_families

A repository for managing tasks relating to the production of gene families for use by the Legume Federation

consistency analysis of gene family classifiers #6

Open joelb123 opened 7 years ago

joelb123 commented 7 years ago

The current multistage process of calculating HMM-based classifiers for gene families does not ensure that those classifiers are self-consistent with the original gene classifications.

We would like a tool for assessing self-consistency of gene family classifications and classifiers.

It would also be nice to use this tool to evaluate the performance of classifiers on proteins from the original set that have been mutated in silico with substitutions and indels.
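As a rough illustration of the in-silico mutation idea, here is a minimal Python sketch. The function name `mutate_protein`, the rate parameters, and the uniform choice of replacement residues are all assumptions made for the example, not part of any existing protocol; a realistic version would probably draw substitutions from a substitution matrix and allow multi-residue indels.

```python
import random

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mutate_protein(seq, sub_rate=0.05, indel_rate=0.01, rng=None):
    """Return a copy of `seq` with random substitutions and single-residue indels.

    Hypothetical helper: rates are per-residue probabilities, and
    replacement residues are drawn uniformly (an assumption).
    """
    rng = rng or random.Random()
    out = []
    for residue in seq:
        r = rng.random()
        if r < indel_rate / 2:
            # Deletion: drop this residue.
            continue
        if r < indel_rate:
            # Insertion: add a random residue before this one.
            out.append(rng.choice(AMINO_ACIDS))
            out.append(residue)
            continue
        if r < indel_rate + sub_rate:
            # Substitution: pick any residue other than the current one.
            out.append(rng.choice(AMINO_ACIDS.replace(residue, "")))
        else:
            out.append(residue)
    return "".join(out)


if __name__ == "__main__":
    # Seeded generator so a test set can be regenerated reproducibly.
    print(mutate_protein("MKTAYIAKQRQISFVK", sub_rate=0.2, indel_rate=0.1,
                         rng=random.Random(42)))
```

The mutated sequences could then be run through the same family-assignment step as the originals to see how quickly assignments degrade as the rates increase.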

adf-ncgr commented 7 years ago

Per our discussion: we'll start by using the existing protocol for assigning protein sequences to families, based on HMMs that have been defined by some procedure. That will give us a baseline for understanding how our gene families behave when sequences are assigned to them after the families have already been defined. It will also set the stage for assessing different gene family definitions with respect to assignments made using the same classifier on the same set of input sequences.
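One way the baseline comparison might be scored is sketched below. It assumes the original family memberships and the HMM-based reassignments are both available as plain dictionaries mapping a sequence ID to a family ID; the function name `family_consistency` and the example IDs are hypothetical, and parsing the actual classifier output into that form is left out.

```python
from collections import Counter


def family_consistency(original, reassigned):
    """Compare original family memberships with classifier assignments.

    original:   dict of sequence ID -> family ID from the family definition
    reassigned: dict of sequence ID -> family ID from the classifier;
                sequences with no assignment may be missing or map to None
    Returns (overall_fraction, {family ID: fraction}) of sequences that
    were placed back into their original family.
    """
    totals = Counter()
    agree = Counter()
    for seq_id, family in original.items():
        totals[family] += 1
        if reassigned.get(seq_id) == family:
            agree[family] += 1
    per_family = {fam: agree[fam] / totals[fam] for fam in totals}
    overall = sum(agree.values()) / sum(totals.values()) if totals else 0.0
    return overall, per_family


if __name__ == "__main__":
    # Toy data with made-up IDs, purely for illustration.
    original = {"g1": "famA", "g2": "famA", "g3": "famB", "g4": "famB"}
    reassigned = {"g1": "famA", "g2": "famB", "g3": "famB"}  # g4 unassigned
    overall, per_family = family_consistency(original, reassigned)
    print(overall, per_family)
```

Running the same scoring over two different family definitions, with the same input sequences, would give directly comparable per-family numbers.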