WormBase / genedesc_generator

Automated gene descriptions generator for model organism databases

Create report for genes: compare numbers in data source and numbers in descriptions file #28

Open rankishore opened 6 years ago

rankishore commented 6 years ago

Create a check that compares the number of genes with a specific data type in the data source file with the number of genes that have that type of data in the gene descriptions file. For example, the number of C. elegans genes with orthology to human genes in the orthology data source file for WS269 should match the number of C. elegans genes with the orthology module in the gene descriptions file for WS269.
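A minimal sketch of such a check, assuming the gene IDs from the data source have already been extracted into a list and that each entry in the descriptions file is available as a dict. The names `compare_counts`, `gene_id`, and `orthology_description` are hypothetical and are not taken from the pipeline's actual code or file formats.

```python
from typing import Dict, Iterable, Mapping, Optional, Set


def genes_with_module(descriptions: Iterable[Mapping[str, Optional[str]]],
                      module_field: str) -> Set[str]:
    """Return the IDs of genes whose description record has a non-empty module."""
    return {d["gene_id"] for d in descriptions if d.get(module_field)}


def compare_counts(source_gene_ids: Iterable[str],
                   descriptions: Iterable[Mapping[str, Optional[str]]],
                   module_field: str) -> Dict[str, object]:
    """Compare genes present in a data source with genes carrying the module."""
    source = set(source_gene_ids)
    described = genes_with_module(descriptions, module_field)
    return {
        "module": module_field,
        "n_source": len(source),
        "n_described": len(described),
        # Listing the differing IDs makes a count mismatch easier to debug.
        "missing_in_descriptions": sorted(source - described),
        "missing_in_source": sorted(described - source),
        "counts_match": len(source) == len(described),
    }


if __name__ == "__main__":
    # Toy data; real IDs would come from the WS269 files.
    source_ids = ["WBGene1", "WBGene2", "WBGene3"]
    descs = [
        {"gene_id": "WBGene1", "orthology_description": "ortholog of human X"},
        {"gene_id": "WBGene2", "orthology_description": None},
    ]
    print(compare_counts(source_ids, descs, "orthology_description"))
```

Reporting the set differences alongside the counts is a design choice: two equal counts can still hide different genes, so the ID lists may be more useful than the numbers alone.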

valearna commented 5 years ago

In some cases the information loaded into the pipeline is filtered, and often modified, in such a way that it can no longer be compared with the original data in the raw files. I would start by defining fields that must (or mustn't) be null.
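As a rough illustration of the null-field idea, the sketch below validates description records against two lists of field names. Everything here is an assumption made for the example: the function name `check_null_rules`, the record layout, and the field names are hypothetical, and which fields belong in each list would still need to be decided.

```python
from typing import Iterable, List, Mapping, Optional, Sequence


def check_null_rules(records: Iterable[Mapping[str, Optional[str]]],
                     must_be_set: Sequence[str],
                     must_be_null: Sequence[str]) -> List[str]:
    """Return one message per record field that violates a null rule."""
    problems: List[str] = []
    for rec in records:
        gene_id = rec.get("gene_id") or "<unknown>"
        for field in must_be_set:
            # Treat a missing key, None, and "" as null.
            if rec.get(field) in (None, ""):
                problems.append(f"{gene_id}: {field} is null")
        for field in must_be_null:
            if rec.get(field) not in (None, ""):
                problems.append(f"{gene_id}: {field} should be null")
    return problems


if __name__ == "__main__":
    records = [
        {"gene_id": "WBGene1", "description": "text", "orthology_description": None},
        {"gene_id": "WBGene2", "description": "", "orthology_description": "x"},
    ]
    for msg in check_null_rules(records, ["description"], ["orthology_description"]):
        print(msg)
```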

rankishore commented 5 years ago

Add the number of 'information poor' genes for each release to the report 'number_of_concise_descriptions.txt'. 'Information poor' genes are those for which we add protein domain information and/or orthologous human gene molecular function and/or expression cluster information.
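One way the count could be computed, as a sketch only: a gene is counted when at least one of the three kinds of information named above is present in its description record. The field names, the function names, and the tab-separated line format are all hypothetical; the real layout of 'number_of_concise_descriptions.txt' is not shown in this thread.

```python
from typing import Iterable, Mapping, Optional, Sequence

# Hypothetical field names standing in for the three kinds of information.
INFO_POOR_FIELDS = (
    "protein_domain_description",
    "human_gene_function_description",
    "expression_cluster_description",
)


def count_information_poor(descriptions: Iterable[Mapping[str, Optional[str]]],
                           fields: Sequence[str] = INFO_POOR_FIELDS) -> int:
    """Count genes that have at least one of the 'information poor' fields set."""
    return sum(1 for d in descriptions if any(d.get(f) for f in fields))


def report_line(release: str,
                descriptions: Iterable[Mapping[str, Optional[str]]]) -> str:
    """Format a single report line (the format itself is an assumption)."""
    return f"{release}\tinformation_poor_genes\t{count_information_poor(descriptions)}"


if __name__ == "__main__":
    descs = [
        {"gene_id": "WBGene1", "protein_domain_description": "has a kinase domain"},
        {"gene_id": "WBGene2", "expression_cluster_description": None},
        {"gene_id": "WBGene3", "human_gene_function_description": "binds DNA",
         "expression_cluster_description": "enriched in neurons"},
    ]
    print(report_line("WS269", descs))
```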