cmungall opened this issue 2 years ago
@cmungall I think this report has something close: https://docs.google.com/spreadsheets/d/1hMGJ8MFu1ozO3pHt44G2PPN9taWRPqEvb-1MK6WqVOI/edit#gid=1446471698
The section broken down by taxon ID counts annotations rather than distinct genes, but I added that section specifically to help spot large drops in IBAs from release to release.
If the annotation count isn't granular enough (i.e., we need the distinct gene count), can I just add this new stat to the same report?
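For what it's worth, here is a minimal sketch of the difference between the two numbers, computed straight from GAF lines. It assumes standard GAF 2.x columns (column 1 DB, column 2 DB Object ID, column 7 evidence code, column 13 taxon). The function name `iba_counts_by_taxon` is hypothetical, not part of the existing report or pipeline.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def iba_counts_by_taxon(gaf_lines: Iterable[str]) -> Dict[str, Tuple[int, int]]:
    """Hypothetical helper: {taxon: (IBA annotation count, distinct IBA gene count)}."""
    annotations: Dict[str, int] = defaultdict(int)
    genes = defaultdict(set)
    for line in gaf_lines:
        if line.startswith("!") or not line.strip():
            continue  # skip GAF header/comment lines and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 13 or cols[6] != "IBA":
            continue  # keep only well-formed IBA rows
        # Column 13 may read "taxon:A|taxon:B"; the first entry is the gene's own taxon.
        taxon = cols[12].split("|")[0]
        annotations[taxon] += 1
        genes[taxon].add((cols[0], cols[1]))  # a gene is identified by DB + DB Object ID
    return {t: (annotations[t], len(genes[t])) for t in annotations}
```

Counting distinct `(DB, DB Object ID)` pairs per taxon means a drop in genes covered stays visible even when the remaining genes pick up extra annotations.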
I think the point is that we want something that runs as part of the pipeline, with output in a standard place.
Discussed on the managers' call: https://github.com/geneontology/pipeline/issues/300
A statistic that would be very useful for GO is the number of genes that are not mapped to the reference proteome, and for which we are therefore losing IBA annotations.
An ad-hoc way to get this:
[this overcounts because I am lazily not filtering out comment lines etc., but you get the point]
This is based on the assumption that the paint_MOD files default to UniProt IDs wherever a mapping exists.
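A slightly more careful version of the ad-hoc count might look like the sketch below. It skips `!` comment lines and counts distinct genes rather than lines. It relies on the same assumption as above: in a paint_MOD GAF, a row whose DB column is anything other than `UniProtKB` is treated as an unmapped gene. The function name and file names are hypothetical.

```python
import sys
from typing import Iterable, Set, Tuple


def unmapped_iba_genes(gaf_lines: Iterable[str]) -> Set[Tuple[str, str]]:
    """Hypothetical helper: distinct (DB, DB Object ID) pairs with IBAs but no UniProtKB ID."""
    unmapped: Set[Tuple[str, str]] = set()
    for line in gaf_lines:
        if line.startswith("!") or not line.strip():
            continue  # skip GAF header/comment lines, the source of the overcount
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 7 or cols[6] != "IBA":
            continue  # only IBA rows are relevant here
        # Assumption: mapped genes are written with DB = UniProtKB, unmapped ones keep the MOD prefix.
        if cols[0] != "UniProtKB":
            unmapped.add((cols[0], cols[1]))
    return unmapped


if __name__ == "__main__":
    # Usage (hypothetical file names): python count_unmapped.py paint_mgi.gaf paint_zfin.gaf
    for path in sys.argv[1:]:
        with open(path) as fh:
            print(path, len(unmapped_iba_genes(fh)), sep="\t")
```

Emitting one tab-separated line per file is just one option. A pipeline step could write the same numbers to a stats file per release.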
This could be done more systematically as part of the pipeline, generating stats files.
I also think it might be nice to treat this as a GO rule ("all genes with ancestral annotations should have unambiguous mappings to UniProt"), so that these numbers could be shown in the general report dashboard, but this requires further discussion.
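If it did become a rule, the core check could be as simple as the sketch below: every gene carrying IBAs should map to exactly one UniProt accession. Zero accessions means unmapped, and more than one means ambiguous. The input shapes (a list of gene IDs and a gene-to-accessions mapping) and the function name are hypothetical; how the real mapping would be loaded is left open.

```python
from typing import Dict, Iterable, Mapping, Set


def mapping_rule_violations(iba_genes: Iterable[str],
                            uniprot_map: Mapping[str, Set[str]]) -> Dict[str, int]:
    """Hypothetical check: {gene_id: number of UniProt accessions} for genes without exactly one."""
    violations: Dict[str, int] = {}
    for gene in iba_genes:
        n = len(uniprot_map.get(gene, set()))  # 0 = unmapped, >1 = ambiguous
        if n != 1:
            violations[gene] = n
    return violations
```

Reporting the accession count, not just a pass/fail flag, would let a dashboard show unmapped and ambiguous genes as separate numbers.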