ValWood opened 5 months ago
GO-CAM: Once the metadata tag is available on models, we will be able to split the stats into GO-CAM versus standard annotations.
By group:
By curator:
For me, to track pathway curation, I'm primarily interested in coverage, i.e. the number of genes covered by models. By "covered by a model" I mean genes that are causally connected to another gene (not just genes with standard annotations, or a gene connected only to an activity and a process).
For example, Reactome covers 11279 human proteins (https://reactome.org/about/statistics). That's really useful to know.
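To make the "causally connected" criterion concrete, here is a minimal sketch of how such a gene count could be computed for one model. It assumes a hypothetical, simplified representation (a mapping from activity node to the gene that enables it, plus a list of subject/relation/object edges), not the actual Noctua or GO-CAM data format, and the relation labels are placeholders rather than real RO identifiers.

```python
# Placeholder labels standing in for the causal RO relations used in GO-CAM;
# a real implementation would match on the actual RO identifiers.
CAUSAL_RELATIONS = {
    "causally_upstream_of",
    "directly_positively_regulates",
    "directly_negatively_regulates",
    "provides_input_for",
}


def causally_connected_genes(activity_gene, edges):
    """Return genes whose activity is causally linked to another gene's activity.

    activity_gene: activity node ID -> gene product that enables it (hypothetical shape).
    edges: (subject, relation, object) triples from one model.
    Genes that only have an activity and a process are not counted.
    """
    genes = set()
    for subj, rel, obj in edges:
        if rel not in CAUSAL_RELATIONS:
            continue  # skip non-causal edges such as part_of / occurs_in
        g1, g2 = activity_gene[subj], activity_gene[obj]
        if g1 != g2:  # a gene linked only to itself is not "connected to another gene"
            genes.update((g1, g2))
    return genes


# Toy example: GENE3's activity is only part of a process, so it is excluded.
activity_gene = {"a1": "GENE1", "a2": "GENE2", "a3": "GENE3"}
edges = [("a1", "causally_upstream_of", "a2"), ("a3", "part_of", "bp1")]
connected = causally_connected_genes(activity_gene, edges)
```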
The two suggested statistics tally different things. The number of gene products with annotations of any sort indicates, roughly, how much of the organism's genome is covered. The set of tallies earlier in the thread measures aspects of curator activity.
Hi @pgaudet,
I think these different proposals make sense.
Statistics are essential to measure activity, but they should not be misused: the significant over-annotation we have observed over the last 20 years is mainly due to a tendency to chase numbers at the expense of quality.
In my opinion, the real added value of GO-CAM is connecting genes together (or connecting genes with small molecules). From that point of view, I would suggest considering only high-quality models: those with connections, complete annotation units/annotons (at least one MF and one BP), and evidence. Other annotations could be counted as classic GO annotations.
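One possible reading of these criteria, written as a filter, is sketched below. The data classes are a hypothetical simplification (not the Noctua/GO-CAM format), and interpreting "full annotons" as "at least one activity with both an MF and a BP" and "evidence" as "every activity carries at least one evidence code" is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


# Hypothetical, simplified model representation for illustration only.
@dataclass
class Activity:
    gene: str
    mf: Optional[str] = None  # molecular function term ID
    bp: Optional[str] = None  # biological process term ID
    evidence: list = field(default_factory=list)  # evidence codes for this activity


@dataclass
class Model:
    id: str
    activities: dict  # activity ID -> Activity
    causal_edges: list = field(default_factory=list)  # (upstream, downstream) activity IDs


def is_high_quality(model: Model) -> bool:
    """Apply one interpretation of the proposed criteria to a model."""
    acts = model.activities
    # At least one causal link between activities of two different genes.
    has_connection = any(acts[a].gene != acts[b].gene for a, b in model.causal_edges)
    # At least one complete annoton: an activity with both an MF and a BP.
    has_full_annoton = any(bool(a.mf and a.bp) for a in acts.values())
    # Every activity is supported by at least one evidence code.
    all_evidenced = all(a.evidence for a in acts.values())
    return has_connection and has_full_annoton and all_evidenced


good = Model(
    "m1",
    {
        "a1": Activity("GENE1", "GO:mf1", "GO:bp1", ["ECO:x"]),
        "a2": Activity("GENE2", "GO:mf2", None, ["ECO:y"]),
    },
    [("a1", "a2")],
)
```

Models failing the filter would then fall back to being counted as classic GO annotations.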
Pascale and I suggest that we first gather more specific requirements for Noctua statistics from curators, and then come back to the software team.
We'll plan for this discussion on an annotation call.
Can we have a metric on the website for the number of genes in GO-CAM models (by species)?
That is, a non-redundant list of genes that are causally connected (obviously, some genes will be in multiple models). It would be useful to have a way to quickly assess proteome coverage.
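A minimal sketch of the per-species, non-redundant tally might look like this. It assumes each model has already been reduced to a (species, set of causally connected genes) pair, and that proteome sizes are supplied by the caller; both input shapes are hypothetical, and the numbers in the example are toy values, not real proteome sizes.

```python
from collections import defaultdict


def coverage_by_species(models, proteome_sizes):
    """Count distinct causally connected genes per species and the proteome fraction.

    models: iterable of (species, genes) pairs, one per model (hypothetical shape).
    proteome_sizes: species -> number of protein-coding genes (caller supplied).
    Returns species -> (distinct gene count, fraction of proteome covered).
    """
    genes_by_species = defaultdict(set)
    for species, genes in models:
        # Set union drops genes that appear in more than one model.
        genes_by_species[species] |= set(genes)
    return {
        sp: (len(genes), len(genes) / proteome_sizes[sp])
        for sp, genes in genes_by_species.items()
    }


# Toy input: GENE_B appears in two human models but is counted once.
models = [
    ("human", {"GENE_A", "GENE_B"}),
    ("human", {"GENE_B", "GENE_C"}),
    ("yeast", {"GENE_X"}),
]
stats = coverage_by_species(models, {"human": 100, "yeast": 10})
```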
cc @pgaudet @vanaukenk