geneontology / go-site

A collection of metadata, tools, and files associated with the Gene Ontology public web presence.
http://geneontology.org
BSD 3-Clause "New" or "Revised" License

Stats for GO-CAM models #2339

Open ValWood opened 5 months ago

ValWood commented 5 months ago

Can we have a metric on the website for the number of genes in GO-CAM models (by species)?

That is, a non-redundant list of genes that are causally connected (obviously, some genes will be in multiple models). It would be useful to have a way to quickly assess proteome coverage.
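To make the requested metric concrete, here is a minimal Python sketch of a non-redundant, per-species gene count. It assumes a simplified, hypothetical model representation (a dict with a `taxon` key and a `causal_edges` list of gene-ID pairs); this is not the actual GO-CAM or Noctua data format, and the gene IDs are made up for illustration.

```python
from collections import defaultdict


def causally_connected_genes(models):
    """Collect, per taxon, the set of genes that appear in at least one causal edge.

    `models` is a hypothetical simplified structure, not the real GO-CAM format:
    each model is a dict with "taxon" and "causal_edges" (pairs of gene IDs).
    """
    covered = defaultdict(set)
    for model in models:
        for upstream, downstream in model["causal_edges"]:
            # Sets make the list non-redundant: a gene in several models counts once.
            covered[model["taxon"]].add(upstream)
            covered[model["taxon"]].add(downstream)
    return dict(covered)


def coverage_counts(models):
    """Number of distinct causally connected genes per taxon."""
    return {taxon: len(genes) for taxon, genes in causally_connected_genes(models).items()}


# Illustrative data only; the gene IDs are placeholders.
example_models = [
    {"taxon": "NCBITaxon:4896", "causal_edges": [("PomBase:geneA", "PomBase:geneB")]},
    {"taxon": "NCBITaxon:4896", "causal_edges": [("PomBase:geneB", "PomBase:geneC")]},
    {"taxon": "NCBITaxon:9606", "causal_edges": []},  # no causal edges, so no coverage
]

print(coverage_counts(example_models))
```

Dividing each count by the size of the species' reference proteome would give a coverage fraction; that denominator is not part of this sketch.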

cc @pgaudet @vanaukenk

pgaudet commented 4 months ago

GO-CAM: Once the metadata tag is available on models, we will be able to split the stats by GO-CAM versus standard annotations.

By group:

By curator:

ValWood commented 4 months ago

For me, to track pathway curation, I'm primarily interested in coverage, that is, the number of genes covered by models. By "covered by a model" I mean a gene that is causally connected to another gene (not just a standard annotation, or a gene connected only to an activity and a process).

For example, Reactome covers 11279 human proteins (https://reactome.org/about/statistics). That's really useful to know.

deustp01 commented 4 months ago

The two suggested statistics tally different things. The number of gene products with annotations of any sort says, roughly, what kind of coverage of the organism's genome is provided. The set of tallies earlier in the thread measures aspects of curator activity.

sylvainpoux commented 4 months ago

Hi @pgaudet,

I think these different propositions make sense.

Statistics are essential to measure activity, but they should not be misused: the significant over-annotation that we have observed over the last 20 years is mainly due to the tendency to chase numbers at the expense of quality.

In my opinion, the real added value of GO-CAM is to connect genes together (or connect genes with small molecules). From that point of view, I would suggest considering only high-quality models: those with connections, full annotation units/annotons (at least one MF and one BP) and evidence. Other annotations could be counted as classic GO annotations.
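One possible reading of these criteria, sketched in Python. The model structure (`units`, `mf`, `bp`, `evidence`, `causal_edges`) and the term and evidence values are hypothetical placeholders for illustration, not the Noctua data model. The sketch also assumes that "at least one MF and one BP" applies to every annotation unit rather than to the model as a whole, which is an interpretation that would need to be confirmed.

```python
def is_complete_unit(unit):
    """A unit counts as complete if it has an MF, a BP, and some evidence."""
    return bool(unit.get("mf")) and bool(unit.get("bp")) and bool(unit.get("evidence"))


def is_high_quality(model):
    """Assumed criteria: at least one causal connection and only complete units."""
    units = model.get("units", [])
    has_connections = bool(model.get("causal_edges"))
    return has_connections and bool(units) and all(is_complete_unit(u) for u in units)


# Illustrative data only; IDs and evidence labels are placeholders.
connected_model = {
    "causal_edges": [("geneA", "geneB")],
    "units": [
        {"gene": "geneA", "mf": "MF:term1", "bp": "BP:term1", "evidence": ["EV:1"]},
        {"gene": "geneB", "mf": "MF:term2", "bp": "BP:term1", "evidence": ["EV:2"]},
    ],
}
standalone_model = {
    "causal_edges": [],
    "units": [{"gene": "geneC", "mf": "MF:term3", "bp": None, "evidence": ["EV:3"]}],
}

for name, model in [("connected", connected_model), ("standalone", standalone_model)]:
    print(name, is_high_quality(model))
```

Models that fail the check would fall through to the classic annotation tallies rather than being dropped from the statistics.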

vanaukenk commented 4 months ago

Pascale and I suggest that we first gather more specific requirements for Noctua statistics from curators and then we can come back to the software team.

We'll plan for this discussion on an annotation call.