geneontology / go-site

A collection of metadata, tools, and files associated with the Gene Ontology public web presence.
http://geneontology.org
BSD 3-Clause "New" or "Revised" License

gorule-0000007 Shared Annotation Matrix: provide documentation #942

Open pgaudet opened 5 years ago

pgaudet commented 5 years ago

Matrix: Add more explanation text to the matrix rule check data html page so curators know what they’re actually looking at on this page.

@ValWood Can you provide some text ?

ValWood commented 5 years ago

I want to see what the violations look like but I can't find the organism-specific errors (again)

My bookmarks are to here http://release.geneontology.org/ and here

from which I can find the intersection rules https://github.com/geneontology/go-site/blob/master/metadata/rules/README.md#gorule0000009

and I can see this report, and the failures

But this isn't what the curators will see, is it? They will get a specific link for their species, won't they? I can't find this link anywhere (or where to go for the organism-specific lists).

ValWood commented 5 years ago

This is my draft text, but I'd like to check that this makes sense in the context of the report.

~The “Matrix project” uses a set of QC rules generated using co-annotation and biological knowledge. Rules are created if two GO terms are usually never observed to annotate the same gene product simultaneously, after assessing the presence or absence of annotations across a set of evolutionarily diverse species (pombe, cerevisiae, worm, mouse). Gene products violating these rules are reported. The curator should look at the gene product’s annotations to both terms, and assess which annotation is in error OR add a “rule challenge” to the Annotation tracker (https://github.com/geneontology/go-annotation) to refine the rule accordingly. For more background information on rule building see https://www.slideshare.net/ValerieWood/copy-of-biocuration-2017~

See revisions below from @mah11

kltm commented 5 years ago

Are we all referring to the same thing here? Note the resources: https://github.com/geneontology/shared-annotation-check/ and, for example: http://release.geneontology.org/2018-12-01/reports/shared-annotation-check.html

ValWood commented 5 years ago

I found this page, http://release.geneontology.org/2018-12-01/reports/shared-annotation-check.html, but it isn't organism-specific. What do people see in their organism taxon checks? That is what I can't find.

kltm commented 5 years ago

Organism-specific taxon checks are still in development with @dougli1sqrd and @balhoff. I believe that we do have something though, provided by the old owltools. @dougli1sqrd, is that correct, or have those been shuffled off?

mah11 commented 5 years ago

@kltm - Val is asking whether there are versions of the shared-annotation-check report split out into one page/report per species or contributor, as the gorule checks, predictions, etc. are ("taxon checks" in https://github.com/geneontology/go-site/issues/942#issuecomment-447413725 was a mistake). If not (and it isn't on the to-do list already), one of us should open a ticket requesting this, because it will be a lot more convenient for annotators.

@pgaudet - I've edited the text Val suggested:

The "Matrix" produces annotation QC reports using a set of rules based on observed patterns of biological process term co-annotation, combined with additional biological knowledge. Rules are created if two GO terms are rarely or never used to annotate the same gene product simultaneously, and after assessing the presence or absence of annotations across a set of evolutionarily diverse species (fission yeast, budding yeast, worm, mouse).

Annotations violating these rules are reported. [add link(s) to report location(s) here] For each reported gene product, the curator should look at both annotated terms, and assess which annotation is in error. If both are correct, open a ticket on the Annotation tracker to refine the rule accordingly (choose labels "Matrix" and "annotation rule").

For more background information on rule building see https://www.slideshare.net/ValerieWood/copy-of-biocuration-2017.
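As a rough illustration of the check described in this text, here is a minimal Python sketch. It assumes annotations arrive as (gene product, GO term) pairs and that each rule is simply a pair of terms that should not be co-annotated. The function name, the input shapes, and the placeholder identifiers are all hypothetical, and the sketch ignores the ontology hierarchy and evidence codes, which the actual report may well take into account.

```python
from collections import defaultdict


def find_violations(annotations, exclusion_rules):
    """Return sorted (gene_product, term_a, term_b) tuples for every gene
    product that is annotated to both terms of a rule.

    annotations: iterable of (gene_product, go_term) pairs (assumed shape)
    exclusion_rules: iterable of (term_a, term_b) pairs (assumed shape)
    """
    # Collect the set of terms used for each gene product.
    terms_by_gene_product = defaultdict(set)
    for gene_product, term in annotations:
        terms_by_gene_product[gene_product].add(term)

    violations = []
    for gene_product, terms in terms_by_gene_product.items():
        for term_a, term_b in exclusion_rules:
            # A rule is violated when both of its terms annotate the same product.
            if term_a in terms and term_b in terms:
                violations.append((gene_product, term_a, term_b))
    return sorted(violations)


# Placeholder identifiers, not real gene products or GO terms.
example_annotations = [
    ("gp1", "TERM_A"),
    ("gp1", "TERM_B"),
    ("gp2", "TERM_A"),
]
example_rules = [("TERM_A", "TERM_B")]
example_violations = find_violations(example_annotations, example_rules)
```

In this example only `gp1` is reported, because it is the only gene product annotated to both terms of the rule. A curator would then decide whether one of the two annotations is wrong or whether the rule itself needs refining.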

kltm commented 5 years ago

@mah11 I talked to Val about this a little bit, but we never made a ticket. https://github.com/geneontology/shared-annotation-check/issues Minimally, we would need a species/resource list to divide by. We would probably end up putting it under "pipeline" as a project.
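To make the "list to divide by" idea concrete, here is a small Python sketch of how a single violation list might be split into per-resource reports. It assumes each violation is a tuple whose first element is the gene product, and that a lookup from gene product to contributing resource is available. Both of those, along with the function name and the `"unassigned"` bucket, are hypothetical and not taken from the shared-annotation-check code.

```python
from collections import defaultdict


def split_by_resource(violations, resource_of):
    """Group violation tuples by the resource that owns the gene product.

    violations: iterable of tuples whose first element is a gene product id
    resource_of: dict mapping gene product id -> resource name (assumed input)
    """
    reports = defaultdict(list)
    for violation in violations:
        gene_product = violation[0]
        # Gene products missing from the lookup go into a catch-all bucket.
        resource = resource_of.get(gene_product, "unassigned")
        reports[resource].append(violation)
    return dict(reports)


# Placeholder data for illustration only.
example_reports = split_by_resource(
    [("gp1", "TERM_A", "TERM_B"), ("gp3", "TERM_C", "TERM_D")],
    {"gp1": "resource_x"},
)
```

Each key of the result would correspond to one page or report, in the same way the per-resource gorule reports are organized.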

dougli1sqrd commented 5 years ago

@kltm Re: owltools having taxon checks: yes, owltools still runs and still reports taxon checks. For example, http://current.geneontology.org/reports/aspgd-report.html#otc shows rule violations for GO_AR:0000013, which is the owltools taxon checks. (This example isn't showing taxon violations precisely; when checking this rule, owltools couldn't find the taxon class, so it's erroring here.)

ValWood commented 5 years ago

Yes Midori is correct, I want to see these in species-checks.

Also, despite being told a number of times, and bookmarking the correct place, for some reason I can't find the place to look for the species checks. I therefore think others might find this quite challenging. I think this is what Seth is referring to. If people can't find these files, the annotations aren't going to get fixed, so this should be high priority (among all the other high priorities).

Everyone/each resource should also get a periodic reminder link to fix broken rules. If this happens, I haven't seen it.

mah11 commented 5 years ago

@kltm

> I talked to Val about this a little bit, but we never made a ticket. ... Minimally, we would need a species/resource list to divide by.

OK, I've opened https://github.com/geneontology/shared-annotation-check/issues/2

cmungall commented 5 years ago

I will try and summarize. I changed the ticket title and I suggest we use "Shared Annotation Matrix" to avoid confusion with any other matrices.

I would like to move things forward and make shared annotation checks part of the standard go-rules checks. However, we will have to prioritize this, either on a managers call or at the meeting.

But for now, the scope of this ticket is to add documentation. Val's draft text is good. So the action is for @kltm to either embed or link to this text. Depending on other things we may or may not make this before the meeting.

ValWood commented 5 years ago

Use @mah11's revised text below mine.