geneontology / amigo

AmiGO is the public interface for the Gene Ontology.
http://amigo.geneontology.org
BSD 3-Clause "New" or "Revised" License

Term search is not retrieving all annotated gene products #273

Closed ValWood closed 8 years ago

ValWood commented 8 years ago

When I search on "DNA recombination" and download gene products in pombe, I get 103 gene products. For PomBase I get 117. I compared the lists, and the following were not in both lists. However, the ones that I spot-checked WERE annotated to DNA recombination in AmiGO.

19 elements included exclusively in "List 2": SPCC24B10.08c SPAC30D11.10 SPBC3D6.11c SPBC9B6.02c SPAC6B12.02c SPCC1020.14 SPBC887.14c SPAC27E2.08 SPAC13D1.01c SPAC167.08 SPAC26A3.13c SPBC1921.02 SPBC1289.17 SPAC1687.03c SPAP27G11.15 SPAC9.04 SPAC2E1P3.03c SPAPB15E9.03c SPAC19D5.09c

kltm commented 8 years ago

Could you spell out the exact procedure that you're using to generate these download lists so that we can duplicate the described behaviour? I'm having trouble teasing out what you were doing here.

ValWood commented 8 years ago

http://amigo2.berkeleybop.org/amigo/term/GO:0006310

select link to gene products http://amigo2.berkeleybop.org/amigo/search/bioentity?q=*:*&fq=regulates_closure:%22GO:0006310%22&sfq=document_category:%22bioentity%22

filter on pombe

gives 103 results.

Download results.

Compare to PomBase, where I have 117. The difference (genes not in the list downloaded from AmiGO) is above. The gene products above ARE annotated to "DNA recombination" when I go to the page for the gene product.
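The list comparison described above can be reproduced with a set difference. This is a minimal sketch, assuming each download has been reduced to one systematic ID per line; the helper names and the placeholder IDs are hypothetical and are not the real downloads.

```python
def read_ids(lines):
    """Return the set of non-empty, whitespace-stripped IDs from an iterable of lines."""
    return {line.strip() for line in lines if line.strip()}


def compare_id_lists(amigo_lines, pombase_lines):
    """Return (only_in_amigo, only_in_pombase) as sorted lists."""
    amigo = read_ids(amigo_lines)
    pombase = read_ids(pombase_lines)
    return sorted(amigo - pombase), sorted(pombase - amigo)


# Placeholder IDs for illustration only (hypothetical, not real gene products).
amigo_download = ["ID_A", "ID_B"]
pombase_download = ["ID_A", "ID_B", "ID_C"]

only_amigo, only_pombase = compare_id_lists(amigo_download, pombase_download)
print("only in AmiGO:", only_amigo)
print("only in PomBase:", only_pombase)
```

Reporting both directions of the difference helps separate entries that one source never submits (for example the transposons mentioned later in this ticket) from entries that are genuinely missing from the search results.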

ValWood commented 8 years ago

Actually, the whole list is not a problem. Some are transposons; we don't submit these to GO.

These are the ones which were not included in the download results, but when I checked the pages, they had a "DNA recombination" annotation:

SPCC24B10.08c ada2 SAGA complex subunit Ada2 protein_coding III
SPAC6B12.02c mus7 DNA repair protein Mus7/Mms22 protein_coding I
SPBC887.14c pfh1 5' to 3' DNA helicase, involved in DNA recombination and repair Pif1
SPAC30D11.10 rad52 DNA recombination protein, Rad51 mediator Rad52 (previously Rad22)
SPBC1921.02 rad60 DNA repair protein Rad60
SPAC1687.03c rfc4 DNA replication factor C complex subunit Rfc4 (predicted)
SPAP27G11.15 slx1 structure-specific endonuclease catalytic subunit Slx1 protein_coding
SPBC3D6.11c slx8 SUMO-targeted ubiquitin-protein ligase E3 Slx8

(I did not check them all, but I checked 3.)

ValWood commented 8 years ago

See the mus7 page, http://amigo2.berkeleybop.org/amigo/search/annotation?q=mus7 : it shows "double-strand break repair via homologous recombination", but mus7 isn't in the search results.

cmungall commented 8 years ago

The first hypothesis is that there is an ontology difference between the two loads; the second is that there is a difference in GAFs between the two loads.

Let's take one example: http://www.pombase.org/spombe/result/SPCC24B10.08c (7 terms)

http://amigo2.berkeleybop.org/amigo/gene_product/PomBase:SPCC24B10.08c (10 terms)

Let's explore the differences here

cmungall commented 8 years ago

OK, forget my example, let's look at mus7

ValWood commented 8 years ago

But I can see the annotation in AmiGO which should place it in my result set: "double-strand break repair via homologous recombination" is an is_a descendant of "DNA recombination".
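The expectation here is that a search on a term returns everything annotated to any of its descendants. A minimal sketch of that check follows, assuming a toy is_a graph keyed by term label. The graph is hand-written for illustration (the intermediate term is a placeholder, not taken from the ontology), and `ancestors` is a hypothetical helper, not AmiGO code.

```python
def ancestors(term, is_a):
    """Return every term reachable from `term` by following is_a edges."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in is_a.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Toy graph: child -> list of is_a parents. "intermediate term" is a placeholder.
IS_A = {
    "double-strand break repair via homologous recombination": ["intermediate term"],
    "intermediate term": ["DNA recombination"],
}

direct = "double-strand break repair via homologous recombination"

# A gene product annotated directly to `direct` should be found by a search
# on any term in the ancestor set of `direct`.
print("DNA recombination" in ancestors(direct, IS_A))
```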

ValWood commented 8 years ago

..so AmiGO isn't internally consistent...

cmungall commented 8 years ago

It looks like a discrepancy between bioentity and annotation results

@hdietze please look at this

Annotations:

http://amigo2.berkeleybop.org/amigo/term/GO:0006310 Select "PomBase". Type "mus7" in free text filtering. You will see the direct annotation to 'double-strand break repair via homologous recombination', as expected.

Entities:

http://amigo2.berkeleybop.org/amigo/search/bioentity?q=*:*&fq=regulates_closure:%22GO:0006310%22&sfq=document_category:%22bioentity%22 Select "PomBase". Type "mus7". It doesn't show up.
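To see exactly what the entity link filters on, the query string can be decoded. This is a small sketch using only the Python standard library; it decodes the URL above and makes no request. The field names are read from the URL itself; how the server interprets them is not shown here.

```python
from urllib.parse import urlsplit, parse_qs

URL = (
    "http://amigo2.berkeleybop.org/amigo/search/bioentity"
    "?q=*:*&fq=regulates_closure:%22GO:0006310%22"
    "&sfq=document_category:%22bioentity%22"
)


def query_filters(url):
    """Return the percent-decoded query parameters of `url` as a dict of lists."""
    return parse_qs(urlsplit(url).query)


params = query_filters(URL)
for name, values in params.items():
    print(name, values)
```

The decoded `fq` value shows that the entity view filters on a closure field stored with each bioentity, so a gene product whose stored closure lacks GO:0006310 will not match even if its annotation is visible elsewhere.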

cmungall commented 8 years ago

(btw, I'm trying to debug using gannet, but gannet is gone)

kltm commented 8 years ago

(gannet is not considered a "production" item, so does not exist outside of labs)

cmungall commented 8 years ago

We've figured it out. The PAINT annotations are clobbering the bioentity closure.

I reopened this: https://github.com/owlcollab/owltools/issues/134#issuecomment-153177346

More detailed explanation coming soon.

Many thanks @ValWood.

kltm commented 8 years ago

Short term, the GAFs have been reordered anyways, so the more expected default will win. Long-term, well, we can continue on the other ticket.
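A rough sketch of why load order matters, assuming (as a simplification of what is described in this thread) that each load writes one bioentity record keyed by ID and that a later record replaces an earlier one wholesale. The `load` function, the record layout, and the term labels are all hypothetical; this is not the owltools or AmiGO loader.

```python
def load(index, records):
    """Write records into index; a later record with the same ID replaces the earlier one."""
    for record in records:
        index[record["id"]] = record
    return index


# Hypothetical records for the same gene product coming from two sources.
primary = [{"id": "PomBase:X", "closure": {"term A", "term B"}}]
stray = [{"id": "PomBase:X", "closure": {"term C"}}]

# Stray source loaded last: the closure from the primary source is lost.
clobbered = load(load({}, primary), stray)

# Primary source loaded last: the expected closure wins.
reordered = load(load({}, stray), primary)

print(sorted(clobbered["PomBase:X"]["closure"]))
print(sorted(reordered["PomBase:X"]["closure"]))
```

Under this assumption, reordering only changes which record wins; a more general fix would need to merge closures or keep stray entries out of the load.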

cmungall commented 8 years ago

Reopening until we've decided a resolution. Should we turn off bioentity views for now?

kltm commented 8 years ago

That is not something that can be done without some fairly invasive code.

kltm commented 8 years ago

@cmungall What AmiGO-level fix would you suggest?

cmungall commented 8 years ago

It looks like only PomBase is affected by this. A bug in the paint publishing pipeline leaks PomBase entries into paint_other; normally this shouldn't be catastrophic, but when combined with the only-now-revealed bioentity clobbering issue, the problem arises.

kltm commented 8 years ago

Okay, as I understand this, the close of this is near. With the fix and reload coming as part of #272 (which should close it out, thanks to @hdietze) and the reordering of the GAF load already in, the PomBase annotations that shouldn't be there should have their closures clobbered by the correct ones anyway, just by ordering. So this is triaged until the more major upstream PAINT/owltools fix, which will give us more general fixes for this kind of issue.

ValWood commented 8 years ago

I think I followed all the tickets and you don't require any feedback from me, but ping me if you do.

A couple of questions, just curious. What is PAINT clobbering? Why is only PomBase affected? Have we done something differently? A while ago, I asked how pombe PAINT annotations were getting into AmiGO without coming through PomBase (we do not currently include PAINT annotations, but we plan to shortly. We would filter those redundant with experimental annotations, and those where we have reported a problem here https://github.com/geneontology/go-annotation/issues so that the numbers in PomBase and AmiGO agreed). Will the reload fix the problem with the missing description lines too?

kltm commented 8 years ago

From @hdietze's remark above, you can start looking at the cascade of issues from here: geneontology/go-site#115. TL;DR: there is nothing wrong with any data that PomBase is producing; we're triaging it now and will have better solutions in the future.

As far as the missing description lines go, I'm assuming that's long names? Those were likely lost in the clobber as well. We'll know in a few hours when this load completes.

kltm commented 8 years ago

"mus7" now shows up and the PomBase count from above now reports as 117.