geneontology / go-site

A collection of metadata, tools, and files associated with the Gene Ontology public web presence.
http://geneontology.org
BSD 3-Clause "New" or "Revised" License

PomBase inferred GAF missing annotations #2226

Open mah11 opened 8 years ago

mah11 commented 8 years ago

We've noticed this the last few times we've committed the PomBase GAF to GO: the gene_association.pombase.inf.gaf file produced by the Jenkins job has far fewer inferred annotations than the equivalent file I get when I run the owltools checks locally (most recently 342 in build 97 vs. 2355 annotations).

I haven't examined every inferred annotation in detail, but some spot-checking suggests that it's not just redundant annotations in one file but not the other; rather, there are annotations that we expect to see inferred that are missing from the Jenkins file. For example, PomBase has 19 genes annotated to GO:0005484, which is part_of GO:0061025. The local file has 16 annotations to GO:0061025 inferred, but none are in the Jenkins file.

The difference also doesn't correlate exactly with evidence codes. There are a few codes that appear only in our locally generated .inf.gaf (IEP, NAS, TAS), but for all codes that are in the Jenkins .inf.gaf there are fewer annotations than in our local file.
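For anyone wanting to reproduce this comparison, here is a minimal sketch of how the two .inf.gaf files could be diffed row by row and tallied per evidence code. It assumes both files are standard tab-separated GAF (header lines start with `!`; column 2 is the object ID, column 5 the GO ID, column 7 the evidence code). The function names, `toy_row`, and the identifiers in the toy data are made up for illustration.

```python
from collections import Counter


def read_gaf_keys(lines):
    """Collect (object ID, GO ID, evidence code) triples from GAF lines."""
    keys = set()
    for line in lines:
        # Skip blank lines and header/comment lines, which start with "!".
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 7:
            continue  # not a complete annotation row
        # GAF columns (1-based): 2 = object ID, 5 = GO ID, 7 = evidence code.
        keys.add((cols[1], cols[4], cols[6]))
    return keys


def missing_by_evidence(local_lines, jenkins_lines):
    """Return rows present locally but absent from Jenkins, plus a per-code tally."""
    missing = read_gaf_keys(local_lines) - read_gaf_keys(jenkins_lines)
    return missing, Counter(evidence for _, _, evidence in missing)


def toy_row(obj, go_id, evidence):
    """Build a 7-column toy GAF row (hypothetical helper, placeholder values)."""
    return "\t".join(["PomBase", obj, obj, "", go_id, "PMID:0", evidence]) + "\n"


# Made-up identifiers, only to show the call shape.
local = [
    "!gaf-version: 2.0\n",
    toy_row("GENE_A", "GO:TERM_1", "IDA"),
    toy_row("GENE_B", "GO:TERM_2", "TAS"),
]
jenkins = ["!gaf-version: 2.0\n", toy_row("GENE_A", "GO:TERM_1", "IDA")]

missing, per_code = missing_by_evidence(local, jenkins)
print(sorted(missing))
print(dict(per_code))
```

With real files, the same functions can be fed open file handles, e.g. `open(path)` for a plain GAF or `gzip.open(path, "rt")` for a gzipped one.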

Is there anything else I should check for, e.g. in our owltools configuration? (We definitely think the locally produced inferred annotation set is better because it's more complete.)

This might also be connected to geneontology/go-annotation#1230

thanks m

mah11 commented 8 years ago

p.s. The inferred GAFs from builds 93-95 are also suspiciously small. That's as far back as the pombase GAF build history goes on the Jenkins site.

mah11 commented 8 years ago

This ticket describes a problem with annotations, so opening it on the annotation tracker is entirely reasonable. I also don't see a tracker that's obviously for software issues generally or Jenkins/OORT specifically.

tberardini commented 8 years ago

Talked with @hdietze and this issue is in the right tracker and on his radar and TODO list. Stay tuned.

ValWood commented 8 years ago

We still aren't getting the Function-Process links we expect.... Usually we would make the annotation explicitly, but it's nice for modifications not to have to bother. This means we effectively have missing annotations....any ideas how long this will be?

mah11 commented 7 years ago

ping!

From a quick glance at file sizes in Jenkins vs. my local owltools runs, this hasn't changed and is therefore still a problem. Any news?

cmungall commented 7 years ago

sorry haven't had a chance to check and am out this week.

Are you running on the submission directory?

jenkins runs this makefile:

http://www.geneontology.org/gene-associations/submission/Makefile

it calls this:

CATALOG_XML=ontology/extensions/catalog-v001.xml make -f gaf/submission/Makefile pombase

any obvious differences?

ValWood commented 7 years ago

where does the inferred gaf live again? I want to check something ......

mah11 commented 7 years ago

Quoth Chris:

Are you running on the submission directory? [and stuff about a makefile]

I'm not sure I understand the question(s) fully, as I'm not aware of explicitly invoking a makefile. I run owltools locally using a shell alias that feeds it these parameters (with newlines for clarity; it uses the actual absolute path for my home dir instead of ~):

% owltools
 --catalog-xml ~/go_svn/ontology/extensions/catalog-v001.xml http://purl.obolibrary.org/obo/go/extensions/go-gaf.owl
 --gaf ~/go_svn/gene-associations/submission/gene_association.pombase.gz
 --gaf-report-file ~/Documents/in_progress/gafchecks/gaf-validation-report.txt
 --gaf-report-summary-file ~/Documents/in_progress/gafchecks/gaf-validation-summary.txt
 --gaf-prediction-file ~/Documents/in_progress/gafchecks/gene_association.pombase.inf.gaf
 --gaf-prediction-report-file ~/Documents/in_progress/gafchecks/gaf-prediction-report.txt
 --experimental-gaf-prediction-file ~/Documents/in_progress/gafchecks/gene_association.pombase.experimental.gaf
 --experimental-gaf-prediction-report-file ~/Documents/in_progress/gafchecks/gaf-prediction-experimental-report.txt
 --gaf-run-checks

Differences don't leap out at me, but I don't have a lot of experience eyeballing makefiles, so I could easily be missing something.

mah11 commented 7 years ago

p.s. I have owltools installed at /Applications/owltools-read-only/OWLTools-Runner/bin/owltools, date-stamped Mar 31 2015.

mah11 commented 7 years ago

Quoth Val:

where does the inferred gaf live again?

The GOC one that's the subject of this ticket (i.e. it's missing stuff) lives at the GO Jenkins site. URL for the latest build is http://build.berkeleybop.org/job/gaf-check-pombase/ - it's the gene_association.pombase.inf.gaf file.

The local version, which has what we would expect, is really truly local to me, as in "on my desktop machine". I can email you the most recent one, which accompanies the v61 release; if you want I can also generate a file starting from the GAF from any of Kim's interim Chado builds.

ValWood commented 7 years ago

OK I was worried that somehow my filters were causing some confusion. But if the GO GAF is missing stuff it can't be that.

cmungall commented 7 years ago

p.s. I have owltools installed at /Applications/owltools-read-only/OWLTools-Runner/bin/owltools, date-stamped Mar 31 2015.

Hmm, the date of the wrapper script shouldn't matter, but if the owltools.jar is that old, this could explain it

mah11 commented 7 years ago

I have

% ls -l /Applications/owltools-read-only/OWLTools-Runner/bin/
total 67616
-rwxr-xr-x  1 root  admin      3997 Mar 31  2015 obo-roundtrip*
-rwxr-xr-x  1 root  admin      3414 Mar 31  2015 owltools*
-rw-r--r--  1 root  admin  34595002 Mar 31  2015 owltools-runner-all.jar
-rwxr-xr-x  1 root  admin      3212 Mar 31  2015 phenolog-runner*
-rwxr-xr-x  1 root  admin        10 Mar 31  2015 phenolog-runner.vmoptions*
-rwxr-xr-x  1 root  admin       307 Mar 31  2015 reasoner-diff.sh*

But does that explain why the inf.gaf I get locally looks correct (or at least much closer to it), and it's the Jenkins version that's missing a lot of expected annotations?

ValWood commented 7 years ago

Still seems to be an issue.

For http://www.pombase.org/spombe/result/SPAC17G8.13c histone acetyltransferase activity (H3-K14 specific)

there is an F-P link to histone acetylation but we don't get this inference (I would filter this anyway because it's a redundant single-step process, but as yet I haven't)

ValWood commented 6 years ago

This issue still appears to exist?

[image: membrane fusion graph]

ValWood commented 6 years ago

All SNAP receptors should be annotated to membrane fusion by the F-P pipeline?

ValWood commented 6 years ago

It would be great to get the file with the missing annotations back...these are the annotations we would find useful (been missing for 2 years now..)

ValWood commented 6 years ago

@cmungall could you assign this one to somebody.

We just had our group meeting and we reverted to the old location, but the annotations mentioned here are still missing. @mah11 sees them when she runs the F-P inference locally.

v

ValWood commented 6 years ago

This problem still exists and means that potentially we lose a lot of useful annotations.

I have only found one example so far, the SNAP receptor activity one above is still not generating the process terms expected from the existing F-P links.

[image: membrane organization]

I would expect everything annotated to "SNAP receptor" to be annotated to "membrane organization" and it isn't.

ValWood commented 6 years ago

Here is another example.

GO:0004129 - cytochrome-c oxidase activity has an FP link to "proton transmembrane transport"

[image: missing annotation]

but we are not getting any mappings to this:

[image: oxidase]

ValWood commented 6 years ago

Also I did check that this was not due to the annotation to transfer being IEA. I'm assuming all other codes are transferred ? (I would not have a problem with IEA transfer via this route, but maybe after the redundancy issues are dealt with?)

cmungall commented 6 years ago

@yy20716 and @dougli1sqrd will help. HyeongSik will investigate any issues with the inference and Eric will check if there are any upstream issues.

But first we need a statement of what the problem is, there are a lot of different gene products and terms in this ticket (I'm truly sorry it has been around so long)

To investigate an F->P bug where there is a missing inference, we need to know a single example of:

  1. gene/product
  2. Asserted MF term
  3. Expected inferred BP term

I'm trying to figure this out. The gene mentioned in this ticket is SPAC17G8.13c, but it does not seem to be annotated to GO:0005484 at source:

curl -L http://geneontology.org/gene-associations/submission/gene_association.pombase.gz | gzip -dc | grep SPAC17G8.13c | grep GO:0005484

(empty)
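The grep pipeline above matches the two strings anywhere on a line. A column-aware check of one (gene, asserted MF term, expected BP term) triple might look like the sketch below. It assumes standard GAF columns (2 = object ID, 5 = GO ID); `gaf_pairs`, `check_example`, and the placeholder identifiers are hypothetical names for illustration, and the check only looks for the exact terms given, not for more specific descendants.

```python
def gaf_pairs(lines):
    """Return the set of (object ID, GO ID) pairs found in GAF lines."""
    pairs = set()
    for line in lines:
        if line.startswith("!"):
            continue  # header/comment line
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 5:
            pairs.add((cols[1], cols[4]))
    return pairs


def check_example(gene, mf_term, bp_term, source_lines, inferred_lines):
    """Classify one (gene, MF term, expected BP term) example."""
    source = gaf_pairs(source_lines)
    inferred = gaf_pairs(inferred_lines)
    if (gene, mf_term) not in source:
        return "MF term not asserted at source"
    if (gene, bp_term) in source:
        return "BP term already asserted at source"
    if (gene, bp_term) in inferred:
        return "expected BP term was inferred"
    return "expected BP term is missing from the inferred file"


# Placeholder rows (made-up identifiers) to show the call shape.
source_gaf = ["!gaf-version: 2.0\n", "PomBase\tGENE_A\tgeneA\t\tGO:MF_1\tPMID:0\tIDA\n"]
inferred_gaf = ["!gaf-version: 2.0\n"]
print(check_example("GENE_A", "GO:MF_1", "GO:BP_1", source_gaf, inferred_gaf))
```

Returning a short status string keeps the three failure modes apart: a term that was never asserted, a prediction that would be redundant, and a genuinely missing inference.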

cmungall commented 6 years ago

I need help finding an example of the problem. Other annotations to SNAP receptor activity already have an annotation to a more specific BP. E.g. http://amigo.geneontology.org/amigo/gene_product/PomBase:SPBC31E1.04

Until we have a specific example to debug, here are the action items:

  1. @yy20716 document BasicAnnotationPropagator internally such that behavior with filtering redundant predictions is clearly internally documented and we have a test case
  2. @yy20716 and @dougli1sqrd will edit the curator-facing documentation on the propagation rule https://github.com/geneontology/go-site/blob/master/metadata/rules/gorule-0000023.md, consulting with @ValWood and @vanaukenk on making sure this is precise and clear

ValWood commented 6 years ago

I don't see SPAC17G8.13c ?

But yes, genes would be helpful! So SNAP receptor activity (GO:0005484) has the following annotations that do not acquire "membrane organization":

bet1 SPAC23C4.13 SNARE Bet1 (predicted) ISS
bos1 SPAP14E8.03 SNARE Bos1 (predicted) ISO
fsv1 SPAC6F12.03c SNARE Fsv1 TAS
gos1 SPAC4G8.10 SNARE Gos1 (predicted) ISS
pep12 SPBC31E1.04 SNARE Pep12 TAS
sec20 SPAC23A1.15c SNARE Sec20 (predicted) ISS
sec22 SPBC2A9.08c SNARE Sec22 (predicted) sec9 ISS
sed5 SPBC8D2.14c SNARE Sed5 (predicted) ISS
sft1 SPAC31A2.13c SNARE Sft1 (predicted) ISO
syb1 SPAC6G9.11 SNAP receptor, synaptobrevin family ISM
tlg1 SPBC36B7.07 SNARE Tlg1 (predicted) ISS
tlg2 SPAC823.05c SNARE Tlg2 (predicted) ISS
ufe1 SPCC895.04c SNARE Ufe1 (predicted) ISS
use1 SPAC17G6.07c SNARE Use1 (predicted) ISS
vti1 SPBC3B9.10 SNARE Vti1 (predicted) ISO
ykt6 SPBC13G1.11 SNARE Ykt6 (predicted) ISO

I see a pattern here....so possibly you no longer make inferred MF-BP annotations from non EXP evidence codes? That's where we need them!

ValWood commented 6 years ago

I see a pattern here....so possibly you no longer make inferred MF-BP annotations from non EXP evidence codes?

I can't be sure that this is the reason. Our 3 characterised SNAPs have a "membrane organization" annotation but this is not from an F-P link...

ValWood commented 6 years ago

All of our cytochrome C oxidase are predicted so none of these will have EXP evidence.

These were just possibilities for the drop in number. I don't know if they are the reason.

@mah11 could you provide examples from the comparison where you identified the number drop?

mah11 commented 6 years ago

@mah11 could you provide examples from the comparison where you identified the number drop?

That's exactly what I did in the original summary of this very ticket - that's where the SNAP receptor example comes from.

If you need an example with experimental evidence, there are plenty. One nice clear one is pub1 - it's annotated by IDA to GO:0061630 ! ubiquitin protein ligase activity, which has a path* to GO:0016567 ! protein ubiquitination. But the pombase-prediction.gaf does not have a GO:0016567 annotation for pub1. On the PomBase gene page we display an IC annotation to a less specific term, GO:0070647, because the GO:0016567 annotation is missing.

*GO:0061630 is_a GO:0004842, and GO:0004842 part_of GO:0016567
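The path in the footnote can be walked mechanically. The sketch below is a toy illustration of the F-P idea (follow is_a upward from the asserted function term, then take any part_of link); it is not the owltools propagation code. The edge tables contain only the three relationships quoted in this ticket, and `inferred_processes` is a hypothetical name.

```python
# Toy edge tables built only from relationships quoted in this ticket.
IS_A = {"GO:0061630": ["GO:0004842"]}
PART_OF = {
    "GO:0004842": ["GO:0016567"],  # ubiquitin-protein transferase -> protein ubiquitination
    "GO:0005484": ["GO:0061025"],  # SNAP receptor activity -> membrane fusion
}


def inferred_processes(term, is_a=IS_A, part_of=PART_OF):
    """Return terms reachable via zero or more is_a steps followed by one part_of step."""
    targets = set()
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        # Any part_of link from this term (or an is_a ancestor) yields a candidate.
        targets.update(part_of.get(current, ()))
        # Keep climbing the is_a hierarchy.
        stack.extend(is_a.get(current, ()))
    return targets


print(sorted(inferred_processes("GO:0061630")))  # ['GO:0016567']
print(sorted(inferred_processes("GO:0005484")))  # ['GO:0061025']
```

This sketch stops at the first part_of target and does not climb further to ancestors of the process term, so it would not by itself yield less specific terms such as "membrane organization".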

suzialeksander commented 5 months ago

@ValWood can you check if this is still an issue?

ValWood commented 5 months ago

It is still an issue.

These gene products annotated to "SNAP receptor activity" should be annotated to "membrane fusion" via F-P ontology links. https://www.pombase.org/results/from/id/06cc5a83-038d-4e0f-95b2-e35beaea8a79

The pipeline currently only makes inferences for EXP annotations, not for other evidence codes, and these are ISS.

(I have avoided annotating to keep as an example until this ticket is fixed, but most are now covered by PAINT, which is nice.)
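To see how many annotations to a function term would be skipped if only experimental evidence is propagated, as described above, the source GAF could be split by evidence code. This is a sketch under assumptions: it treats the GO experimental codes (EXP, IDA, IPI, IMP, IGI, IEP and the high-throughput HTP, HDA, HMP, HGI, HEP) as "experimental", which may not match the pipeline's actual filter. `split_by_evidence` and the toy identifiers are hypothetical.

```python
# GO experimental evidence codes, including the high-throughput variants.
EXPERIMENTAL_CODES = frozenset(
    {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP", "HTP", "HDA", "HMP", "HGI", "HEP"}
)


def split_by_evidence(lines, go_id):
    """Split annotations to go_id into (experimental, other) lists of (object ID, code)."""
    experimental, other = [], []
    for line in lines:
        if line.startswith("!"):
            continue  # header/comment line
        cols = line.rstrip("\n").split("\t")
        # GAF columns (1-based): 2 = object ID, 5 = GO ID, 7 = evidence code.
        if len(cols) < 7 or cols[4] != go_id:
            continue
        bucket = experimental if cols[6] in EXPERIMENTAL_CODES else other
        bucket.append((cols[1], cols[6]))
    return experimental, other


# Toy rows with made-up gene identifiers, only to show the call shape.
toy_gaf = [
    "!gaf-version: 2.0\n",
    "PomBase\tGENE_A\tgeneA\t\tGO:0005484\tPMID:0\tIDA\n",
    "PomBase\tGENE_B\tgeneB\t\tGO:0005484\tGO_REF:0\tISS\n",
    "PomBase\tGENE_C\tgeneC\t\tGO:0004129\tGO_REF:0\tISO\n",
]
exp_rows, other_rows = split_by_evidence(toy_gaf, "GO:0005484")
print(exp_rows)
print(other_rows)
```

Every row in the second list is an annotation that an EXP-only pipeline would not use as a starting point for F-P inference.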

pgaudet commented 5 months ago

I am not clear how F-> P links happen. There are several tickets about this