geneontology / pipeline

Declarative pipeline for the Gene Ontology.
https://build.geneontology.org/job/geneontology/job/pipeline/
BSD 3-Clause "New" or "Revised" License

Ontology release and snapshot seem to be out of phase with current state of ontology #95

Closed krchristie closed 5 years ago

krchristie commented 5 years ago

Hi,

I am trying to add annotations to a new GO term that I committed two weeks ago (stanza from go-edit.obo included below)

The term was added on 4/26/19, but when I try to use it, either in the existing model that already contains the other annotations from the paper, or in a new one, either using the form or the graph editor (either as an individual or as a process term), it is not available in the autocomplete.

How long should I expect to wait for a new term to be available?

thanks,

-Karen

[Term]
id: GO:0120197
name: mucociliary clearance
namespace: biological_process
def: "The respiratory system process driven by motile cilia on epithelial cells of the respiratory tract by which mucus and associated inhaled particles and pathogens trapped within it are moved out of the airways." [GOC:krc, PMID:24119105, PMID:27864314]
synonym: "MCC" RELATED [PMID:24119105, PMID:27864314]
synonym: "MCT" RELATED [PMID:28289722]
synonym: "mucociliary transport" EXACT [PMID:28289722]
is_a: GO:0003016 ! respiratory system process
is_a: GO:0003351 ! epithelial cilium movement involved in extracellular fluid movement
created_by: kchris
creation_date: 2019-04-26T16:16:50Z

kltm commented 5 years ago

Started new NEO build: https://build.geneontology.org/job/geneontology/job/pipeline/job/issue-35-neo-test/10/

kltm commented 5 years ago

NEO rebuilt and deployed...but the term does not seem to be available. @balhoff Would you happen to know the status of this term?

krchristie commented 5 years ago

Any progress on this?

I have added some other new terms that are also not available in Noctua (pull request: https://github.com/geneontology/go-ontology/commit/173040dfc4f5d4a21117df885b5135225ce72d4c)

GO:0120205 - photoreceptor proximal connecting cilium (CC) creation_date: 2019-05-10T22:47:08Z

GO:0120206 - photoreceptor distal connecting cilium (CC) creation_date: 2019-05-10T22:54:05Z

For good measure, I checked a term that was added by someone else and though I can see it in the ontology when I am up to date with origin master, I cannot use the term in Noctua. Here's the term and the relevant pull request:

GO:0140330 - xenobiotic detoxification by transmembrane export across the cell outer membrane (BP) creation_date: 2019-05-03T10:35:57Z https://github.com/geneontology/go-ontology/commit/a371973f35e4dbfe549a857fdb45cbe00771bf33

balhoff commented 5 years ago

@kltm getting new GO terms requires a Minerva restart, correct?

kltm commented 5 years ago

@balhoff The restart was not the issue--it was restarted a week and a half ago with https://github.com/geneontology/noctua/issues/612#issuecomment-491466881, but the term mentioned was still not present in NEO.

I'll go with the model that there was some other issue upstream and try again today.

balhoff commented 5 years ago

I'm confused about looking in NEO; these are GO terms, right?

kltm commented 5 years ago

The current load of NEO can be browsed here:

http://noctua-amigo.berkeleybop.org/amigo
http://noctua-amigo.berkeleybop.org/amigo/search/ontology (make sure to remove the GO filter)

It is a simple ontology of all entities that can be annotated to for GO-CAMs in Noctua, including about 100,000 non-GO items (GPs, etc.).

balhoff commented 5 years ago

I usually just call NEO the GPs, vs. go-lego which imports both NEO and GO. I think the issue is that 'mucociliary clearance' is not in the go-lego release, but it is in snapshot. However I do see it in the go-plus release. These should not be different. But besides that issue, should Noctua be loading go-lego snapshot rather than release?

kltm commented 5 years ago

@balhoff Yes, I think you're on the right track here. Taking a quick look through the NEO repo (https://github.com/geneontology/neo), there is a lot wired for "current"; it wouldn't surprise me if that was true for its ontology use as well and would well explain the issues we're having. @cmungall Would it be possible to get a quick audit/check to make sure that NEO is properly using snapshots? (For your other question, I'd think that's generally true for any annotation system.)

krchristie commented 5 years ago

How often do go-lego snapshots get produced? If it's less frequently than daily, I think that's a problem for a curation tool.

kltm commented 5 years ago

@krchristie It's daily. Currently, the NEO reload is semi-manual, with automation in the roadmap https://github.com/geneontology/pipeline/issues/35

kltm commented 5 years ago

@balhoff Okay, digging in a little with help from @cmungall. I believe the issue is that my go-lego-based load is fixed on the release versions, rather than the snapshots. To solve this, I guess we'd either need to have better catalog control in a few places (there's a ticket...) or have a version of go-lego that used a snapshot instead. For a third way, would you have a good mental model of what would happen if I just added the snapshot GO into owltools? There might be bits slightly out of sync (e.g. obsoleted terms), but maybe not too bad as an interim workaround? https://github.com/geneontology/pipeline/blob/issue-35-neo-test/Jenkinsfile#L53

balhoff commented 5 years ago

@kltm you can load go-lego from http://purl.obolibrary.org/obo/go/snapshot/extensions/go-lego.owl. This has all imports merged in, so it will not load GO from release.

kltm commented 5 years ago

@balhoff Great, thank you--I've added as above and will test.

kltm commented 5 years ago

@balhoff Huh. I've done the run and deployed the product, but the term is still not available... http://noctua-amigo.berkeleybop.org/amigo/search/ontology Given what's being loaded as https://github.com/geneontology/pipeline/blob/issue-35-neo-test/Jenkinsfile#L53 , is it possible that they are clobbering each other out?

balhoff commented 5 years ago

@kltm it doesn't make sense to me. I downloaded http://purl.obolibrary.org/obo/go/snapshot/extensions/go-lego.owl just now and I see GO_0120197 in there.
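For anyone repeating this check, a minimal sketch of how it could be scripted. The helper name `has_term` is hypothetical, and it only tests whether the term's IRI is mentioned anywhere in the downloaded file, not that the term is declared or non-obsolete.

```shell
#!/bin/sh
# has_term FILE GO_ID
# Succeeds if FILE mentions the OBO IRI for GO_ID (e.g. GO:0120197).
# Hypothetical helper for illustration; a plain text match, not an OWL parse.
has_term() {
    # Convert a CURIE such as GO:0120197 into the IRI fragment GO_0120197.
    fragment=$(printf '%s' "$2" | tr ':' '_')
    # -F: fixed-string match, -q: quiet, only the exit status matters.
    grep -q -F "http://purl.obolibrary.org/obo/${fragment}" "$1"
}

# Example usage (not run here):
#   curl -L -o go-lego.owl http://purl.obolibrary.org/obo/go/snapshot/extensions/go-lego.owl
#   has_term go-lego.owl GO:0120197 && echo present || echo missing
```

Running the same function against the release and snapshot downloads gives a quick side-by-side comparison for a given term.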

kltm commented 5 years ago

Well, it seems that the changes went into the build as expected:

https://build.geneontology.org/job/geneontology/job/pipeline/job/issue-35-neo-test/13/
https://github.com/geneontology/pipeline/commit/3748d6803832cb2cf29ec589add58a8abac58f7a

That said, http://noctua-amigo.berkeleybop.org/amigo/load_details gives us a rather odd line:

2019-05-28 | 2019-05-29 | http://purl.obolibrary.org/obo/go/extensions/go-lego.owl

which indicates that it still went back to the released version for some reason. The running command would be like:

java \
    -Xms$LOADER_MEM \
    -Xmx$LOADER_MEM \
    -DentityExpansionLimit=8172000 \
    -Djava.awt.headless=true \
    -jar /srv/amigo/java/lib/owltools-runner-all.jar  \
    $ONTOLOGIES \
    --log-info \
    --solr-config /srv/amigo/metadata/ont-config.yaml \
    --merge-support-ontologies \
    --merge-imports-closure \
    --remove-subset-entities upperlevel \
    --remove-disjoints \
    --silence-elk \
    --reasoner elk \
    --solr-taxon-subset-name amigo_grouping_subset \
    --solr-eco-subset-name go_groupings \
    --solr-url http://localhost:8080/solr/ \
    --solr-log /tmp/golr_timestamp.log \
    --solr-load-ontology \
    --solr-load-ontology-general  \
    --solr-optimize

The environment variable does indeed seem to be getting through to at least the outer layers:

[2019-05-28T21:41:34.298Z] GOLR_INPUT_ONTOLOGIES=http://purl.obolibrary.org/obo/go/snapshot/extensions/go-lego.owl http://purl.obolibrary.org/obo/eco.owl http://purl.obolibrary.org/obo/ncbitaxon/subsets/taxslim.owl http://purl.obolibrary.org/obo/cl/cl-basic.owl http://purl.obolibrary.org/obo/go/extensions/gorel.owl http://purl.obolibrary.org/obo/pato.owl http://purl.obolibrary.org/obo/po.owl http://purl.obolibrary.org/obo/chebi.owl http://purl.obolibrary.org/obo/uberon/basic.owl http://purl.obolibrary.org/obo/wbbt.owl http://purl.obolibrary.org/obo/go/extensions/go-modules-annotations.owl http://purl.obolibrary.org/obo/go/extensions/go-taxon-subsets.owl

Noting: ONTOLOGIES=${GOLR_INPUT_ONTOLOGIES:= \
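As a reminder of what that `:=` form does, here is a small sketch of the shell semantics: the default is used (and assigned) only when the variable is unset or empty. The function name `resolve_ontologies` and the value of `DEFAULT_ONTOLOGIES` are assumptions for illustration; the loader's real default list is cut off in the line above and is not reproduced here.

```shell
#!/bin/sh
# Hypothetical stand-in for the loader's built-in default list.
DEFAULT_ONTOLOGIES="http://purl.obolibrary.org/obo/go/extensions/go-lego.owl"

# resolve_ontologies [VALUE]
# Prints what ONTOLOGIES would end up as when GOLR_INPUT_ONTOLOGIES is
# set to VALUE, or left unset when no argument is given.
resolve_ontologies() {
    (
        # Subshell, so the assignment performed by := does not leak out.
        if [ "$#" -gt 0 ]; then
            GOLR_INPUT_ONTOLOGIES=$1
        else
            unset GOLR_INPUT_ONTOLOGIES
        fi
        # := falls back to the default if the variable is unset OR empty.
        ONTOLOGIES=${GOLR_INPUT_ONTOLOGIES:=$DEFAULT_ONTOLOGIES}
        printf '%s\n' "$ONTOLOGIES"
    )
}
```

So if the variable reaches the outer layers but arrives empty or unset inside the container, the load would silently fall back to whatever the default list points at.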

balhoff commented 5 years ago

@kltm is that line printing the ontology IRI? If so, that is expected because the snapshot has the same ontology IRI as the release.
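A rough sketch of how one might inspect what a downloaded file declares about itself. This assumes the file is RDF/XML with the `owl:Ontology` and `owl:versionIRI` elements each on their own line, and that a version IRI is present at all, none of which is confirmed in this thread. The function name `ontology_iris` is hypothetical.

```shell
#!/bin/sh
# ontology_iris FILE
# Prints the ontology IRI and, if declared, the version IRI found in an
# RDF/XML OWL file. Text-based sketch only, not a real OWL parser.
ontology_iris() {
    sed -n \
        -e 's/.*<owl:Ontology rdf:about="\([^"]*\)".*/ontology: \1/p' \
        -e 's/.*<owl:versionIRI rdf:resource="\([^"]*\)".*/version: \1/p' \
        "$1"
}

# Example usage (not run here):
#   ontology_iris go-lego.owl
```

If the two downloads print the same `ontology:` line, the load_details entry alone cannot distinguish a snapshot load from a release load.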

kltm commented 5 years ago

I went through and compared the availability of terms across different releases that I have access to. I'd note that the last release was 2019-05-09.

| term\release            | release | snapshot | neo |
|-------------------------+---------+----------+-----|
| GO:0120197 (2019-04-28) | N       | Y        | Y   |
| GO:0120205 (2019-05-12) | N       | N        | N   |
| GO:0120206 (2019-05-12) | N       | N        | N   |
| GO:0140330 (2019-05-09) | N       | Y        | Y   |

So @balhoff, the example of GO_0120197 may not be a good one for whatever reason. This is disturbing in a couple of ways. The first is that GO:0120197 apparently does not show up in the release, even though it was added well before the release date. Maybe that's normal? I don't have a mechanism for that. On the happy side, snapshot and neo seem to be in sync with at least the availability of terms. Next, whatever is wrong with the neo/go-lego load seems to be the same problem in snapshot, a problem we didn't know we had before.

Given that this is now possibly a general snapshot problem and not just go-lego/neo problem, let's try some ideas for what is going wrong:

@balhoff, I can start on the fourth one there, and maybe start trying the fifth if we see nothing. Would you mind trying a few things around the first three? The release is tomorrow, so if you have other terms that you'd like to keep track of while they propagate, it might be good to mark them here now.

kltm commented 5 years ago

Updated the table above with new numbers a couple of days after the 2019-06-01 release:

| term\release            | release | snapshot | neo |
|-------------------------+---------+----------+-----|
| GO:0120197 (2019-04-28) | Y       | Y        | Y   |
| GO:0120205 (2019-05-12) | N       | Y        | Y   |
| GO:0120206 (2019-05-12) | N       | Y        | Y   |
| GO:0140330 (2019-05-09) | Y       | Y        | Y   |

Okay, this is interesting and worrying. Without "proof", it looks like the old snapshot is finally in the new release, and what we expected to be in snapshot all along has finally gotten there.

@balhoff If something like that is correct, it would seem that we are not understanding something about ontology propagation within our system. If you're free at some point, I'd like to talk this over with you. As another example, we can look at the obsoletion of a term: https://github.com/geneontology/go-ontology/issues/17214

| term\release                | release | snapshot | neo |
|-----------------------------+---------+----------+-----|
| GO:0005395 (OBS 2019-05-22) | N       | Y        | Y   |

Again, it seems like release is hung up on a previous state, possibly until the next release?

kltm commented 5 years ago

While we have technically "solved" this issue, I'm hijacking it for the general case.

kltm commented 5 years ago

Noting that https://github.com/geneontology/pipeline/issues/95#issuecomment-498459607 would also be consistent with an env/owltools issue--I'll start pulling that apart next.

cmungall commented 5 years ago

Nothing to do with owltools, as we don't use owltools for the merge

I have tracked to an issue with robot: https://github.com/ontodev/robot/issues/493

kltm commented 5 years ago

@cmungall To clarify, the theory was not that it was the owltools merge, but that the docker environment was not correctly picking up the external variable and was using fallbacks, which would then point it to the last release...or something. The debugging that started was making the environment more verbose about what it actually contained. After that I would be back to owltools itself and water sprites if Jim had not found anything at his end. It sounds like it has worked out though :)