Closed balhoff closed 2 years ago
@vanaukenk @kltm do we need to update metadata on the instance nodes for these replacements? E.g. dc:date.
@balhoff For whatever reason, my gut instinct would be "no", as this is built into the ontology and not an action by a curator. Doing this would also open the door to needing an agent that did the update, as a curator was not involved. I feel like this is sliding towards talking about full history again. Moreover, in the end, changing the date does not actually encode the fact that this was done. Better might be a comment that the action happened. (Although that just takes us back to who made the comment and whether that changes the date.) That said, there may be best or common practices around this already used by curators. I'd be interested in what @vanaukenk thinks about this.
From discussion on 2022-03-01 MOD imports.
For changes such as replaced_by, we will:
1) Add a comment including the date and a description of the change. For example: "2022-03-01 GO:nnnnnnn replaced by GO:nnnnnnnn" (Note that GO:nnnnnnnn could be any ontology term).
2) Update the date on the assertion
We will not be updating the contributor.
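As a rough illustration of the policy above (not the Minerva implementation), the per-individual update could be sketched as follows. The `Individual` class and `apply_replacement` function are hypothetical names invented for this sketch; the comment format follows the example in point 1.

```python
import datetime
from dataclasses import dataclass, field


@dataclass
class Individual:
    """Hypothetical stand-in for an instance node and its annotations."""
    types: set[str]
    date: str
    contributors: list[str] = field(default_factory=list)
    comments: list[str] = field(default_factory=list)


def apply_replacement(ind: Individual, old: str, new: str,
                      today: datetime.date) -> bool:
    """Swap `old` for `new` on `ind`; return True if anything changed."""
    if old not in ind.types:
        return False
    ind.types.remove(old)
    ind.types.add(new)
    stamp = today.isoformat()
    # 1) Add a comment with the date and a description of the change.
    ind.comments.append(f"{stamp} {old} replaced by {new}")
    # 2) Update the date on the assertion.
    ind.date = stamp
    # The contributor list is deliberately left untouched.
    return True


ind = Individual(types={"GO:0005913"}, date="2015-12-20", contributors=["GOC:pt"])
changed = apply_replacement(ind, "GO:0005913", "GO:0005912", datetime.date(2022, 3, 2))
print(changed, ind.date, ind.comments, ind.contributors)
```

An individual that does not carry the obsolete type is returned unchanged, so its date and comments are not touched.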
Example Turtle diff result from implementation in #462:
<http://model.geneontology.org/5667fdd400000892/5667fdd400001755> <http://geneontology.org/lego/evidence> <http://model.geneontology.org/5667fdd400000892/5667fdd400001750> , <http://model.geneontology.org/5667fdd400000892/5667fdd400001751> , <http://model.geneontology.org/5667fdd400000892/5667fdd400001752> , <http://model.geneontology.org/5667fdd400000892/5667fdd400001753> ;
- a <http://www.w3.org/2002/07/owl#NamedIndividual> , <http://purl.obolibrary.org/obo/GO_0005913> ;
+ a <http://www.w3.org/2002/07/owl#NamedIndividual> , <http://purl.obolibrary.org/obo/GO_0005912> ;
+ <http://www.w3.org/2000/01/rdf-schema#comment> "Automated change 2022-03-02: <http://purl.obolibrary.org/obo/GO_0005913> replaced_by <http://purl.obolibrary.org/obo/GO_0005912>" ;
<http://purl.org/dc/elements/1.1/contributor> "GOC:pt"^^<http://www.w3.org/2001/XMLSchema#string> ;
- <http://purl.org/dc/elements/1.1/date> "2015-12-20"^^<http://www.w3.org/2001/XMLSchema#string> .
+ <http://purl.org/dc/elements/1.1/date> "2022-03-02"^^<http://www.w3.org/2001/XMLSchema#string> .
@balhoff We'll maybe be talking a little about this tomorrow; @cmungall will be making a quick spec for how the operations are expected to operate. (There will also be more work w/people trying to figure out which operations will be carried out first with this tooling, an SOP for them, etc.)
From @kltm
From discussion today (2022-03-03), the recording format would be a TSV, in the noctua-models repo, that can be idempotently run.
In computing, an idempotent operation is one that has no additional effect if it is called more than once with the same input parameters.
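To make the idempotency requirement concrete, here is a minimal sketch. It assumes a two-column TSV (obsolete term, replacement term); that column layout and the function names are assumptions for illustration only, since the actual recording format had not been specified at this point.

```python
import csv
import io


def load_replacements(tsv_text: str) -> dict[str, str]:
    """Read (old, new) pairs from a two-column TSV (assumed layout)."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    return {row[0]: row[1] for row in reader if len(row) >= 2}


def apply_all(types: set[str], replacements: dict[str, str]) -> set[str]:
    """Map every type through the replacement table; others pass through."""
    return {replacements.get(t, t) for t in types}


tsv = "GO:0001077\tGO:0001228\nGO:0044798\tGO:0005667\n"
table = load_replacements(tsv)

once = apply_all({"GO:0001077", "GO:0045944"}, table)
twice = apply_all(once, table)
print(once == twice)
```

The second run is a no-op here because no replacement target appears as an obsolete term in the same table; a table containing chains (A to B, B to C) would need to be resolved first for the operation to stay idempotent.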
Also, as a note from yesterday's software meeting, the model-level date would be bumped as well as the individual-level date. (It's worth keeping in mind that we may have to change this mechanism when we start implementing the modification and creation date separation.)
@kltm @vanaukenk for the obsolete class replacement, I am now updating the date for the model, and also am adding the change comments to the model. Do you agree with that? Here is a single model diff:
@@ -47,9 +47,13 @@
<http://purl.obolibrary.org/obo/GO_0045944> a <http://www.w3.org/2002/07/owl#Class> .
+<http://purl.obolibrary.org/obo/GO_0005667> a <http://www.w3.org/2002/07/owl#Class> .
+
<http://purl.obolibrary.org/obo/CL_0000084> <http://www.geneontology.org/formats/oboInOwl#id> "CL:0000084"^^<http://www.w3.org/2001/XMLSchema#string> ;
a <http://www.w3.org/2002/07/owl#Class> .
+<http://purl.obolibrary.org/obo/GO_0001228> a <http://www.w3.org/2002/07/owl#Class> .
+
<http://purl.obolibrary.org/obo/ECO_0000096> a <http://www.w3.org/2002/07/owl#Class> .
<http://purl.obolibrary.org/obo/GO_0044798> a <http://www.w3.org/2002/07/owl#Class> .
@@ -80,9 +84,10 @@
<https://w3id.org/biolink/vocab/in_taxon> <http://purl.obolibrary.org/obo/NCBITaxon_10090> , <http://purl.obolibrary.org/obo/NCBITaxon_9606> ;
<http://www.geneontology.org/formats/oboInOwl#id> "gomodel:539aab4300000001"^^<http://www.w3.org/2001/XMLSchema#string> ;
a <http://www.w3.org/2002/07/owl#Ontology> ;
+ <http://www.w3.org/2000/01/rdf-schema#comment> "Automated change 2022-03-11: <http://purl.obolibrary.org/obo/GO_0001077> replaced_by <http://purl.obolibrary.org/obo/GO_0001228>" , "Automated change 2022-03-11: <http://purl.obolibrary.org/obo/GO_0044798> replaced_by <http://purl.obolibrary.org/obo/GO_0005667>" ;
<http://purl.org/dc/elements/1.1/title> "Untitled by Unknown 01"^^<http://www.w3.org/2001/XMLSchema#string> ;
<http://purl.org/dc/elements/1.1/contributor> "GOC:kltm"^^<http://www.w3.org/2001/XMLSchema#string> ;
- <http://purl.org/dc/elements/1.1/date> "2015-08-10"^^<http://www.w3.org/2001/XMLSchema#string> .
+ <http://purl.org/dc/elements/1.1/date> "2022-03-11"^^<http://www.w3.org/2001/XMLSchema#string> .
<http://model.geneontology.org/539aab4300000001/539aab430000002> <http://geneontology.org/lego/evidence> <http://purl.obolibrary.org/obo/#539aab4300000001%2F5595c4cb00000297> ;
<http://geneontology.org/lego/hint/layout/x> "1200"^^<http://www.w3.org/2001/XMLSchema#string> ;
@@ -153,9 +158,10 @@
<http://purl.obolibrary.org/obo/BFO_0000050> <http://model.geneontology.org/539aab4300000001/539aab430000008> ;
<http://purl.obolibrary.org/obo/RO_0002333> <http://purl.obolibrary.org/obo/#539aab4300000001%2F5595c4cb00000274> ;
<http://purl.obolibrary.org/obo/RO_0002213> <http://purl.obolibrary.org/obo/#539aab4300000001%2F5595c4cb00000273> ;
- a <http://www.w3.org/2002/07/owl#NamedIndividual> , <http://purl.obolibrary.org/obo/GO_0001077> ;
+ a <http://www.w3.org/2002/07/owl#NamedIndividual> , <http://purl.obolibrary.org/obo/GO_0001228> ;
+ <http://www.w3.org/2000/01/rdf-schema#comment> "Automated change 2022-03-11: <http://purl.obolibrary.org/obo/GO_0001077> replaced_by <http://purl.obolibrary.org/obo/GO_0001228>" ;
<http://purl.org/dc/elements/1.1/contributor> "GOC:kltm"^^<http://www.w3.org/2001/XMLSchema#string> ;
- <http://purl.org/dc/elements/1.1/date> "2015-08-10"^^<http://www.w3.org/2001/XMLSchema#string> .
+ <http://purl.org/dc/elements/1.1/date> "2022-03-11"^^<http://www.w3.org/2001/XMLSchema#string> .
<http://purl.obolibrary.org/obo/#539aab4300000001%2F5595c4cb00000272> a <http://www.w3.org/2002/07/owl#NamedIndividual> , <http://purl.obolibrary.org/obo/ECO_0000150> ;
<http://purl.org/dc/elements/1.1/source> "PMID:8235597"^^<http://www.w3.org/2001/XMLSchema#string> .
@@ -190,8 +196,9 @@
<http://purl.org/dc/elements/1.1/contributor> "GOC:kltm"^^<http://www.w3.org/2001/XMLSchema#string> ;
<http://purl.org/dc/elements/1.1/date> "2015-08-10"^^<http://www.w3.org/2001/XMLSchema#string> .
-<http://purl.obolibrary.org/obo/#539aab4300000001%2F5595c4cb00000280> a <http://www.w3.org/2002/07/owl#NamedIndividual> , <http://purl.obolibrary.org/obo/GO_0044798> ;
- <http://purl.org/dc/elements/1.1/date> "2014-06-13"^^<http://www.w3.org/2001/XMLSchema#string> .
+<http://purl.obolibrary.org/obo/#539aab4300000001%2F5595c4cb00000280> a <http://www.w3.org/2002/07/owl#NamedIndividual> , <http://purl.obolibrary.org/obo/GO_0005667> ;
+ <http://www.w3.org/2000/01/rdf-schema#comment> "Automated change 2022-03-11: <http://purl.obolibrary.org/obo/GO_0044798> replaced_by <http://purl.obolibrary.org/obo/GO_0005667>" ;
+ <http://purl.org/dc/elements/1.1/date> "2022-03-11"^^<http://www.w3.org/2001/XMLSchema#string> .
<http://purl.obolibrary.org/obo/#539aab4300000001%2F5595c4cb00000281> <http://geneontology.org/lego/hint/layout/x> "1200"^^<http://www.w3.org/2001/XMLSchema#string> ;
<http://geneontology.org/lego/hint/layout/y> "75"^^<http://www.w3.org/2001/XMLSchema#string> ;
@@ -211,8 +218,9 @@
<http://purl.obolibrary.org/obo/#539aab4300000001%2F5595c4cb00000284> a <http://www.w3.org/2002/07/owl#NamedIndividual> , <http://purl.obolibrary.org/obo/ECO_0000096> ;
<http://purl.org/dc/elements/1.1/source> "PMID:8235597"^^<http://www.w3.org/2001/XMLSchema#string> .
-<http://purl.obolibrary.org/obo/#539aab4300000001%2F5595c4cb00000285> a <http://www.w3.org/2002/07/owl#NamedIndividual> , <http://purl.obolibrary.org/obo/GO_0044798> ;
- <http://purl.org/dc/elements/1.1/date> "2014-06-13"^^<http://www.w3.org/2001/XMLSchema#string> .
+<http://purl.obolibrary.org/obo/#539aab4300000001%2F5595c4cb00000285> a <http://www.w3.org/2002/07/owl#NamedIndividual> , <http://purl.obolibrary.org/obo/GO_0005667> ;
+ <http://www.w3.org/2000/01/rdf-schema#comment> "Automated change 2022-03-11: <http://purl.obolibrary.org/obo/GO_0044798> replaced_by <http://purl.obolibrary.org/obo/GO_0005667>" ;
+ <http://purl.org/dc/elements/1.1/date> "2022-03-11"^^<http://www.w3.org/2001/XMLSchema#string> .
<http://purl.obolibrary.org/obo/#539aab4300000001%2F5595c4cb00000286> <http://geneontology.org/lego/hint/layout/x> "750"^^<http://www.w3.org/2001/XMLSchema#string> ;
<http://geneontology.org/lego/hint/layout/y> "75"^^<http://www.w3.org/2001/XMLSchema#string> ;
@balhoff - I think it'd help if we went through this together. Let me know when you might have time. Thx.
@balhoff and I reviewed the diff. Looks good, but we decided to also delete the class declaration for the removed classes and we need to account for NOT annotations. See https://github.com/geneontology/minerva/pull/462
An implementation for this is merged into both master and dev, and needs testing.
What is the SOP (docs?) for this command? I might be able to play around with this on Thursday.
@kltm there is a new section in the INSTRUCTIONS: https://github.com/geneontology/minerva/blob/master/INSTRUCTIONS.md#migrate-obsolete-class-assertions-via-term_replaced_by
@balhoff Ah, sorry, missed that. I'll modify/add to that a little to complete a turnkey SOP. To clarify those files:
- Since we don't have our own pipeline in a box here, I'm guessing the best source for go-lego-reacto.owl should be ~http://current.geneontology.org/ontology/extensions/go-lego-reacto.owl~ http://snapshot.geneontology.org/ontology/extensions/go-lego-reacto.owl ? Can I specify that as a URL, or do I need to download first?
- The -j blazegraph.jnl is the "annotation" journal, correct? Does anything need to be done with the "ontology" journal?
- Since we don't have our own pipeline in a box here, I'm guessing the best source for go-lego-reacto.owl should be http://current.geneontology.org/ontology/extensions/go-lego-reacto.owl ? Can I specify that as a URL, or do I need to download first?
This is the same standard --ontology option as in other commands, so a URL should be fine. You can provide a catalog as well.
- The -j blazegraph.jnl is the "annotation" journal, correct? Does anything need to be done with the "ontology" journal?
Yes, the annotation journal. Don't need an ontology/tbox journal. So you might:
1) Dump (--dump-owl-models); commit.
2) Run --replace-obsolete on journal.
3) Dump (--dump-owl-models); take a look at diff; commit.
@balhoff Okay, so a complete command series (we might be playing with this tomorrow) might look like:
java -jar minerva-cli.jar --replace-obsolete -j /tmp/blazegraph.jnl --ontology http://snapshot.geneontology.org/ontology/extensions/go-lego-reacto.owl
java -jar minerva-cli.jar -j /tmp/blazegraph.jnl --dump-owl-models -f ~/local/src/git/noctua-models/models
cd ~/local/src/git/noctua-models && git commit -a -m "replace obsolete terms with terms in latest snapshot ontology"
@kltm that looks good, provided /tmp/blazegraph.jnl is the journal used in the first step, with all the models already loaded into it.
Applying to models in dev, we had some stuff like:
2022-03-24 15:47:27,420 WARN (org.geneontology.minerva.cli.ReplaceObsoleteReferencesCommand:175) Unable to expand replaced_by value found in ontology into an IRI: "PCL:1000002"
2022-03-24 15:47:27,455 WARN (org.geneontology.minerva.cli.ReplaceObsoleteReferencesCommand:175) Unable to expand replaced_by value found in ontology into an IRI: "PCL:1000004"
2022-03-24 15:47:27,502 WARN (org.geneontology.minerva.cli.ReplaceObsoleteReferencesCommand:175) Unable to expand replaced_by value found in ontology into an IRI: "PCL:1000003"
Final message was:
2022-03-24 15:47:42,426 INFO (org.geneontology.minerva.cli.ReplaceObsoleteReferencesCommand:106) Successfully applied database updates to replace obsolete terms: 10938 changes
That's a lot, which is good. Only took a minute too. What happens if this fails? Does it roll back? Does it fail into a safe state (assuming that for now)?
@balhoff java -Xmx64G -jar ../minerva/minerva-cli/bin/minerva-cli.jar -j /tmp/blazegraph.jnl --dump-owl-models -f ~/local/src/git/noctua-models/models gave me the error:
Parameter parse exception. Note that the first parameter must be one of: [--validate-go-cams, --dump-owl-models, --import-owl-models, --sparql-update, --owl-lego-to-json, --lego-to-gpad-sparql, --version, --update-gene-product-types]
Subsequent parameters are specific to each top level command.
Error message: Missing required option: [--dump-owl-models export OWL GO-CAM models from journal, --merge-ontologies Merge owl ontologies, --import-owl-models import OWL GO-CAM models into journal, --import-tbox-ontologies import OWL tbox ontologies into journal, --add-taxon-metadata add taxon associated with genes in each model as an annotation on the model, --clean-gocams remove import statements, add property declarations, remove json-model annotation, --sparql-update update the blazegraph journal with the given sparql statement, --replace-obsolete replace references to obsolete terms with their replaced_by values, --owl-lego-to-json Given a GO-CAM OWL file, make its minerva json represention, --lego-to-gpad-sparql Given a GO-CAM journal, export GPAD representation for all the go-cams, --version Print the version of the minerva stack used here. Extracts this from JAR file., --validate-go-cams Check a collection of go-cam files or a journal for valid semantics (owl) and structure (shex)]
Slight variation from my notes did seem to work though: java -Xmx64G -jar ../minerva/minerva-cli/bin/minerva-cli.jar --dump-owl-models -j /tmp/blazegraph.jnl -f ~/local/src/git/noctua-models/models/
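The parse error above comes down to argument order: the top-level command flag has to be the first parameter. A small helper that always puts the command first avoids the mistake. This is only a sketch that builds the argument lists without running anything; the helper names and the jar path are placeholders, while the flags are the ones used in this thread.

```python
import shlex


def minerva_argv(command: str, journal: str, extra: list[str]) -> list[str]:
    """Build a minerva-cli argument list with the top-level command first."""
    return ["java", "-jar", "minerva-cli.jar", command, "-j", journal, *extra]


def migration_steps(journal: str, ontology: str, models_dir: str) -> list[list[str]]:
    """Dump, replace, dump again; a git commit is expected after each dump."""
    dump = minerva_argv("--dump-owl-models", journal, ["-f", models_dir])
    replace = minerva_argv("--replace-obsolete", journal, ["--ontology", ontology])
    return [dump, replace, dump]


steps = migration_steps(
    "/tmp/blazegraph.jnl",
    "http://snapshot.geneontology.org/ontology/extensions/go-lego-reacto.owl",
    "models",
)
for argv in steps:
    print(shlex.join(argv))
```

Dumping once before the replacement gives a clean baseline commit, so the later diff only shows the changes made by the replacement itself.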
Process (obviously without saving) has been applied to noctua-dev. Fingers crossed. Tagging @vanaukenk @balhoff
For the PCL terms above, searching with 'PCL:' in the 'Add Individual' box on the noctua-dev graph editor gives this:
And here on Ontobee:
It seems the PCL terms should have replaced obsolete CL terms, but maybe they didn't because the PCL values couldn't be expanded to IRIs, since we don't have PCL in GO-LEGO?
I get a 404 Error when searching for a PCL term on noctua-amigo.
If we have used any of those obsolete terms, then I suppose we need to add PCL to go-lego. But we also need to add the PCL prefix to the Minerva prefixes file.
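The warning is what one would expect from CURIE expansion against a prefix map that lacks an entry for PCL. The sketch below illustrates the general mechanism only; it is not Minerva's code, the function name is made up, and the PCL namespace shown is assumed to follow the usual OBO PURL pattern.

```python
from typing import Optional

# Prefix map without PCL, mirroring the situation described above.
PREFIXES = {
    "GO": "http://purl.obolibrary.org/obo/GO_",
    "CL": "http://purl.obolibrary.org/obo/CL_",
}


def expand_curie(curie: str, prefixes: dict[str, str]) -> Optional[str]:
    """Expand 'PREFIX:local' into an IRI, or return None if the prefix is unknown."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        return None
    return prefixes[prefix] + local


before = expand_curie("PCL:1000002", PREFIXES)

# Assumption: PCL uses the standard OBO PURL namespace.
with_pcl = {**PREFIXES, "PCL": "http://purl.obolibrary.org/obo/PCL_"}
after = expand_curie("PCL:1000002", with_pcl)
print(before, after)
```

With the prefix added the value expands, but the replacement term would still need to exist in the ontology being loaded for the result to be usable.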
Just tried this on master:
2022-03-28 14:45:39,093 WARN (org.geneontology.minerva.cli.ReplaceObsoleteReferencesCommand:175) Unable to expand replaced_by value found in ontology into an IRI: "PCL:1000002"
2022-03-28 14:45:39,117 WARN (org.geneontology.minerva.cli.ReplaceObsoleteReferencesCommand:175) Unable to expand replaced_by value found in ontology into an IRI: "PCL:1000004"
2022-03-28 14:45:39,153 WARN (org.geneontology.minerva.cli.ReplaceObsoleteReferencesCommand:175) Unable to expand replaced_by value found in ontology into an IRI: "PCL:1000003"
2022-03-28 14:45:52,419 INFO (org.geneontology.minerva.cli.ReplaceObsoleteReferencesCommand:106) Successfully applied database updates to replace obsolete terms: 3590 changes
Summary: 823570 line changes in 15203 files. Attachments: modified.txt.gz, modified.diff.gz
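For comparing runs like the ones on dev and master, the log output can be summarized with a couple of regular expressions. This is a convenience sketch that assumes the message wording stays exactly as in the lines quoted in this thread; the function name is made up.

```python
import re

WARN_RE = re.compile(
    r'Unable to expand replaced_by value found in ontology into an IRI: "([^"]+)"'
)
CHANGES_RE = re.compile(r"replace obsolete terms: (\d+) changes")


def summarize(log_text: str):
    """Return (sorted unexpanded CURIEs, change count or None)."""
    unexpanded = sorted(set(WARN_RE.findall(log_text)))
    match = CHANGES_RE.search(log_text)
    changes = int(match.group(1)) if match else None
    return unexpanded, changes


log = """\
2022-03-28 14:45:39,093 WARN (org.geneontology.minerva.cli.ReplaceObsoleteReferencesCommand:175) Unable to expand replaced_by value found in ontology into an IRI: "PCL:1000002"
2022-03-28 14:45:39,117 WARN (org.geneontology.minerva.cli.ReplaceObsoleteReferencesCommand:175) Unable to expand replaced_by value found in ontology into an IRI: "PCL:1000004"
2022-03-28 14:45:52,419 INFO (org.geneontology.minerva.cli.ReplaceObsoleteReferencesCommand:106) Successfully applied database updates to replace obsolete terms: 3590 changes
"""
unexpanded, changes = summarize(log)
print(unexpanded, changes)
```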
Tagging @vanaukenk
@kltm I think there is a lot of spurious diff in the output. I had that problem locally as well. It may be worth first dumping and committing (before running the replacement), then run the replacement. I got a better diff that way.
@vanaukenk I believe we are satisfied with this point? Please reopen/move if this is not done.
Probably implement as a new CLI command.