geneontology / minerva

Migrate obsolete term usages via replaced_by #456

Closed balhoff closed 2 years ago

balhoff commented 2 years ago

Probably implement as a new CLI command.

balhoff commented 2 years ago

@vanaukenk @kltm do we need to update metadata on the instance nodes for these replacements? E.g. dc:date.

kltm commented 2 years ago

@balhoff For whatever reason, my gut instinct would be "no", as this is built into the ontology and is not an action by a curator. Doing this would also open the door to needing an agent that did the update, as a curator was not involved. I feel like this is sliding towards talking about full history again. Moreover, in the end, changing the date would not itself record the fact that this was done. Better might be a comment that the action happened. (Although that just takes us back to who made the comment and whether that changes the date.) That said, there may already be best or common practices around this used by curators. I'd be interested in what @vanaukenk thinks about this.

vanaukenk commented 2 years ago

From discussion on 2022-03-01 MOD imports.

For changes such as replaced_by, we will:

1) Add a comment including the date and a description of the change. For example: "2022-03-01 GO:nnnnnnn replaced by GO:nnnnnnnn" (Note that GO:nnnnnnnn could be any ontology term).

2) Update the date on the assertion

We will not be updating the contributor.
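A minimal Python sketch of the policy above, for illustration only. The `Individual` class and `apply_replacement` function are hypothetical stand-ins, not Minerva code; the comment text follows the "2022-03-01 GO:nnnnnnn replaced by GO:nnnnnnnn" pattern from point 1.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Individual:
    """Hypothetical in-memory stand-in for a GO-CAM instance node."""
    types: set
    date: str
    contributors: set = field(default_factory=set)
    comments: list = field(default_factory=list)


def apply_replacement(ind, old_term, new_term, today):
    """Swap old_term for new_term on ind; return True if anything changed."""
    if old_term not in ind.types:
        return False  # nothing to migrate on this individual
    ind.types.discard(old_term)
    ind.types.add(new_term)
    stamp = today.isoformat()
    # 1) Add a comment including the date and a description of the change.
    ind.comments.append(f"{stamp} {old_term} replaced by {new_term}")
    # 2) Update the date on the assertion.
    ind.date = stamp
    # The contributor set is deliberately left untouched.
    return True


node = Individual(types={"GO:0005913"}, date="2015-12-20", contributors={"GOC:pt"})
apply_replacement(node, "GO:0005913", "GO:0005912", date(2022, 3, 1))
print(node)
```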

balhoff commented 2 years ago

Example Turtle diff result from implementation in #462:

<http://model.geneontology.org/5667fdd400000892/5667fdd400001755> <http://geneontology.org/lego/evidence> <http://model.geneontology.org/5667fdd400000892/5667fdd400001750> , <http://model.geneontology.org/5667fdd400000892/5667fdd400001751> , <http://model.geneontology.org/5667fdd400000892/5667fdd400001752> , <http://model.geneontology.org/5667fdd400000892/5667fdd400001753> ;
-       a <http://www.w3.org/2002/07/owl#NamedIndividual> , <http://purl.obolibrary.org/obo/GO_0005913> ;
+       a <http://www.w3.org/2002/07/owl#NamedIndividual> , <http://purl.obolibrary.org/obo/GO_0005912> ;
+       <http://www.w3.org/2000/01/rdf-schema#comment> "Automated change 2022-03-02: <http://purl.obolibrary.org/obo/GO_0005913> replaced_by <http://purl.obolibrary.org/obo/GO_0005912>" ;
        <http://purl.org/dc/elements/1.1/contributor> "GOC:pt"^^<http://www.w3.org/2001/XMLSchema#string> ;
-       <http://purl.org/dc/elements/1.1/date> "2015-12-20"^^<http://www.w3.org/2001/XMLSchema#string> .
+       <http://purl.org/dc/elements/1.1/date> "2022-03-02"^^<http://www.w3.org/2001/XMLSchema#string> .
kltm commented 2 years ago

@balhoff We may be talking a little about this tomorrow; @cmungall will be making a quick spec for how the operations are expected to work. (There will also be more work with people to figure out which operations will be carried out first with this tooling, an SOP for them, etc.)

vanaukenk commented 2 years ago

From @kltm

From discussion today (2022-03-03), the recording format would be a TSV, in the noctua-models repo, that can be idempotently run.

vanaukenk commented 2 years ago

In computing, an idempotent operation is one that has no additional effect if it is called more than once with the same input parameters.
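As a rough illustration of what "idempotently run" could mean for a TSV of replacements, here is a small Python sketch. The two-column layout (obsolete term, then replacement) and the function names are assumptions made for the example, not the agreed recording format; the term pairs are taken from the diffs in this thread.

```python
import csv
import io


def load_replacements(tsv_text):
    """Parse 'obsolete<TAB>replacement' rows (a hypothetical layout) into a dict."""
    rows = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    return {old: new for old, new in rows}


def migrate(types, replacements):
    """Return a new set of terms with every obsolete term replaced."""
    return {replacements.get(term, term) for term in types}


TSV = "GO:0005913\tGO:0005912\nGO:0044798\tGO:0005667\n"
replacements = load_replacements(TSV)

once = migrate({"GO:0005913", "GO:0045944"}, replacements)
twice = migrate(once, replacements)

# Running the same file a second time changes nothing further.
print(once == twice)
```

The second run is a no-op here because no replacement target is itself listed as obsolete. A file containing chained replacements (A to B, then B to C) would need the chains resolved up front for this property to hold.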

kltm commented 2 years ago

Also, as a note from yesterday's software meeting, the model-level date would be bumped as well as the individual-level date. (It's worth keeping in mind that we may have to change this mechanism when we start implementing the separation of modification and creation dates.)

balhoff commented 2 years ago

@kltm @vanaukenk for the obsolete class replacement, I am now updating the date for the model and also adding the change comments to the model. Do you agree with that? Here is a single model diff:

@@ -47,9 +47,13 @@

 <http://purl.obolibrary.org/obo/GO_0045944> a <http://www.w3.org/2002/07/owl#Class> .

+<http://purl.obolibrary.org/obo/GO_0005667> a <http://www.w3.org/2002/07/owl#Class> .
+
 <http://purl.obolibrary.org/obo/CL_0000084> <http://www.geneontology.org/formats/oboInOwl#id> "CL:0000084"^^<http://www.w3.org/2001/XMLSchema#string> ;
        a <http://www.w3.org/2002/07/owl#Class> .

+<http://purl.obolibrary.org/obo/GO_0001228> a <http://www.w3.org/2002/07/owl#Class> .
+
 <http://purl.obolibrary.org/obo/ECO_0000096> a <http://www.w3.org/2002/07/owl#Class> .

 <http://purl.obolibrary.org/obo/GO_0044798> a <http://www.w3.org/2002/07/owl#Class> .
@@ -80,9 +84,10 @@
        <https://w3id.org/biolink/vocab/in_taxon> <http://purl.obolibrary.org/obo/NCBITaxon_10090> , <http://purl.obolibrary.org/obo/NCBITaxon_9606> ;
        <http://www.geneontology.org/formats/oboInOwl#id> "gomodel:539aab4300000001"^^<http://www.w3.org/2001/XMLSchema#string> ;
        a <http://www.w3.org/2002/07/owl#Ontology> ;
+       <http://www.w3.org/2000/01/rdf-schema#comment> "Automated change 2022-03-11: <http://purl.obolibrary.org/obo/GO_0001077> replaced_by <http://purl.obolibrary.org/obo/GO_0001228>" , "Automated change 2022-03-11: <http://purl.obolibrary.org/obo/GO_0044798> replaced_by <http://purl.obolibrary.org/obo/GO_0005667>" ;
        <http://purl.org/dc/elements/1.1/title> "Untitled by Unknown 01"^^<http://www.w3.org/2001/XMLSchema#string> ;
        <http://purl.org/dc/elements/1.1/contributor> "GOC:kltm"^^<http://www.w3.org/2001/XMLSchema#string> ;
-       <http://purl.org/dc/elements/1.1/date> "2015-08-10"^^<http://www.w3.org/2001/XMLSchema#string> .
+       <http://purl.org/dc/elements/1.1/date> "2022-03-11"^^<http://www.w3.org/2001/XMLSchema#string> .

 <http://model.geneontology.org/539aab4300000001/539aab430000002> <http://geneontology.org/lego/evidence> <http://purl.obolibrary.org/obo/#539aab4300000001%2F5595c4cb00000297> ;
        <http://geneontology.org/lego/hint/layout/x> "1200"^^<http://www.w3.org/2001/XMLSchema#string> ;
@@ -153,9 +158,10 @@
        <http://purl.obolibrary.org/obo/BFO_0000050> <http://model.geneontology.org/539aab4300000001/539aab430000008> ;
        <http://purl.obolibrary.org/obo/RO_0002333> <http://purl.obolibrary.org/obo/#539aab4300000001%2F5595c4cb00000274> ;
        <http://purl.obolibrary.org/obo/RO_0002213> <http://purl.obolibrary.org/obo/#539aab4300000001%2F5595c4cb00000273> ;
-       a <http://www.w3.org/2002/07/owl#NamedIndividual> , <http://purl.obolibrary.org/obo/GO_0001077> ;
+       a <http://www.w3.org/2002/07/owl#NamedIndividual> , <http://purl.obolibrary.org/obo/GO_0001228> ;
+       <http://www.w3.org/2000/01/rdf-schema#comment> "Automated change 2022-03-11: <http://purl.obolibrary.org/obo/GO_0001077> replaced_by <http://purl.obolibrary.org/obo/GO_0001228>" ;
        <http://purl.org/dc/elements/1.1/contributor> "GOC:kltm"^^<http://www.w3.org/2001/XMLSchema#string> ;
-       <http://purl.org/dc/elements/1.1/date> "2015-08-10"^^<http://www.w3.org/2001/XMLSchema#string> .
+       <http://purl.org/dc/elements/1.1/date> "2022-03-11"^^<http://www.w3.org/2001/XMLSchema#string> .

 <http://purl.obolibrary.org/obo/#539aab4300000001%2F5595c4cb00000272> a <http://www.w3.org/2002/07/owl#NamedIndividual> , <http://purl.obolibrary.org/obo/ECO_0000150> ;
        <http://purl.org/dc/elements/1.1/source> "PMID:8235597"^^<http://www.w3.org/2001/XMLSchema#string> .
@@ -190,8 +196,9 @@
        <http://purl.org/dc/elements/1.1/contributor> "GOC:kltm"^^<http://www.w3.org/2001/XMLSchema#string> ;
        <http://purl.org/dc/elements/1.1/date> "2015-08-10"^^<http://www.w3.org/2001/XMLSchema#string> .

-<http://purl.obolibrary.org/obo/#539aab4300000001%2F5595c4cb00000280> a <http://www.w3.org/2002/07/owl#NamedIndividual> , <http://purl.obolibrary.org/obo/GO_0044798> ;
-       <http://purl.org/dc/elements/1.1/date> "2014-06-13"^^<http://www.w3.org/2001/XMLSchema#string> .
+<http://purl.obolibrary.org/obo/#539aab4300000001%2F5595c4cb00000280> a <http://www.w3.org/2002/07/owl#NamedIndividual> , <http://purl.obolibrary.org/obo/GO_0005667> ;
+       <http://www.w3.org/2000/01/rdf-schema#comment> "Automated change 2022-03-11: <http://purl.obolibrary.org/obo/GO_0044798> replaced_by <http://purl.obolibrary.org/obo/GO_0005667>" ;
+       <http://purl.org/dc/elements/1.1/date> "2022-03-11"^^<http://www.w3.org/2001/XMLSchema#string> .

 <http://purl.obolibrary.org/obo/#539aab4300000001%2F5595c4cb00000281> <http://geneontology.org/lego/hint/layout/x> "1200"^^<http://www.w3.org/2001/XMLSchema#string> ;
        <http://geneontology.org/lego/hint/layout/y> "75"^^<http://www.w3.org/2001/XMLSchema#string> ;
@@ -211,8 +218,9 @@
 <http://purl.obolibrary.org/obo/#539aab4300000001%2F5595c4cb00000284> a <http://www.w3.org/2002/07/owl#NamedIndividual> , <http://purl.obolibrary.org/obo/ECO_0000096> ;
        <http://purl.org/dc/elements/1.1/source> "PMID:8235597"^^<http://www.w3.org/2001/XMLSchema#string> .

-<http://purl.obolibrary.org/obo/#539aab4300000001%2F5595c4cb00000285> a <http://www.w3.org/2002/07/owl#NamedIndividual> , <http://purl.obolibrary.org/obo/GO_0044798> ;
-       <http://purl.org/dc/elements/1.1/date> "2014-06-13"^^<http://www.w3.org/2001/XMLSchema#string> .
+<http://purl.obolibrary.org/obo/#539aab4300000001%2F5595c4cb00000285> a <http://www.w3.org/2002/07/owl#NamedIndividual> , <http://purl.obolibrary.org/obo/GO_0005667> ;
+       <http://www.w3.org/2000/01/rdf-schema#comment> "Automated change 2022-03-11: <http://purl.obolibrary.org/obo/GO_0044798> replaced_by <http://purl.obolibrary.org/obo/GO_0005667>" ;
+       <http://purl.org/dc/elements/1.1/date> "2022-03-11"^^<http://www.w3.org/2001/XMLSchema#string> .

 <http://purl.obolibrary.org/obo/#539aab4300000001%2F5595c4cb00000286> <http://geneontology.org/lego/hint/layout/x> "750"^^<http://www.w3.org/2001/XMLSchema#string> ;
        <http://geneontology.org/lego/hint/layout/y> "75"^^<http://www.w3.org/2001/XMLSchema#string> ;
vanaukenk commented 2 years ago

@balhoff - I think it'd help if we went through this together. Let me know when you might have time. Thx.

vanaukenk commented 2 years ago

@balhoff and I reviewed the diff. Looks good, but we decided to also delete the class declaration for the removed classes and we need to account for NOT annotations. See https://github.com/geneontology/minerva/pull/462

balhoff commented 2 years ago

An implementation for this is merged into both master and dev, and needs testing.

kltm commented 2 years ago

What is the SOP (docs?) for this command? I might be able to play around with this on Thursday.

balhoff commented 2 years ago

@kltm there is a new section in the INSTRUCTIONS: https://github.com/geneontology/minerva/blob/master/INSTRUCTIONS.md#migrate-obsolete-class-assertions-via-term_replaced_by

kltm commented 2 years ago

@balhoff Ah, sorry, missed that. I'll modify/add to that a little to complete a turnkey SOP. To clarify those files:

balhoff commented 2 years ago

This is the standard --ontology option as other commands, so URL should be fine. You can provide a catalog as well.

  • The -j blazegraph.jnl is the "annotation" journal, correct? Does anything need to be done with the "ontology" journal?

Yes, the annotation journal. Don't need an ontology/tbox journal. So you might:

  1. Dump the journal to git ttl repo (--dump-owl-models); commit.
  2. Run --replace-obsolete on journal.
  3. Dump the journal to git ttl repo (--dump-owl-models); take a look at diff; commit.
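The three steps above could be laid out as a dry-run plan, sketched here in Python. The `--dump-owl-models`, `--replace-obsolete`, `-j`, `-f`, and `--ontology` options are the ones named in this thread; the exact argument order for `--replace-obsolete`, the paths, the ontology value, and the commit messages are placeholders and assumptions. The script only prints the commands and does not run them.

```python
import shlex


def migration_plan(jar, journal, models_dir, ontology):
    """Return the ordered argv lists for a dump / replace / dump cycle."""
    minerva = ["java", "-jar", jar]
    # The top-level command comes first, followed by its own options.
    dump = minerva + ["--dump-owl-models", "-j", journal, "-f", models_dir]
    return [
        dump,  # 1. dump the journal to the git ttl repo
        ["git", "-C", models_dir, "commit", "-am", "Dump before obsolete replacement"],
        # 2. run the replacement on the annotation journal
        minerva + ["--replace-obsolete", "-j", journal, "--ontology", ontology],
        dump,  # 3. dump again so the diff shows only the replacement
        ["git", "-C", models_dir, "diff", "--stat"],
        ["git", "-C", models_dir, "commit", "-am", "Replace obsolete terms"],
    ]


# All four argument values below are placeholders.
for argv in migration_plan("minerva-cli.jar", "/tmp/blazegraph.jnl", "models", "go-lego.owl"):
    print(shlex.join(argv))
```

Note that `git commit -a` only picks up tracked files, so any newly created model files would need a `git add` first.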
kltm commented 2 years ago

@balhoff Okay, so a complete command series (we might be playing with this tomorrow) might look like:

balhoff commented 2 years ago

@kltm that looks good, as long as /tmp/blazegraph.jnl is the journal you were using in the first step, with all the models already loaded in it.

kltm commented 2 years ago

Applying to models in dev, we had some stuff like:

2022-03-24 15:47:27,420 WARN  (org.geneontology.minerva.cli.ReplaceObsoleteReferencesCommand:175) Unable to expand replaced_by value found in ontology into an IRI: "PCL:1000002"
2022-03-24 15:47:27,455 WARN  (org.geneontology.minerva.cli.ReplaceObsoleteReferencesCommand:175) Unable to expand replaced_by value found in ontology into an IRI: "PCL:1000004"
2022-03-24 15:47:27,502 WARN  (org.geneontology.minerva.cli.ReplaceObsoleteReferencesCommand:175) Unable to expand replaced_by value found in ontology into an IRI: "PCL:1000003"

Final message was:

2022-03-24 15:47:42,426 INFO  (org.geneontology.minerva.cli.ReplaceObsoleteReferencesCommand:106) Successfully applied database updates to replace obsolete terms: 10938 changes

That's a lot, which is good. Only took a minute too. What happens if this fails? Does it roll back? Does it fail into a safe state (assuming that for now)?

kltm commented 2 years ago

@balhoff java -Xmx64G -jar ../minerva/minerva-cli/bin/minerva-cli.jar -j /tmp/blazegraph.jnl --dump-owl-models -f ~/local/src/git/noctua-models/models gave me the error:

Parameter parse exception.  Note that the first parameter must be one of: [--validate-go-cams, --dump-owl-models, --import-owl-models, --sparql-update, --owl-lego-to-json, --lego-to-gpad-sparql, --version, --update-gene-product-types]
Subsequent parameters are specific to each top level command. 
Error message: Missing required option: [--dump-owl-models export OWL GO-CAM models from journal, --merge-ontologies Merge owl ontologies, --import-owl-models import OWL GO-CAM models into journal, --import-tbox-ontologies import OWL tbox ontologies into journal, --add-taxon-metadata add taxon associated with genes in each model as an annotation on the model, --clean-gocams remove import statements, add property declarations, remove json-model annotation, --sparql-update update the blazegraph journal with the given sparql statement, --replace-obsolete replace references to obsolete terms with their replaced_by values, --owl-lego-to-json Given a GO-CAM OWL file, make its minerva json represention, --lego-to-gpad-sparql Given a GO-CAM journal, export GPAD representation for all the go-cams, --version Print the version of the minerva stack used here.  Extracts this from JAR file., --validate-go-cams Check a collection of go-cam files or a journal for valid semantics (owl) and structure (shex)]
kltm commented 2 years ago

Slight variation from my notes did seem to work though: java -Xmx64G -jar ../minerva/minerva-cli/bin/minerva-cli.jar --dump-owl-models -j /tmp/blazegraph.jnl -f ~/local/src/git/noctua-models/models/

kltm commented 2 years ago

Process (obviously without saving) has been applied to noctua-dev. Fingers crossed. Tagging @vanaukenk @balhoff

vanaukenk commented 2 years ago

For the PCL terms above, searching with 'PCL:' in the 'Add Individual' box on the noctua-dev graph editor gives this:

image

And here on Ontobee:

image

It seems the PCL terms should have replaced obsolete CL terms, but maybe they didn't because they couldn't be resolved to an IRI, since we don't have PCL in GO-LEGO?

I get a 404 Error when searching for a PCL term on noctua-amigo.

balhoff commented 2 years ago

If we have used any of those obsolete terms, then I suppose we need to add PCL to go-lego. But we also need to add the PCL prefix to the Minerva prefixes file.
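To illustrate why a missing prefix produces the warnings above, here is a minimal Python sketch of CURIE expansion against a prefix map. The function name and the map contents are hypothetical and do not reflect the actual Minerva prefixes file; the `http://purl.obolibrary.org/obo/PCL_` base is assumed from the usual OBO PURL pattern.

```python
import logging

log = logging.getLogger("curie-sketch")


def expand_curie(curie, prefixes):
    """Expand 'PREFIX:local' to an IRI, or return None if the prefix is unknown."""
    prefix, sep, local = curie.partition(":")
    base = prefixes.get(prefix)
    if not sep or base is None:
        log.warning("Unable to expand %r into an IRI", curie)
        return None
    return base + local


# Hypothetical prefix map without an entry for PCL.
prefixes = {
    "GO": "http://purl.obolibrary.org/obo/GO_",
    "CL": "http://purl.obolibrary.org/obo/CL_",
}
missing = expand_curie("PCL:1000002", prefixes)  # unknown prefix, so None

# Once the prefix is registered, the same CURIE expands.
prefixes["PCL"] = "http://purl.obolibrary.org/obo/PCL_"
found = expand_curie("PCL:1000002", prefixes)
print(missing, found)
```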

kltm commented 2 years ago

Just tried this on master:

2022-03-28 14:45:39,093 WARN  (org.geneontology.minerva.cli.ReplaceObsoleteReferencesCommand:175) Unable to expand replaced_by value found in ontology into an IRI: "PCL:1000002"
2022-03-28 14:45:39,117 WARN  (org.geneontology.minerva.cli.ReplaceObsoleteReferencesCommand:175) Unable to expand replaced_by value found in ontology into an IRI: "PCL:1000004"
2022-03-28 14:45:39,153 WARN  (org.geneontology.minerva.cli.ReplaceObsoleteReferencesCommand:175) Unable to expand replaced_by value found in ontology into an IRI: "PCL:1000003"
2022-03-28 14:45:52,419 INFO  (org.geneontology.minerva.cli.ReplaceObsoleteReferencesCommand:106) Successfully applied database updates to replace obsolete terms: 3590 changes

Summary: 823570 line changes in 15203 files. Attachments: modified.txt.gz, modified.diff.gz

Tagging @vanaukenk

balhoff commented 2 years ago

@kltm I think there is a lot of spurious diff in the output. I had that problem locally as well. It may be worth first dumping and committing (before running the replacement), then run the replacement. I got a better diff that way.

kltm commented 2 years ago

@vanaukenk I believe we are satisfied with this point? Please reopen/move if this is not done.