Closed: ukemi closed this issue 3 years ago
@ukemi Can you confirm that the correct (latest) version of your model is at https://github.com/geneontology/noctua-models/blob/master/models/5ee8120100001244.ttl ? According to GH and the model metadata, there have been no changes since 2020-07-06. (I just want to make sure we're at least starting from the assumption that the issue is in minerva and not in saving, model push, etc.)
Hi @kltm,
Yes, this is the model.
@ukemi When you look at the model in noctua, are the annotations missing, e.g. at http://noctua.geneontology.org/workbench/annpreview/?model_id=gomodel:5ee8120100001244 ? This would indicate either a pipeline or a minerva GPAD generation error. If they are, could you give one example that is not present in the GPAD but should be?
The annotations are there.
Could this also be related to #328? Also now questioning whether we really want to implement #269
Alka-Seltzer moment: I discovered today that between 6/17 and 6/18 we lost over 50% of our Noctua annotations.
6/17 NOCTUA Annotations:
Total Number of Genes Annotated to: 1027
Total Number of Annotations: 6716
6/18 NOCTUA Annotations:
Total Number of Genes Annotated to: 551 <<<<<<<<<< 476 loss
Total Number of Annotations: 3111 <<<<<<<<<< 3605 loss
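The loss figures can be double-checked from the two sets of totals. This is only a small arithmetic sketch in Python; the helper name `loss` is illustrative and not part of any pipeline tool.

```python
def loss(before, after):
    """Return (absolute loss, percentage loss rounded to one decimal place)."""
    lost = before - after
    return lost, round(100 * lost / before, 1)

# Totals reported for 6/17 (before) and 6/18 (after).
genes = loss(1027, 551)
annotations = loss(6716, 3111)

print("genes:", genes)              # (476, 46.3)
print("annotations:", annotations)  # (3605, 53.7)
```

So roughly 46% of the annotated genes and 54% of the annotations went missing between the two dates.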
TL;DR: So, what we seem to have here is the model-state getting dropped for some reason somewhere in the minerva steps (below); it seems to be there in GH (https://github.com/geneontology/noctua-models/blob/291a0a75bc7a890800da2e13f1953cea6a42aa21/models/5ee8120100001244.ttl#L16) and does not appear in the GPAD.
Any ideas @balhoff or @goodb ?
To spell out how to reproduce this:
noctua-model-id=gomodel:5ee8120100001244
wget http://snapshot.geneontology.org/products/annotations/noctua_mgi.gpad.gz
sjcarbon@moiraine:/tmp$:) zgrep -c "5ee8120100001244" noctua_mgi.gpad.gz
0
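For anyone who wants to script this check rather than run it by hand, here is a rough Python equivalent of the `zgrep -c` step. It is a sketch only: `count_matching_lines` is a hypothetical helper, and it does a plain substring match, which behaves the same as the grep above for an ID with no regex metacharacters.

```python
import gzip

def count_matching_lines(path, needle):
    """Count lines in a gzipped text file that contain `needle`."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        return sum(1 for line in handle if needle in line)

# Example call, assuming the file from the wget step is in the working directory:
# count_matching_lines("noctua_mgi.gpad.gz", "5ee8120100001244")
```

A result of 0 for a model that is known to be in GH is the symptom being chased here.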
From @ukemi's comment https://github.com/geneontology/minerva/issues/335#issuecomment-661816653 , we know that these have at least gotten into GH. This would seem to leave two error points: 1) pipeline mechanics (in feeding or handling) or 2) a minerva error.
Grabbing the log from the last successful snapshot, it's mentioned six times:
[2020-08-03T07:34:02.861Z] 2020-08-03 00:34:02,780 INFO (CommandLineInterface:442) Loading models/5ee8120100001244.ttl
[2020-08-03T07:55:43.643Z] 2020-08-03 00:55:43,550 INFO (BlazegraphMolecularModelManager:594) Load model abox: http://model.geneontology.org/5ee8120100001244 from database
[2020-08-03T08:01:07.825Z] + perl ./util/collate-gpads.pl [A LOT OF STUFF] legacy/gpad/5ee8120100001244.gpad [A LOT OF STUFF]
[2020-08-03T18:58:33.198Z] 2020-08-03 18:58:32,969 INFO org.renci.blazegraph.Load$ - Loading target/noctua-models/models/5ee8120100001244.ttl
[2020-08-03T19:01:36.437Z] http://model.geneontology.org/5ee8120100001244
[2020-08-03T19:02:20.871Z] 2020-08-03 19:02:20,841 INFO org.renci.blazegraph.Reason$ - 1253 changes in Some(http://model.geneontology.org/5ee8120100001244_inferred)
Poking around the stage logs a bit, this seems mechanically what I'd expect.
Trying to simulate locally:
git clone https://github.com/geneontology/noctua-models.git
mkdir models
mv noctua-models/models/5ee8120100001244.ttl ./models/
~/local/src/git/minerva/minerva-cli/bin/minerva-cli.sh --import-owl-models -f models -j blazegraph.jnl
mkdir -p legacy/gpad
~/local/src/git/minerva/minerva-cli/bin/minerva-cli.sh --lego-to-gpad-sparql --ontology http://skyhook.berkeleybop.org/snapshot/ontology/extensions/go-lego.owl -i blazegraph.jnl --gpad-output legacy/gpad
grep -c "5ee8120100001244" legacy/gpad/5ee8120100001244.gpad
14
Which, I believe, means that our annotations have gotten this far. The final step is:
perl noctua-models/util/collate-gpads.pl legacy/gpad/*.gpad
No production models for MGI
No production models for MGI
No production models for MGI
No production models for MGI
No production models for MGI
No production models for MGI
No production models for MGI
No production models for MGI
No production models for MGI
No production models for MGI
No production models for MGI
No production models for MGI
No production models for MGI
No production models for MGI
and there is no further output...which I think is a problem?
In the script the following seems to be triggered:
if (!grep {$_ eq 'model-state=production'} @props) {
Ah!
grep -c "state" legacy/gpad/5ee8120100001244.gpad
0
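To illustrate why a missing `model-state` drops every line, here is a minimal Python sketch of the same kind of test. It is not the actual `collate-gpads.pl` logic: it assumes the annotation properties sit in the last tab-separated column as `|`-separated `key=value` pairs, and `is_production` / `keep_production` are hypothetical names. The sample lines use placeholder field values.

```python
def is_production(gpad_line):
    """True if the line's property column contains model-state=production.

    Assumption: properties are the last tab-separated column, joined by '|'.
    """
    if gpad_line.startswith("!"):  # header/comment lines are not annotations
        return False
    columns = gpad_line.rstrip("\n").split("\t")
    props = columns[-1].split("|")
    return "model-state=production" in props

def keep_production(lines):
    """Keep only the lines that pass the production check."""
    return [line for line in lines if is_production(line)]

# Placeholder lines: eleven dummy fields plus a property column.
with_state = "\t".join(["x"] * 11 + ["model-state=production|noctua-model-id=gomodel:5ee8120100001244"])
without_state = "\t".join(["x"] * 11 + ["noctua-model-id=gomodel:5ee8120100001244"])

print(len(keep_production([with_state, without_state])))  # 1
```

A line with no `model-state` property at all fails the check exactly like a non-production line, so if the property is dropped upstream, the whole model is filtered out.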
@kltm I think I have a solution and a cause. Want to run it by @balhoff but I suspect this will do it. I seem to have introduced this in an earlier quest to fix some other problem.
@kltm I think it would be straightforward to add a parameter to the minerva client that would apply the 'production-only' filter at the time the GPAD was generated. Do you want me to do that? Having that perl script that you discovered in the middle of the gpad assembly process for the pipeline seems maybe not so good from the standpoint of testing and stability. LMK.
Just wondering: we are still missing about 50% in the download. Any progress?
There is likely an incoming fix with #341 , pending review from @balhoff .
@hdrabkin @ukemi We should hopefully get some results from the new code on Friday.
In the release candidate we are missing several annotations coming from SynGO via Noctua: for example:
I think this is blocking for the Sept 2020 release.
We did get ours back last week (we were missing 50%, a mix of both SynGO and MGI).
@kltm could the files we loaded be out of date ?
The current snapshot file appears to have 6916 lines attributed to SynGO. The file header date is 8/30/2020.
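A per-group line count like this can be reproduced with a short script. The sketch below is one possible way to do it in Python, not necessarily how the number above was obtained; it assumes the tenth tab-separated column holds the assigning group, and `count_by_column` is a hypothetical helper.

```python
from collections import Counter

def count_by_column(lines, column=9):
    """Tally non-header lines by the value in one tab-separated column.

    Assumption: index 9 (the tenth column) holds the assigning group.
    """
    counts = Counter()
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue  # skip header/comment and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) > column:
            counts[fields[column]] += 1
    return counts
```

The function accepts any iterable of lines, so it can be fed directly from `gzip.open(path, "rt")` on a downloaded GPAD file, and the same count can then be compared between the snapshot and the release candidate.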
Thanks @hdrabkin We did this data in the Sept release (release candidate has 4980 SynGO annotations). I will stop the release process.
@pgaudet I think that if this is an issue, it would be a new issue, not related to the "production" tag issue we had here. Are you looking at the output GPAD products, like noctua_mgi.gpad.gz?
Talking to @pgaudet earlier, this may just be an "echo" of this issue as it passes through various external pipelines that are on different schedules.
The mgi gpad file on snapshot appears to be missing annotations: http://snapshot.geneontology.org/products/annotations/noctua_mgi.gpad.gz
This model checks out but I can't find any of the annotations in snapshot from 7/19/2020. http://noctua.geneontology.org/editor/graph/gomodel:5ee8120100001244?model_id=gomodel:5ee8120100001244