geneontology / noctua

Graph-based modeling environment for biology, including prototype editor and services
http://noctua.geneontology.org/
BSD 3-Clause "New" or "Revised" License

GO-CAMs missing from the production triplestore #736

Closed lpalbou closed 3 years ago

lpalbou commented 3 years ago

This issue was noticed this morning, while testing the GO-CAM widget with the new Alliance release.

Some models that used to be available no longer are, for instance models involving WB nsy-1 or sek-1. Other models are still available, so not all models are affected. @balhoff suggested it could be due to models not passing ShEx, but: 1) I am not aware that we discard production models that violate ShEx; if we do, I believe that's new and would require a bit more discussion with @vanaukenk and @ukemi; 2) we spotted at least one model with no ShEx violation that is still not present in the production triplestore.

Useful examples

As part of training, @dustine32 and @tmushayahama, please try to do your own assessment to confirm or reject my hypothesis that this is coming from the Blazegraph journal in the triplestore. One possible path to follow: a relation may have changed IRI (maybe it became https?). Also, for completeness, here is the SPARQL query used to list all the models with at least 3 connected activities. As a reminder, it's nearly the same as the one powering the GO-CAM API route that lists models.

balhoff commented 3 years ago

I think the issue is that the model state triple value changed in the missing model. The value is missing the xsd:string datatype, which it used to have:

<http://model.geneontology.org/568b0f9600000284> <http://geneontology.org/lego/modelstate> "production"

The query which deletes non-production models is sensitive to the datatype: https://github.com/geneontology/go-site/blob/master/pipeline/sparql/delete/delete_non_production.sparql
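To illustrate why a datatype-sensitive match drops these models, here is a minimal Python sketch. It is not the pipeline's code: the `Literal` class, the `is_production_strict` helper, and the `typed_model` / `dev_model` entries are hypothetical, and it assumes a store that treats a plain literal and an `xsd:string` literal as different terms, as described above.

```python
from typing import NamedTuple, Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


class Literal(NamedTuple):
    """Hypothetical stand-in for an RDF literal: lexical form plus optional datatype IRI."""
    lexical: str
    datatype: Optional[str] = None


def is_production_strict(state: Literal) -> bool:
    # Datatype-sensitive: only the xsd:string-typed literal counts as production.
    return state == Literal("production", XSD_STRING)


models = {
    "typed_model": Literal("production", XSD_STRING),   # hypothetical model
    "568b0f9600000284": Literal("production"),          # datatype missing, as in the triple above
    "dev_model": Literal("development", XSD_STRING),    # hypothetical model
}

# Models that a strict "delete everything that is not production" step would remove.
deleted = sorted(m for m, s in models.items() if not is_production_strict(s))
print(deleted)  # ['568b0f9600000284', 'dev_model']
```

The untyped "production" model ends up in the same bucket as the development model, which matches the symptom reported here.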

I'm not yet sure how it happened.

balhoff commented 3 years ago

Does anyone know why the datatype would have been stripped for only modelstate triples? https://github.com/geneontology/noctua-models/commit/09ccae6c1419d0fa07e668734e6db12fe7f42a54#diff-3f4e31b3bb066c3f6ee349deaceaea91fb905d9f2b0bce21b832930784154211L90-R90

lpalbou commented 3 years ago

Interesting. @vanaukenk told me she didn't modify the model state either, and I cited at least 2 models where this would have happened. Tagging @tmushayahama and @kltm, as they are the most likely to know whether the way the model state is recorded from ART/NForm/NGraph has changed.

hattrill commented 3 years ago

Haven't touched my models since 23rd July. I checked these and I've yet to find a working one:

60d5209a00000233 tumor necrosis factor-mediated signaling pathway via grnd-egr (D.mel)
60ad85f700001873 epidermal growth factor receptor signaling pathway via spi-Egfr (D.mel)
60ad85f700000309 Notch signaling pathway via N-Dl (D.mel)
60ad85f700000189 BMP signaling pathway via dpp-tkv/put (D.mel)

balhoff commented 3 years ago

I did some git bisect searches on the models repo. The first commit introducing modelstate "production" without xsd:string is this one adding ZFIN models: https://github.com/geneontology/noctua-models/commit/aea07dcfe7ab8a761821efeda203ee3a67b18054

However, that's probably a red herring, since those are being created by external software (by @sierra-moxon?). I do expect that at the moment none of those ZFIN models are showing up in the release triplestore, though.

The first commit that has non-ZFIN models missing xsd:string is this one: https://github.com/geneontology/noctua-models/commit/5dd18aa1fa9d5842bb99ae9d85270056681d8493

It's a normal Minerva auto-commit.
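A quick way to check a model file for the symptom is to scan its text for modelstate literals that carry no datatype. The sketch below is only an illustration: the `untyped_modelstates` helper is hypothetical, the second sample triple uses a made-up model IRI, and the regex assumes the predicate is written as a full IRI (as in the triple quoted earlier), so it would miss a prefixed form.

```python
import re
from typing import List

# Matches the modelstate predicate, its quoted value, and an optional ^^datatype suffix.
MODELSTATE = re.compile(
    r'<http://geneontology\.org/lego/modelstate>\s+"([^"]*)"(\^\^(?:<[^>]*>|[\w.:-]+))?'
)


def untyped_modelstates(turtle_text: str) -> List[str]:
    """Return modelstate values that have no ^^datatype suffix."""
    return [m.group(1) for m in MODELSTATE.finditer(turtle_text) if m.group(2) is None]


sample = '''
<http://model.geneontology.org/568b0f9600000284> <http://geneontology.org/lego/modelstate> "production" .
<http://model.geneontology.org/example> <http://geneontology.org/lego/modelstate> "production"^^<http://www.w3.org/2001/XMLSchema#string> .
'''

print(untyped_modelstates(sample))  # ['production']
```

Running this over each revision of a file would be one way to narrow down when the datatype disappeared, complementing the git bisect approach.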

balhoff commented 3 years ago

There are some older models without xsd:string which have "development" status. But as far as I can tell those are all explained as being introduced via the pathways2go pipeline (different software from minerva) or @dustine32's gocamgen.

cmungall commented 3 years ago

While we should investigate the code that writes the strings and make sure they are consistently emitted as either xsd:string or plain literals, isn't it the case that the immediate fix here is to make this query

https://github.com/geneontology/go-site/blob/master/pipeline/sparql/delete/delete_non_production.sparql

(and indeed all queries) more defensive? Can we go ahead and do this?
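As a rough sketch of what "more defensive" could mean, the check can look at the lexical value and accept both the plain and the `xsd:string` form. This is an assumption about the general approach, not the content of any actual fix; the tuple representation and the `is_production` helper are hypothetical.

```python
from typing import Optional, Tuple

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"

# Hypothetical representation: (lexical form, datatype IRI or None for a plain literal).
Literal = Tuple[str, Optional[str]]


def is_production(state: Literal) -> bool:
    """Accept "production" whether it is a plain literal or typed as xsd:string."""
    lexical, datatype = state
    return lexical == "production" and datatype in (None, XSD_STRING)


states = [
    ("production", XSD_STRING),
    ("production", None),
    ("development", None),
]
print([is_production(s) for s in states])  # [True, True, False]
```

In SPARQL, the same idea is commonly expressed by comparing `STR(?state)` to the expected string instead of matching the literal term directly.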

balhoff commented 3 years ago

Done: https://github.com/geneontology/go-site/pull/1729

cmungall commented 3 years ago

oh, doh, the PR is linked clearly above too, somehow missed this!

I suggest

make sense?

lpalbou commented 3 years ago

There was a GO release but it seems the models are still missing from the production journal (try my SPARQL queries in first post). @balhoff

balhoff commented 3 years ago

> There was a GO release but it seems the models are still missing from the production journal (try my SPARQL queries in first post). @balhoff

Okay, this is confusing. I will try to run the pipeline locally and figure out what is happening.

balhoff commented 3 years ago

@lpalbou I saw the same thing as you, but in the last few minutes, the triplestore started returning results for those models.

kltm commented 3 years ago

This may just be an artifact of the release getting finalized. It should be finalized and cleared now.

lpalbou commented 3 years ago

Checking again: yep, they are showing up again! Out of curiosity, @kltm, is it like GOlr where, if I remember correctly, you had to deploy it manually after the release?

I will double-check later on the Alliance side that we have indeed retrieved the missing models, and if so I will close this ticket. Thanks all 🙂

lpalbou commented 3 years ago

Ok, everything seems back to normal. Thanks @balhoff for fixing and @hattrill for checking models. Closing this issue, but if anything else comes up, it would be nice to tag this ticket for context.