geneontology / pathways2GO

Code for converting between BioPAX pathways and Gene Ontology Causal Activity Models (GO-CAM)

Immediate Reload Reactome Pathways #160

Closed ukemi closed 2 years ago

ukemi commented 2 years ago

We need to Reload the Reactome Models because they are stale.

ukemi commented 2 years ago

Note this ticket as well: https://github.com/geneontology/neo/issues/91

ukemi commented 2 years ago
dustine32 commented 2 years ago

Thanks for making this ticket @ukemi! I plan on starting the one-time update of latest Reactome release into noctua-dev sometime this week.

deustp01 commented 2 years ago

Feature creep: we are going to need regular updates to keep the Reactome-derived GO-CAM set up to date with revisions in Reactome content and additions of new content so a really important side-product of this "one time" update will be ideas about how to do this updating efficiently and as nearly automatically as possible. @ukemi anything to add here?

ukemi commented 2 years ago

#162 Looks like we are on the same page.

ukemi commented 2 years ago

@dustine32 I've made a column in the project for curator review. Once you have pushed the models to development, go ahead and move it there. Then @deustp01 and I will know that they are ready to check.

dustine32 commented 2 years ago

@ukemi @deustp01 I've regenerated models from the latest Reactome release 80. One unexpected thing is that running the pathways2go code as-is results in far fewer models than expected, 1885 out of the 2580 human pathways in Reactome. It turns out this is due to a "flag" ignore_diseases that "skips any pathway that has the word 'disease' in its name or any of its parent pathway's name" and is hard-coded set to true. Changing ignore_diseases to false produces the expected 2580 models.

So, should we be loading these "disease" pathways into Noctua or skipping them?

Noting that Noctua currently shows 1817 models with title containing "imported from: Reactome". So, leaning towards us skipping disease pathways.

deustp01 commented 2 years ago

Thanks!

Editorial / scope decision so far is, GO only considers normal biology so Reactome disease pathways are out of scope for GO-CAM conversion, at least for now. We checked the earlier build to be sure that Ben’s skip flag was working right, and the addition of ~70 new pathways since then seems about right, so the result is plausible. (There's also a flag to exclude anything tagged as a "drug", which may knock out a few more.)

ukemi commented 2 years ago

Is it time for us to start reviewing these? Shall we discuss a strategy on today's call?

deustp01 commented 2 years ago

Yes to both. One good place to start is to get a list by R-HSA-id of all pathways in the new build that were not in the previous one. Also from the Reactome side, a list of all pathways that have been revised since the previous build, to focus on the changed material. And a list of any that were in the previous one and dropped out (probably none - this is a sanity check). Beyond that, what to look for in the re-built pathways is an item for discussion today.
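
The added/dropped lists suggested above come down to a set difference on stable IDs. A minimal sketch, assuming each build can be reduced to a plain list of R-HSA identifiers; the function name `diff_builds` and the toy lists are hypothetical, not part of pathways2GO:

```python
def diff_builds(previous_ids, new_ids):
    """Return (added, removed) pathway IDs between two builds, sorted numerically."""
    prev, new = set(previous_ids), set(new_ids)

    def numeric_key(stable_id):
        # "R-HSA-9013149" -> 9013149, so IDs sort by number rather than as text
        return int(stable_id.rsplit("-", 1)[1])

    added = sorted(new - prev, key=numeric_key)
    removed = sorted(prev - new, key=numeric_key)
    return added, removed


# Toy lists for illustration only; real lists would come from the two sets of model files.
previous_build = ["R-HSA-194840", "R-HSA-70171", "R-HSA-73817"]
new_build = ["R-HSA-70171", "R-HSA-73817", "R-HSA-9012999", "R-HSA-9758881"]

added, removed = diff_builds(previous_build, new_build)
print("Added:", added)
print("Removed:", removed)
```

The "revised since the previous build" list cannot be derived this way, since it needs revision dates from the Reactome side.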

ukemi commented 2 years ago

@deustp01 and @ukemi should check the new cobalamin metabolism imports to sanity check the new biology.

ukemi commented 2 years ago

Actually, cobalamin might not go live until the new release. @deustp01 do you know some pathways that we worked on that were part of releases between the last load and the biopax that @dustine32 is picking up currently?

ukemi commented 2 years ago

@deustp01 and @dustine32 is it too aggressive to go for the June 9 maintenance window? If so, then June 23?

deustp01 commented 2 years ago

Cobalamin is already live, as are all the scattered edits to normal purine catabolism

dustine32 commented 2 years ago

Preliminary report of just R-HSA pathway models added or removed between the release loaded into Noctua prod and the newest Reactome release 80:

Added (70)
R-HSA-201556 ALK new 2021-06-09
R-HSA-5620971 pyroptosis new 2021-03-23
R-HSA-8980692 (RHOA GTPase cycle) new 2021-03-23 checked 6/9 (pd)
R-HSA-9012999 (RHO GTPase cycle) new - 2007 pathway revised 2020-07 and 2021-03
R-HSA-9013026 (RHOB GTPase cycle) new 2021-03-23
R-HSA-9013106 (RHOC GTPase cycle) new 2021-03-23
R-HSA-9013148 CDC42 new 2021-03-23 and possibly more recent edits checked 6/9 (pd)
R-HSA-9013149 RAC1 new 2021-03-23 checked 6/9 (pd)
R-HSA-9013404 RAC2 new 2021-03-23 checked 6/9 (pd)
R-HSA-9013405 RHOD new 2021-03-23 checked 6/9 (pd)
R-HSA-9013406 RHOQ new 2021-03-23
R-HSA-9013407 RHOH new 2021-03-23
R-HSA-9013408 RHOG new 2021-03-23
R-HSA-9013409 RHOJ new 2021-03-23
R-HSA-9013418 RHOBTB2 new 2020-12-08 and possibly more recent edits
R-HSA-9013419 RHOT2 new 2021-03-23
R-HSA-9013420 RHOU new 2021-03-23
R-HSA-9013422 RHOBTB1 new 2020-12-08 and possibly more recent edits
R-HSA-9013423 RAC3 new 2021-03-23
R-HSA-9013424 RHOV new 2021-03-23
R-HSA-9013425 RHOT1 new 2021-03-23
R-HSA-9035034 RHOF new 2021-03-23
R-HSA-9634597 GPER1 new 2021-12-08 checked 6/10 (pd)
R-HSA-9659379 hearing1 new 2021-03-23 checked 6/10 (pd) only grouping
R-HSA-9662360 hearing2 new 2021-03-23 checked 6/10 (pd)
R-HSA-9662361 hearing3 new 2021-03-23 checked 6/10 (pd)
R-HSA-9667769 hearing4 new 2021-03-23 checked 6/10 (pd)
R-HSA-9674555 CSF3 new 2021-03-23
R-HSA-9686347 RIPK1 new 2020-09-15 is_DISEASE! * irregular annotation - manually skip now; fix in next Reactome release
R-HSA-9690406 devel new 2020-12-08
R-HSA-9696264 RND3 new 2021-03-23
R-HSA-9696270 RND2 new 2021-03-23
R-HSA-9696273 RND1 new 2021-03-23
R-HSA-9701898 STAT3 new 2021-06-09
R-HSA-9705462 CSF3 regn new 2021-03-23
R-HSA-9706019 RHOBTB3 new 2020-12-08
R-HSA-9706369 FLT3 regn new 2020-12-08
R-HSA-9706374 FLT3 new 2020-12-08
R-HSA-9706574 RHOBTB new 2020-12-08
R-HSA-9707564 HMOX1 ftn new 2021-03-23
R-HSA-9707587 HMOX1 regn new 2021-03-23
R-HSA-9707616 Heme signl new 2021-03-23
R-HSA-9708296 tRNA catab new 2021-03-23
R-HSA-9708530 BACH1 regn new 2021-03-23
R-HSA-9709957 sensory new 2021-03-23
R-HSA-9711097 starve new 2021-03-23 checked 6/10 (pd) only grouping
R-HSA-9711123 chem stress new 2021-03-23
R-HSA-9715370 Miro new 2021-03-23
R-HSA-9716542 GTPase new 2021-03-23 checked 6/10 (pd) only grouping
R-HSA-9717189 taste1 new 2021-09-15
R-HSA-9717207 taste2 new 2021-09-15
R-HSA-9729555 taste3 new 2021-09-15
R-HSA-9729902 DNA damage1 new 2021-09-15
R-HSA-9730628 taste4 new 2021-09-15
R-HSA-9730737 DNA damage2 new 2021-09-15
R-HSA-9734091 MET regn new 2021-09-15 DRUG-disease or xenobiotic-normal? checked 6/9 (pd)
R-HSA-9748784 Drug ADME new 2021-12-08 DRUG-disease or xenobiotic-normal? checked pd 6/8 only grouping
R-HSA-9748787 Azathioprine ADME new 2021-12-08 DRUG-disease or xenobiotic-normal?
R-HSA-9749641 Aspirin ADME new 2021-12-08 DRUG-disease or xenobiotic-normal?
R-HSA-9750126 DNA damage new 2021-09-15
R-HSA-9752946 smell1 new 2021-12-08
R-HSA-9753281 Paracetamol ADME new 2021-12-08 DRUG?
R-HSA-9754119 CDK4/6 inhib new 2021-12-08 DRUG?
R-HSA-9754706 Atorvastatin ADME new 2021-12-08 DRUG?
R-HSA-9755511 KEAP new 2022-03-23 ** includes a disease reaction - ignore for now - checked 6/10 (pd)
R-HSA-9758881 cobalamin uptake new 2022-03-23 checked 6/9 (dph pd)
R-HSA-9758890 cobalamin transport new 2022-03-23 checked 6/9 (dph pd)
R-HSA-9759194 NFE2L2-1 new 2022-03-23
R-HSA-9759218 cobalamin metabolism new 2022-03-23
R-HSA-9762114 NFE2L2-2 new 2022-03-23

** These are all pathways that describe drug metabolism, so do we exclude them - "no to drugs" - or keep them - "yes to xenobiotic metabolism"?

*** This pathway has a non-NULL disease attribute.

Removed (2)
R-HSA-194840 (Rho GTPase cycle). Pathway changed to R-HSA-9012999 and new children. YES, exactly
R-HSA-68827 (CDT1 association with the CDC6:ORC:origin complex). Pathway changed to R-HSA-68689? NO - replaced by R-HSA-9749351. Is there a GO-CAM for this new pathway released 2021-12-08?

Will follow up with the ShEx results for the whole load (and will confirm they all pass OWL logical consistency checks).

ukemi commented 2 years ago

@deustp01 and @dustine32 , does the division of labor and the scheduling in the first comment of this ticket seem doable? Otherwise we should reschedule the release of the new load for the 23rd.

ukemi commented 2 years ago

Glycolysis sanity check ---- NOT VALID

The import still looks consistent and true to the data from Reactome. The model does not include the new has_small_molecule_regulator relation and does display many reactions that impinge upon canonical glycolysis but are not a part of it. This opens up some nice discussions about how to display interacting pathways, but for the purpose of this import checking, this model passes the validity test.

@deustp01 let's take a look at R-HSA-5696021. I think we can make it so it is not an 'island' in the model.

ukemi commented 2 years ago

@dustine32 When I search on noctua-dev, the Reactome models still have a modification date of 6/17/2021. Did the models in fact load? OK, weird. It looks like the model has the right date in the metadata, but it isn't updated on the landing page.

state development
contributor https://orcid.org/0000-0001-7476-6306
contributor https://reactome.org/content/detail/R-HSA-74259
date **2022-06-07**
title Purine catabolism - imported from: Reactome
comment For logical inference, import the integrated tbox ontology http://purl.obolibrary.org/obo/go/extensions/go-lego-reacto.owl
comment http://www.reactome.org
ukemi commented 2 years ago

Purine catabolism sanity check ---- NOT VALID

The import still looks consistent and true to the data from Reactome. The model does not include the new has_small_molecule_regulator relation. There are a lot of singleton reactions that are perhaps due to missing preceding-reaction links in Reactome? E.g., R-HSA-109470 precedes R-HSA-74249 and R-HSA-74242? Is this valid for a subset of the outputs? See the relation of R-HSA-74248 to R-HSA-74249 and R-HSA-74242. @deustp01?

dustine32 commented 2 years ago

@ukemi Sorry, the models have been generated but not yet loaded into noctua-dev. I made the PR for the new models late yesterday and I'll see if @kltm is able to merge and restart dev sometime today. For now I've unchecked the "loaded into noctua-dev" checkbox above.

ukemi commented 2 years ago

Ah, that explains a lot! Sorry about the confusion. I've pushed out the release date to the 23rd so we are not in such a crunch. Thanks @dustine32!

dustine32 commented 2 years ago

@ukemi Thank you! Yes, the 23rd should definitely be enough time for testing and fixing any discovered major issues.

deustp01 commented 2 years ago

> Preliminary report of just R-HSA pathway models added or removed between the release loaded into Noctua prod and the newest Reactome release 80:

Checked all 70 new pathways in our internal database. All were publicly released between 2020-09-15 and 2022-03-23, which I think is since the closing date for the previous set of models. The two pathways flagged by Dustin as removed were indeed deleted and replaced by new pathways: note that in one case Dustin identified the replacements and in the other case I can't tell whether a GO-CAM has been generated from the replacement pathway.

"Say no to drugs" - it looks like seven of the new pathways involve drugs. Can we manually exclude these models now and then try to improve our drug testing program to exclude them automatically in the future?

dustine32 commented 2 years ago

@deustp01 Sure, I can manually exclude those models marked "DRUG" before they get into Noctua production.

As far as automating this, there's another flag hard-coded to true specifically to remove drug reactions: https://github.com/geneontology/pathways2GO/blob/9ac81db73cfff952739b6701203132a05837fb80/exchange/src/main/java/org/geneontology/gocam/exchange/BioPaxtoGO.java#L137 Without debugging to confirm, it looks like this would only remove the reactions and not the entire pathway. Should this switch to completely skip these pathways?

Possibly related, we found a bug in the reacto build a while back where all IUPHAR mappings got dropped. This could be affecting detection of drug entities in the translation. Luckily, we have a unit test we can run to see if this is still happening.

dustine32 commented 2 years ago

@ukemi @deustp01 Attached here is the ShEx validation report for all 1885 Reactome models in this load. Bad news: 1462 failed ShEx (but 423 passed). All models were OWL consistent. We can dig in to some of the failing models to search for patterns. shex_report_20220607.txt

Hopefully, the new models will be available in noctua-dev for testing by tomorrow (2022-06-08) as the dev server is cycling right now.
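
For re-checking the pass/fail totals when the report is regenerated, a small tally script may help. This is only a sketch: it assumes the report is tab-separated with `model_url` and `shex_valid` columns and that `shex_valid` holds `true`/`false`; the delimiter, the value encoding, and the placeholder rows are assumptions, not taken from the attached file.

```python
import csv
import io
from collections import Counter


def tally_shex(report_text):
    """Count passing and failing models in a tab-separated ShEx report."""
    reader = csv.DictReader(io.StringIO(report_text), delimiter="\t")
    counts = Counter()
    for row in reader:
        # Assumed encoding: the string "true" means the model passed ShEx
        passed = row["shex_valid"].strip().lower() == "true"
        counts["passed" if passed else "failed"] += 1
    return counts


# Made-up three-row sample with placeholder model names, not real results.
sample = (
    "model_url\tshex_valid\n"
    "gomodel:example-1\tfalse\n"
    "gomodel:example-2\tfalse\n"
    "gomodel:example-3\ttrue\n"
)
counts = tally_shex(sample)
print(counts["passed"], "passed,", counts["failed"], "failed")
```

On the real report the two counts should sum to the number of models in the load (1462 + 423 = 1885).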

ukemi commented 2 years ago

Looking at the current models on the production server, it looks like these fail ShEx too. Is this good news or not? Having a quick look at chondroitin sulfate biosynthesis, it looks like we are violating something about the location of functions. We have the function (GO:0015018) occurring in the Golgi membrane (GO:0000139). This seems fine from a rule perspective, although there is still the weirdness about membrane locations and where they catalyze their reactions (a discussion for another day).

I just looked at some of the models that I created based on the Reactome imports, and although I see some ShEx violations, most cases similar to this "place where something is occurring" issue seem fine. Is there some reason that in the imported model we are not inferring that the Golgi membrane or, as in the case below, the lysosomal lumen is an anatomical structure? In the meantime, since this seems to be a problem with the current models too, do we go ahead with this and say we are no worse off?

Here is an example of the ShEx report from R-HSA-1793217 on production:

[
  {
    "shape": "obo:go/shapes/MolecularFunction",
    "constraints": [
      {
        "object": "gomodel:reaction_R-HSA-1793217_location_lociGO_0043202",
        "property": "BFO:0000066",
        "node_types": [
          "GO:0004565" BETA-GALACTOSIDASE ACTIVITY
        ],
        "object_types": [
          "GO:0043202" LYSOSOMAL LUMEN
        ],
        "nobjects": 0,
        "matched_range_shapes": [
        ],
        "intended-range-shapes": [
          "obo:go/shapes/AnatomicalEntity"
        ]
      }
    ]
  }
]

deustp01 commented 2 years ago

> @deustp01 Sure, I can manually exclude those models marked "DRUG" before they get into Noctua production.
>
> As far as automating this, there's another flag hard-coded to true specifically to remove drug reactions:

Thinking more, I realize there is a scope issue to discuss.

> Without debugging to confirm, it looks like this would only remove the reactions and not the entire pathway. Should this switch to completely skip these pathways?

No - we want this as-is. Ben's existing flag removes individual reactions in which a drug interacts with a normal physical entity and changes the entity's behavior, typically inactivating it. As Reactome includes these reactions showing effects of drugs as part_of the pathways that the normal entities normally participate in, this flag has the effect of stripping the "drug" reactions out of a normal pathway, leaving only the normal parts to go into the GO-CAM model.

In the "ADME" cases, we are actually annotating all the other steps in a drug's interaction with the body: absorption (uptake), distribution, metabolism, and excretion (elimination). So the scope / strategy question is whether ADME pathways are legitimate examples of xenobiotic metabolism and thus within scope for GO. @ukemi @vanaukenk ?

deustp01 commented 2 years ago

> Preliminary report of just R-HSA pathway models added or removed between the release loaded into Noctua prod and the newest Reactome release 80:

More checking: I looked in our internal database for all pathways whose _doRelease toggle is "true", whose species is Homo sapiens, whose disease attribute is NULL, and whose release date is anywhere in the range 2020-09-15 to 2022-03-23, and found a total of 68 versus 70 in Dustin's list. One of the pathways on the list but not found in my search is R-HSA-9686347 RIPK1 new 2020-09-15. I noticed it because I was expecting to retrieve at least one pathway with a 2020-09-15 release date from my earlier survey, but retrieved none. This pathway has a non-NULL disease attribute, so we need to figure out why Dustin used it, but that is a separate issue. That makes 69.

The 70th pathway retrieved by Dustin but not on my list of all pathways released on or after 2020-09-15 is R-HSA-9012999 (RHO GTPase cycle).

In our internal database this pathway has a release date of 2007-05-15. This early release date is inconsistent with the pathway's identifier - R-HSA-9012999 - because these are assigned as ascending consecutive integers and in 2007 we were still in the hundred-thousands, not the millions. I expect that what happened here is that the curator, to try to show that the new collection of GTPase pathways descends from / expands on the original limited 2007 pathway, used its release date on the new pathway. That is legal - these are entered manually, and we don't have any rule prohibiting use of an old date to make a new pathway look like an old one. We may want to change this at Reactome.

Meanwhile, the tally is now consistent: the number of new-since-2020-09-15 is the same in Dustin's build and in our internal database.
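
For anyone repeating this reconciliation, the search criteria above can be written as a simple record filter. A minimal sketch, assuming each pathway is available as a dictionary; the field names (`do_release`, `species`, `disease`, `release_date`) and the disease placeholder value are hypothetical stand-ins, not the real schema of the internal database:

```python
from datetime import date


def is_new_normal_human_pathway(p, start=date(2020, 9, 15), end=date(2022, 3, 23)):
    """Released, human, no disease attribute, release date inside the window."""
    return (
        p["do_release"] is True
        and p["species"] == "Homo sapiens"
        and p["disease"] is None
        and start <= p["release_date"] <= end
    )


# Three records mirroring the cases discussed above (field values simplified).
pathways = [
    {"id": "R-HSA-9758881", "do_release": True, "species": "Homo sapiens",
     "disease": None, "release_date": date(2022, 3, 23)},
    # Non-NULL disease attribute, so the search skips it ("DOID:placeholder" is made up)
    {"id": "R-HSA-9686347", "do_release": True, "species": "Homo sapiens",
     "disease": "DOID:placeholder", "release_date": date(2020, 9, 15)},
    # Carries the 2007 release date, so it falls outside the window
    {"id": "R-HSA-9012999", "do_release": True, "species": "Homo sapiens",
     "disease": None, "release_date": date(2007, 5, 15)},
]

hits = [p["id"] for p in pathways if is_new_normal_human_pathway(p)]
print(hits)
```

The two records that drop out correspond to the two pathways that account for the 68 versus 70 difference.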

deustp01 commented 2 years ago

> Looking at the current models on the production server, it looks like these fail the ShEx too.

I took Dustin's ShEx report shex_report_20220607.txt, put it into an Excel workbook and, in the second sheet of the workbook, sorted the sheet by model_url and then by shex_valid. The first parameter sorts the models ASCIIbetically by R-HSA-number (so R-HSA-1 before -10 before -2 before -20 etc), which will put models from old pathways mostly at the top and new ones at the bottom, as these almost all have numbers greater than 9000000. Here's the file, in case it's useful: shex_report_20220607.xlsx
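
The difference between that ASCIIbetical order and a true numeric order is easy to see in a short sketch. The four IDs below are the toy values from the sentence above, not real pathways:

```python
ids = ["R-HSA-20", "R-HSA-2", "R-HSA-10", "R-HSA-1"]

# Plain string sort compares character by character, so "10" lands before "2"
ascii_order = sorted(ids)

# Sorting on the integer after the last hyphen gives numeric order
numeric_order = sorted(ids, key=lambda s: int(s.rsplit("-", 1)[1]))

print(ascii_order)    # ['R-HSA-1', 'R-HSA-10', 'R-HSA-2', 'R-HSA-20']
print(numeric_order)  # ['R-HSA-1', 'R-HSA-2', 'R-HSA-10', 'R-HSA-20']
```

A numeric key is the safer choice if the goal is strictly old-to-new ordering, since string order interleaves short and long identifiers.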

ukemi commented 2 years ago

LOL, it must be useful because I did the same sort to my local copy. I'm still not seeing the new models on dev and I looked at the ShEx violations in the production models with @vanaukenk this morning. We are a bit stumped, but conclude they are on the GOC side and are not issues with the biology in or coming from Reactome.

dustine32 commented 2 years ago

Yep, sorry @ukemi @deustp01 for the delay! The new Reactome load is now in noctua-dev (at least as of 1pm Pacific today).

kltm commented 2 years ago

The delay was on me--apologies for the confusion.

ukemi commented 2 years ago

No worries @kltm. Since we pushed the switch to production out to the next maintenance cycle, the pressure is off. @dustine32 do you want to look at the ShEx violations together at some point? I'll start QCing our favorite models this afternoon and indicate it in the task list above.

ukemi commented 2 years ago

https://github.com/geneontology/pathways2GO/issues/160#issuecomment-1149846392

My 2 cents:

  1. I think we should include these pathways where the drug is acted upon by the body, but maybe bring this to an ontology call because I'm not sure I am up to date with current thinking.
  2. Perhaps open a ticket that captures the drugs interacting and modifying activities with the new has_small_molecule_regulator framework since we have chosen to go that route?
ukemi commented 2 years ago

@dustine32 Investigate why R-HSA-9686347 RIPK1 new 2020-09-15 is_DISEASE! *** was allowed through in the load.

ukemi commented 2 years ago

glycolytic process (gomodel:R-HSA-70171)

ukemi commented 2 years ago

purine catabolism R-HSA-73817

deustp01 commented 2 years ago

Uptake of dietary cobalamins into enterocytes R-HSA-9758881 looks good.

[Three screenshots dated 2022-06-09]

And "Transport of RCbl within the body" R-HSA-9758890 and "Cobalamin (Cbl) metabolism" R-HSA-9759218 look OK. The metabolism pathway may have some causal connections in it that were not annotated in Reactome but were inferred by the model-building tool. They all look plausible. FOR THE FUTURE: is there a potential tool here to infer causal connections missed in Reactome and pass this information back to Reactome for annotation?

dustine32 commented 2 years ago

@deustp01 @ukemi I looked into the disease pathway getting through and I think I've figured out what's going on. Current "is disease"-checking logic is to traverse up the pathwayComponentOf chain for a given pathway until it finds a parent/ancestor pathway with display name == "Disease". A working example for R-HSA-9664323 as seen in Reactome shows how this hierarchy works: [image] The check will find the "parent" pathway labeled "Disease" and use that to determine R-HSA-9664323 is a disease pathway and should be skipped.

Now, our current failing example pathway R-HSA-9686347 in Reactome: [image] As you can see, the hierarchy does not lead to a parent pathway labeled "Disease." But in both pathways here you can see that red + symbol indicating that the Reactome site knows these are disease pathways. We need to figure out what data/property of the pathway the Reactome site code uses for displaying this symbol and then port that logic over to the pathways2GO code.
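
To make the two cases concrete, here is a rough sketch of the ancestor walk plus one candidate fallback: treating a pathway's own non-NULL disease attribute as a disease marker. This is Python rather than the project's Java, all identifiers and data structures are hypothetical, and whether the site's red + symbol is actually driven by that attribute (or whether the BioPAX export exposes it) is not established here.

```python
def is_disease_pathway(pathway_id, parents, display_names, disease_attr):
    """parents: id -> list of parent ids; display_names: id -> name;
    disease_attr: id -> disease value, or None/absent for normal pathways."""
    # Candidate fallback (an assumption): a non-NULL disease attribute on the pathway itself
    if disease_attr.get(pathway_id) is not None:
        return True
    # Ancestor walk as described above: look for a pathway displayed as "Disease"
    seen, stack = set(), [pathway_id]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if display_names.get(current) == "Disease":
            return True
        stack.extend(parents.get(current, []))
    return False


# Hypothetical toy hierarchy, not real Reactome identifiers.
parents = {"child": ["mid"], "mid": ["root"], "stray": ["normal_root"]}
display_names = {"root": "Disease", "normal_root": "Programmed Cell Death"}
disease_attr = {"stray": "DOID:placeholder"}

print(is_disease_pathway("child", parents, display_names, disease_attr))        # True
print(is_disease_pathway("stray", parents, display_names, disease_attr))        # True
print(is_disease_pathway("normal_root", parents, display_names, disease_attr))  # False
```

Here `child` is caught by the ancestor walk, while `stray` has no "Disease" ancestor and is caught only by the attribute fallback.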

@dustine32 We've tracked this one down - it's an editorial error in Reactome. We will fix for the next Reactome release. Meanwhile, is it possible to exclude this GO-CAM model from the set that gets reloaded once we're done checking?

ukemi commented 2 years ago

> The metabolism pathway may have some causal connections in it that were not annotated in Reactome but were inferred by the model-building tool. They all look plausible. FOR THE FUTURE: is there a potential tool here to infer causal connections missed in Reactome and pass this information back to Reactome for annotation?

Let's take a look at these on Monday and see how they come about.

deustp01 commented 2 years ago

> As you can see, the hierarchy does not lead to a parent pathway labeled "Disease." But in both pathways here you can see that red + symbol indicating that the Reactome site knows these are disease pathways. We need to figure out what data/property of the pathway the Reactome site code uses for displaying this symbol and then port that logic over to the pathways2GO code.

This is irregular curation on the Reactome side - all other "disease" reactions and pathways are is_a children of the high-level grouping pathway "Disease".

deustp01 commented 2 years ago

The model R-HSA-9755511 (from Reactome pathway KEAP, new 2022-03-23) includes a single disease reaction. My editorial opinion (but definitely subject to review! @dustine32 @ukemi @vanaukenk) is that allowing this otherwise OK-looking model to get uploaded with the one out-of-scope reaction in it is OK for now. My fallback opinion is that if the presence of the disease reaction is in fact fatal, we can manually exclude this one pathway GO-CAM from the upload.

The long-term fix, I suspect, is to add a test like the one already in place for drug reactions in otherwise normal pathways. (That test detects reactions that have physical entities with a non-NULL "drug" attribute as inputs and does not use them to build the GO-CAM for the Reactome pathway, so the GO-CAM simply has a gap in those places.) Here, could we test for reactions whose "disease" attribute is not NULL and omit them when building the GO-CAM, to again yield a GO-CAM with a gap?

My best guess is that "disease" reactions in otherwise normal pathways, like "drug" reactions, are rare and they mostly show up as branches that diverge from the main flow of the pathway, so the resulting gaps will not get in the way of visualizing / navigating / mining the GO-CAM representation of the pathway.

This PowerPoint file has screenshots of relevant bits of the Reactome pathway and of the GO-CAM built from it: disease_reaction_in_normal_pathway.pptx and links to the GO-CAM model and the Reactome page for the reaction.

ukemi commented 2 years ago

For R-HSA-9755511, I agree with everything above. At least in this case, removing the disease reaction would just knock out the one regulatory step where the product inhibits R-HSA-8932327. Is this the only one like this?

deustp01 commented 2 years ago

> Is this the only one like this?

Only one I've found so far, manually inspecting models of Reactome pathways new since 2021. Discussion question: how much damage do reactions like this do to GO-CAMs - enough to justify a test to find them?

Meanwhile on the Reactome side we need to ask whether creating such disease reactions within otherwise normal pathways is ever justified, or whether we need some sort of disease pathway along the lines of "Bad things done to host processes by miscellaneous otherwise unannotated pathogen proteins". Despite the title, such a pathway is no more arbitrary than the catalogs we've made of all orphan GPCRs or transporters that don't have homes in pathways.

ukemi commented 2 years ago

It might justify finding them. This pathway has so many molecular events, it is hard for me to tease out. One bad thing that comes to mind immediately if we allow these is that if they are inferred to be part of the pathway, the viral (or pathogenic) gene product will get annotated to the pathway.

ukemi commented 2 years ago

In this pathway, we have inferred the initial binding reaction as an event that is part of the pathway. Then we also have the subsequent event that involves the bound complex as a negatively regulates step.

deustp01 commented 2 years ago

> In this pathway, we have inferred the initial binding reaction as an event that is part of the pathway. Then we also have the subsequent event that involves the bound complex as a negatively regulates step.

This disease reaction sequence is exactly analogous to the drug reaction sequences that Ben's process reliably finds and suppresses: normal protein + drug -> normal protein:drug complex provides_input_for normal protein:drug complex negatively_regulates event(s) enabled by the normal protein.

So I hope that Ben's drug test can be adapted to find and suppress these stray disease reactions.

Meanwhile we are taking a look at Reactome to try to tally the exact number of disease reactions now annotated as part_of normal pathways, so a question for GO-CAM developers is whether it would be less work to use such a list instead of developing a test.

ukemi commented 2 years ago

For the drugs, IIRC we filter on the entity type to be a drug. We will need something else here.

deustp01 commented 2 years ago

> We will need something else here.

An attribute of the reactionlikeEvent class is "disease", which must be empty / null for normal reactions and must contain one or more Disease Ontology terms for disease reactions.
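
As a rough illustration of how that attribute could drive a filter, the sketch below drops any reaction whose "disease" value is non-empty and keeps the rest. It assumes reactions are available as dictionaries with a `disease` key; the record layout, reaction names, and placeholder term are hypothetical, and this is not how pathways2GO currently reads BioPAX.

```python
def split_reactions(reactions):
    """Separate normal reactions from those carrying a disease annotation."""
    normal, skipped = [], []
    for reaction in reactions:
        # None and an empty list both count as "no disease annotation"
        if reaction.get("disease"):
            skipped.append(reaction["id"])
        else:
            normal.append(reaction["id"])
    return normal, skipped


# Hypothetical reactions; "DOID:placeholder" stands in for a Disease Ontology term.
reactions = [
    {"id": "rxn-1", "disease": None},
    {"id": "rxn-2", "disease": ["DOID:placeholder"]},
    {"id": "rxn-3", "disease": []},
]

normal, skipped = split_reactions(reactions)
print("Keep:", normal)
print("Omit:", skipped)
```

Omitted reactions would leave a gap in the resulting GO-CAM, as with the existing drug handling.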

balhoff commented 2 years ago

I took a look at the models showing ShEx violations for 'occurs in' relations (e.g., http://noctua-dev.berkeleybop.org/editor/graph/gomodel:R-HSA-8853383). When you delete the location instance (like a 'cytosol') and add a new one, then it passes. The issue is that in the initial model, some of the instance nodes don't have all the required metadata:

###  http://model.geneontology.org/reaction_R-HSA-6799545_location_lociGO_0005829
<http://model.geneontology.org/reaction_R-HSA-6799545_location_lociGO_0005829> rdf:type owl:NamedIndividual ,
                                                                                        <http://purl.obolibrary.org/obo/GO_0005829> .

When you delete that one and add one using Noctua, it gets all the annotations needed to pass the ShEx:

###  http://model.geneontology.org/R-HSA-8853383/62a100df00001305
<http://model.geneontology.org/R-HSA-8853383/62a100df00001305> rdf:type owl:NamedIndividual ,
                                                                        <http://purl.obolibrary.org/obo/GO_0005829> ;
                                                               <http://purl.org/dc/elements/1.1/contributor> "https://orcid.org/0000-0002-8688-6599"^^xsd:string ;
                                                               <http://purl.org/dc/elements/1.1/date> "2022-06-13"^^xsd:string ;
                                                               <http://purl.org/pav/providedBy> "http://geneontology.org"^^xsd:string .

So there is an issue with the model generator code, and also some confusing error reporting from the ShEx engine.
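
A quick way to spot such nodes before running ShEx might look like the sketch below. It treats the three annotation properties seen on the Noctua-created node (contributor, date, providedBy) as the required set, which is an assumption drawn from the two examples above rather than from the ShEx schema, and it works on a plain dictionary instead of parsing the OWL file.

```python
REQUIRED = (
    "http://purl.org/dc/elements/1.1/contributor",
    "http://purl.org/dc/elements/1.1/date",
    "http://purl.org/pav/providedBy",
)


def missing_annotations(individuals):
    """Map each individual IRI to the required annotation properties it lacks."""
    report = {}
    for iri, annotations in individuals.items():
        missing = [prop for prop in REQUIRED if not annotations.get(prop)]
        if missing:
            report[iri] = missing
    return report


# The two location nodes shown above, reduced to their annotation properties.
individuals = {
    "http://model.geneontology.org/reaction_R-HSA-6799545_location_lociGO_0005829": {},
    "http://model.geneontology.org/R-HSA-8853383/62a100df00001305": {
        "http://purl.org/dc/elements/1.1/contributor": "https://orcid.org/0000-0002-8688-6599",
        "http://purl.org/dc/elements/1.1/date": "2022-06-13",
        "http://purl.org/pav/providedBy": "http://geneontology.org",
    },
}

report = missing_annotations(individuals)
for iri, missing in report.items():
    print(iri, "is missing", len(missing), "annotations")
```

Only the generator-created node is reported; the Noctua-created one carries all three properties.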