geneontology / pathways2GO

Code for converting between BioPAX pathways and Gene Ontology Causal Activity Models (GO-CAM)

Reactome Release 82 #176

Closed: ukemi closed this issue 1 year ago

ukemi commented 2 years ago

@deustp01 and @dustine32 we need to add dates to the task list above. All of the things under 5 should get turned into tickets. If we can do incremental testing, we can check them off as we go.

dustine32 commented 1 year ago

Do we have the release 82 BioPAX Homo_sapiens.owl file available now?

deustp01 commented 1 year ago

> Do we have the release 82 BioPAX Homo_sapiens.owl file available now?

Not yet. It should be available next week, maybe as early as 9/12. Note that getting a draft BioPAX file to generate a draft set of GO-CAMs for checking while the Reactome source material can still be updated, as outlined in the bulleted list at the top of this ticket, is still something for the future. We hope to have that available in time for a coordinated release of Reactome version 83, both as Reactome pathway pages and as GO-CAMs, but that is a work in progress and depends on some machine upgrades and scripting that are still to be done.

dustine32 commented 1 year ago

@deustp01 Thanks! I can just wait until that's available and then push the new models to noctua-dev for testing.

ukemi commented 1 year ago

Thanks guys. So on Wednesday we will need to talk about modifying our SOP for the new loads, since the BioPAX isn't available until the release.

deustp01 commented 1 year ago

> new loads since the BioPax isn't available until the release.

In the future, maybe as soon as the next release, we should have the BioPAX in advance, as planned, so I hope we are looking at a delay in our plans, not a change.

Note: on the pathways2GO call, Chris Mungall raised a question about the intersection of Java versions and the generation of BioPAX files, and the possibility that he or Dustin may already have code to get around the current Reactome problem.

dustine32 commented 1 year ago

@ukemi @deustp01 The ShEx and OWL consistency checks have been run. Five models failed ShEx; all are logically consistent. The full main_report.txt is below, along with explanations.txt for the failures: main_report.txt, explanations.txt

You can find the five ShEx-failing models in main_report.txt by sorting on the shex_valid column. Here are the model links for your convenience:

  1. http://noctua-dev.berkeleybop.org/editor/graph/gomodel:R-HSA-9670095
  2. http://noctua-dev.berkeleybop.org/editor/graph/gomodel:R-HSA-9708296
  3. http://noctua-dev.berkeleybop.org/editor/graph/gomodel:R-HSA-1474151
  4. http://noctua-dev.berkeleybop.org/editor/graph/gomodel:R-HSA-4615885
  5. http://noctua-dev.berkeleybop.org/editor/graph/gomodel:R-HSA-2514859
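Sorting by hand works for five models; for larger loads, a minimal sketch like the following could pull out the failing models and build their noctua-dev links. It assumes main_report.txt is tab-separated with a header row and that a failed check is recorded as a false-like value in the shex_valid column. The `model_id` column name and the `shex_failures` / `noctua_dev_link` helpers are hypothetical, not part of the pipeline.

```python
import csv
import io

# URL prefix taken from the model links above.
NOCTUA_DEV = "http://noctua-dev.berkeleybop.org/editor/graph/gomodel:"

# Assumption: these are the strings the report uses for a failed check.
FALSE_VALUES = {"false", "0", "no"}


def shex_failures(report_text: str, id_column: str = "model_id") -> list[str]:
    """Return model IDs whose shex_valid value looks false.

    Assumes a tab-separated report with a header row; `id_column` is a
    hypothetical column name for the model identifier.
    """
    reader = csv.DictReader(io.StringIO(report_text), delimiter="\t")
    return [
        row[id_column]
        for row in reader
        if row["shex_valid"].strip().lower() in FALSE_VALUES
    ]


def noctua_dev_link(model_id: str) -> str:
    """Build a noctua-dev editor URL for a Reactome-derived model ID."""
    return NOCTUA_DEV + model_id


if __name__ == "__main__":
    # Made-up sample rows; R-HSA-PLACEHOLDER is not a real model.
    sample = "model_id\tshex_valid\nR-HSA-9670095\tfalse\nR-HSA-PLACEHOLDER\ttrue\n"
    for model_id in shex_failures(sample):
        print(noctua_dev_link(model_id))
```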

ukemi commented 1 year ago

I've gone through these, and here is what I think. I should probably still tweak the models (without saving) to see if I am right. @dustine32 There are a couple of cases below where I don't see why some Reacto entities aren't passing.

  1. http://noctua-dev.berkeleybop.org/editor/graph/gomodel:R-HSA-9670095: Annotation to an obsolete term. Should this be corrected?
  2. http://noctua-dev.berkeleybop.org/editor/graph/gomodel:R-HSA-9708296: The endoribonuclease activity input and output, REACTO_R-HSA-9708818 and REACTO_R-HSA-9708815, are not typed as chemical entities or complexes. Why?
  3. http://noctua-dev.berkeleybop.org/editor/graph/gomodel:R-HSA-1474151: Oxidoreductase activity has input REACTO_R-HSA-9693721. This should be CHEBI:17804. Why isn't it?
  4. http://noctua-dev.berkeleybop.org/editor/graph/gomodel:R-HSA-4615885: RANBP2 SUMOylates CDCA8 (Borealin) and PIAS3 SUMOylates AURKB (Aurora-B). This is a black-box reaction with two catalysts, and I suspect this is what is causing the ShEx failure. Both are mapped to SUMO transferase activity, but MFs should only have one enabler.
  5. http://noctua-dev.berkeleybop.org/editor/graph/gomodel:R-HSA-2514859: Glycine N-acyltransferase activity has an input of obo:go/shapes/ProteinContainingComplex, obo:go/shapes/InformationBiomacromolecule, but this reaction also has an input of Acyl-CoA. @vanaukenk, we might want to modify the ShEx here and make the target proteins a primary input. These kinds of enzymes will have input molecules other than the target protein/complex being modified.
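The one-enabler rule suspected in the R-HSA-4615885 case can be illustrated with a small check. This is a minimal sketch, not the GO ShEx shapes themselves: it assumes the model has been flattened into hypothetical (subject, relation, object) triples using a simplified `enabled_by` relation label, and flags any activity with more than one distinct enabler.

```python
from collections import defaultdict
from typing import Iterable


def multi_enabler_activities(edges: Iterable[tuple[str, str, str]]) -> list[str]:
    """Return activity IDs that have more than one distinct enabled_by object."""
    enablers: dict[str, set[str]] = defaultdict(set)
    for subject, relation, obj in edges:
        # Simplified label standing in for the RO "enabled by" relation.
        if relation == "enabled_by":
            enablers[subject].add(obj)
    return sorted(a for a, objs in enablers.items() if len(objs) > 1)


if __name__ == "__main__":
    # Hypothetical flattening of the R-HSA-4615885 case: one SUMO
    # transferase activity node with two catalysts.
    edges = [
        ("sumo_transferase_activity_1", "enabled_by", "RANBP2"),
        ("sumo_transferase_activity_1", "enabled_by", "PIAS3"),
    ]
    print(multi_enabler_activities(edges))  # ['sumo_transferase_activity_1']
```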

dustine32 commented 1 year ago

Thank you @ukemi for testing and the feedback! Sorry for the delay in responding. This was quite the learning experience and my responses below went through several iterations of "ohhhh!" and then having to rewrite what's going on.

So, for our failing models:

ukemi commented 1 year ago

@dustine32 thanks for the follow up.

ukemi commented 1 year ago

@dustine32 I just met with @deustp01 and he reminded me of some things we had done with other models. He will comment here, but in the meantime don't work on my suggestion to use ChEBI just yet.

ukemi commented 1 year ago

@dustine32 and @deustp01 Having had a night to think this over, I think we should adopt a strategy of using the ChEBI identifiers. Peter, after our conversation yesterday I realized that when I use the model copy functionality, all of the Reactome chemicals get copied over to my models. See http://noctua.geneontology.org/editor/graph/gomodel:633b013300001469. This means that I need to go in and clean up all the models where I've used model copy, which is less than ideal.

deustp01 commented 1 year ago

@dustine32 and @ukemi Still thinking here about the correct role for ChEBI identifiers in these models - needs more discussion.

On the five models that failed ShEx validation, it looks like all have straightforward, known errors in the Reactome annotation / BioPAX export that are easy enough to fix on the Reactome side. If such errors turned up at QA time in a future fast Reactome-to-GO-CAM export, we could make the fixes within the seven-day window that should be available. Here, I have fixed four already; the fifth requires strong-arming a curator. Detailed comments for each are interpolated into Dustin's comment of 10/14.

dustine32 commented 1 year ago

OK, great! Thanks @deustp01 @ukemi for summing up the actions. If we decide to switch to extracting ChEBI IDs, we can work off of that new code-change ticket.

ukemi commented 1 year ago

@deustp01, should we move forward? @dustine32, during a weeds call last week, we decided that we should go ahead and try to import the ChEBI identifiers that are xref'd in Reactome. The main reason for this is consistency. Right now, if I copy a model to make the mouse version, the Reactome entities come into my model. If I then move the model to production, the Reactome entities end up on the Alliance view. This wouldn't be terrible if we could link them back to Reactome from there, but the group decided that it would be better if they linked to ChEBI, because the ChEBI entities are the ones that curators use when they make de novo models. I will open a new ticket for this task. Meanwhile, I think the next thing to do is to sanity-check the models that are on dev and determine whether or not we can push the new load to production. Is everyone good with that?

deustp01 commented 1 year ago

> Is everyone good with that?

I'm good with both: replacing Reactome IDs for localized small molecules with ChEBI IDs (coupled, if possible, to a GO CC term), and doing sanity checks on the latest models before the push to production.
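The agreed substitution can be sketched as a lookup with a fallback. This assumes two hypothetical inputs, not an existing pathways2GO API: a map from Reactome entity IDs to their ChEBI xrefs, and a map from Reactome entity IDs to GO CC terms for their compartments. The example mapping REACTO_R-HSA-9693721 to CHEBI:17804 comes from the R-HSA-1474151 discussion earlier in this thread; the `ChemicalTerm` and `to_chebi` names are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChemicalTerm:
    """Term to use in the GO-CAM for a localized small molecule."""
    entity_id: str             # ChEBI ID if an xref exists, else the Reactome ID
    located_in: Optional[str]  # GO CC term, if the compartment is known


def to_chebi(reacto_id: str,
             chebi_xrefs: dict[str, str],
             compartments: dict[str, str]) -> ChemicalTerm:
    """Prefer the ChEBI xref for a Reactome small molecule, keeping its GO CC term.

    Falls back to the original Reactome ID when no ChEBI xref is known.
    """
    chebi = chebi_xrefs.get(reacto_id)
    return ChemicalTerm(entity_id=chebi or reacto_id,
                        located_in=compartments.get(reacto_id))


if __name__ == "__main__":
    term = to_chebi("REACTO_R-HSA-9693721",
                    {"REACTO_R-HSA-9693721": "CHEBI:17804"},
                    {})
    print(term)  # ChemicalTerm(entity_id='CHEBI:17804', located_in=None)
```

Falling back to the Reactome ID keeps entities without a ChEBI xref in the model rather than dropping them; whether that is the desired behavior is a design choice for the new ticket.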

dustine32 commented 1 year ago

@ukemi Yep, good with both as well. Let me know when/if you are good with the models in noctua-dev and I can make a PR with the new models to noctua-models/master. This PR will then get merged/loaded to Noctua prod during the next Noctua maintenance outage (most likely 2022-11-10).

ukemi commented 1 year ago

Models to check:

ukemi commented 1 year ago

Notes for future work in release 83:

  1. For links between pathways, it would be nice to infer a more specific relation than "causally upstream of". In most cases, these functions are "immediately causally upstream of" one another, and maybe even "directly provides input for", as long as that relation is valid between functions in different processes. Something to discuss.
  2. We should implement has_small_molecule_regulator for the next release; glycolysis is a good pathway to look at.
  3. Don't forget that we want to convert chemicals to ChEBI.
  4. TNF is an unsatisfying pathway because it is all molecular events. However, we did successfully filter the drugs away. Is there any way we can resolve the Greek letters? @deustp01 I notice that sometimes they are spelled out and sometimes they appear as Greek characters. Is there a rule for Reactome curation?

ukemi commented 1 year ago

OK @dustine32, I checked the above four models as well as the ones that failed ShEx and didn't spot any glaring issues with the import. I made a couple of notes above so that I can remember them. There are now only two tasks left in this ticket, both in your court:

  1. Go ahead with the release to production.
  2. Generate the stats for @deustp01. I think he needs them for accounting purposes. As with pathway boundaries, I have included this in the ticket, but clearly the release can technically go through without it. So if it is a thorny task, we can go ahead and call this release done and move the stats task to its own ticket, but I think getting those stats should still be a priority. @deustp01 please feel free to comment.

dustine32 commented 1 year ago

@ukemi Great! I just found code specifically for the Venn diagram numbers in garage/Manuscript.java that I can revive.

ukemi commented 1 year ago

@dustine32 I'm wondering where we stand on this release since @deustp01 might want to report on the status at tomorrow's PI meeting.

dustine32 commented 1 year ago

@ukemi @deustp01 I figured out the dang stats!!

First, here's the Venn diagram based on the latest GO-CAM load of Reactome 82 going into Noctua prod today: [Venn diagram image]. Note that this Venn diagram was created with a free web tool at https://bioinformatics.psb.ugent.be/webtools/Venn/.

The commands to generate the input for this diagram tool are now incorporated into the Reactome -> GO-CAM conversion pipeline (along with the ShEx checks).
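The thread does not say which sets the diagram compares, so the following is only a generic sketch of how the per-region counts behind a Venn diagram can be computed from named sets, for example to cross-check the web tool's numbers. It is not the pipeline's code (nor the revived garage/Manuscript.java), and the set names in the example are hypothetical.

```python
def venn_regions(sets: dict[str, set[str]]) -> dict[frozenset[str], int]:
    """Count elements in each exclusive Venn region.

    Each key is the frozenset of set names an element belongs to, so every
    element is counted in exactly one region.
    """
    regions: dict[frozenset[str], int] = {}
    universe: set[str] = set().union(*sets.values())
    for item in universe:
        members = frozenset(name for name, s in sets.items() if item in s)
        regions[members] = regions.get(members, 0) + 1
    return regions


if __name__ == "__main__":
    # Hypothetical sets; the real diagram's inputs are not given in the thread.
    counts = venn_regions({"set_a": {"x", "y"}, "set_b": {"y", "z"}})
    for names, n in sorted(counts.items(), key=lambda kv: sorted(kv[0])):
        print(sorted(names), n)
```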

deustp01 commented 1 year ago

Great! I will align with the published diagram.

ukemi commented 1 year ago

Release 82 has been moved to the 'Done' column. Onward to 83.