geneontology / pathways2GO

Code for converting between BioPAX pathways and Gene Ontology Causal Activity Models (GO-CAM)

Still say 'NO' to drugs #182

Closed ukemi closed 1 year ago

ukemi commented 2 years ago

Despite our best efforts to build walls, drugs are still creeping into the models. In pathway R-HSA-112311 (http://noctua.geneontology.org/editor/graph/gomodel:R-HSA-112311), the entity ACHEIs (R-ALL-9634838) represents a set of drugs. We need to figure out how to filter these before the next release.

nataled commented 2 years ago

I imagine it should be possible to make use of CHEBI's has_role hierarchy, searching for CHEBI:23888 "drug". If all components of a set (or even individuals) have this annotation, the appropriate filtering can occur.

ukemi commented 2 years ago

Thanks @nataled. These are all xref'd to the Guide to Pharmacology, and we are supposed to filter on that. I think what happened here is that we didn't traverse to the members of the set. The ChEBI roles get a bit problematic because they are very permissive: if a chemical can ever take on that role, it gets assigned. ATP, for example, has a drug role. https://www.ebi.ac.uk/chebi/chebiOntology.do?chebiId=CHEBI:15422&treeView=true#vizualisation

nataled commented 2 years ago

A violation of the 'all/some' principle! Yeah, that makes it non-useful.

deustp01 commented 2 years ago

I think what happened here is we didn't traverse to the members of the set.

Yes! Look, for example, at reaction R-HSA-9634834 "ACHEIs bind ACHE". An input protein binds [a member of] an input set of small molecules to form a protein:small molecule complex. Digging into the annotation, all members of the set (R-ALL-9634838) have a drug attribute set (note the red color on the circle preceding the name of each set member in the list). Something in the Reactome web code recognizes that the set is composed of drugs and therefore gives the set icon in the pathway diagram a distinctive red-purple color and Rx tag, BUT the set itself does not have any drug attribute. As I understand Ben's code and Dustin's patch, the GO-CAM converter would treat the set as the input entity, not look at its contents, and therefore have no way of realizing that the set consists of drugs.

[Screenshot: Screen Shot 2022-07-06 at 2 49 51 PM]

Possible patches:

  1. Get the GO-CAM code to unpack sets to look for contraband.
  2. Add a boolean (?) attribute _isDrug to Reactome sets and get the GO-CAM code to use that attribute to decide whether a reaction involves drugs and should therefore be discarded.

Unless @dustine32 is totally certain that 1 is trivially easy and 2 is likely to cause problems, I suspect (subject to discussion with Guanming) that 2 is the way to go. We will probably find other uses for this attribute. For efficiency-of-curation reasons, almost all our annotations involve sets of drugs, and the total number of drugs and drug sets so far is small so the legacy cleanup will be fairly easy.
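A minimal sketch of how option 2 might look on the converter side, assuming a hypothetical boolean `is_drug` field standing in for the proposed `_isDrug` attribute. The class names, the `keep_reaction` helper, and the placeholder ID for ACHE are illustrative assumptions, not the actual Reactome data model or pathways2GO code.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """A reaction input or output; may be a single entity or a set."""
    entity_id: str
    # Hypothetical stand-in for the proposed _isDrug attribute on sets.
    is_drug: bool = False


@dataclass
class Reaction:
    reaction_id: str
    participants: list = field(default_factory=list)


def keep_reaction(reaction: Reaction) -> bool:
    """Discard a reaction if any participant is flagged as a drug."""
    return not any(p.is_drug for p in reaction.participants)


# The set is flagged directly, so its members never need to be unpacked.
acheis = Participant("R-ALL-9634838", is_drug=True)
ache = Participant("ACHE-placeholder-id")  # placeholder, not a real ID
binding = Reaction("R-HSA-9634834", [ache, acheis])
normal = Reaction("normal-reaction-placeholder", [ache])

kept = [r.reaction_id for r in (binding, normal) if keep_reaction(r)]
```

The trade-off in this sketch is that the filter is only as good as the curation of the flag: a set that contains drugs but is not flagged would still pass.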

deustp01 commented 2 years ago

2. Add a boolean (?) attribute _isDrug to Reactome sets and get the GO-CAM code ...

Tangent for the future. This should be a general problem, that the GO-CAM converter examines the set, not its contents, so are there other situations where it would be useful for a set to have an attribute that specifies the kind of entities it contains?

ukemi commented 2 years ago

I imagine that we will need to unpack the sets to generate complete annotation files from the models. Offhand, I would say the answer is yes as we start to incorporate the ontology structure of PRO. Are there cases of mixed sets? I am also still toying with the idea of using the has_small_molecule_regulator relation to bring drugs into the universe at some point. But that is yet another tangent.

dustine32 commented 2 years ago

@deustp01 @ukemi Thanks for doing the hard work already! Yes, currently the code does not dig down into set members for drug detection. I can probably just make the function recursively run on entities that contain any memberPhysicalEntity. It should at least catch our ACHEIs example here as its set members do contain "Guide to Pharmacology" mappings.
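A rough sketch of the recursive check described above, not the actual pathways2GO implementation. The `Entity` class, its field names, and the member IDs are hypothetical; the sketch also assumes that a set counts as a drug as soon as any member (at any depth) carries a "Guide to Pharmacology" mapping, which may be stricter or looser than the real rule.

```python
from dataclasses import dataclass, field

DRUG_XREF_DB = "Guide to Pharmacology"


@dataclass
class Entity:
    entity_id: str
    xref_dbs: set = field(default_factory=set)
    # Stand-in for BioPAX memberPhysicalEntity links.
    members: list = field(default_factory=list)


def is_drug(entity: Entity, _seen: set = None) -> bool:
    """Return True if the entity or any nested member has a drug xref."""
    seen = set() if _seen is None else _seen
    if entity.entity_id in seen:
        return False  # guard against cyclic membership
    seen.add(entity.entity_id)
    if DRUG_XREF_DB in entity.xref_dbs:
        return True
    return any(is_drug(member, seen) for member in entity.members)


# The set itself has no drug xref; only its (hypothetical) members do.
member_a = Entity("member-a", xref_dbs={DRUG_XREF_DB})
member_b = Entity("member-b", xref_dbs={DRUG_XREF_DB})
acheis = Entity("R-ALL-9634838", members=[member_a, member_b])
nested = Entity("outer-set", members=[acheis])
plain = Entity("plain-set", members=[Entity("member-c", xref_dbs={"ChEBI"})])
```

With this rule a mixed set holding one drug and several normal entities is also treated as a drug, so mixed sets would need separate handling if that is not wanted.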

deustp01 commented 2 years ago

Are there cases of mixed sets?

Yes, but not many and there are really good arguments for breaking them up into multiple un-mixed sets, e.g., a set of proteins and small molecules released by vesicle exocytosis should be two sets, one of proteins and one of small molecules.

ukemi commented 1 year ago

@dustine32, as we think about the next release, where are we on this one? I think that we want to go ahead with the recursive rule. At the same time, can we provide a report for @deustp01 so that he can break the sets up at Reactome as he describes above?

dustine32 commented 1 year ago

@ukemi Debugging this right now, actually! I thiiink I got the recursive route working but I'll confirm once I generate and look at the model for our test case ACHEIs in Neurotransmitter clearance. Should have confirmation and hopefully that report (just a list of Reactome IDs?) sometime tomorrow.

ukemi commented 1 year ago

Thanks @dustine32. Is there a place where @deustp01 and I can look at the staged models?

dustine32 commented 1 year ago

@deustp01 @ukemi Here's the mixed set report for Reactome human release 81: reactome_mixed_sets_81.txt. There are 284 results.

The criterion is any PhysicalEntity having member entities of more than one type. Does this sound right?
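One way to read that criterion, as a small sketch rather than the code that produced the report. The `Entity` class and the IDs are hypothetical, the type labels borrow BioPAX class names, and only direct members are compared (nested sets are not unpacked), which is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    entity_id: str
    entity_type: str  # e.g. "Protein", "SmallMolecule", "Complex"
    members: list = field(default_factory=list)


def is_mixed_set(entity: Entity) -> bool:
    """True if the entity's direct members span more than one type."""
    return len({m.entity_type for m in entity.members}) > 1


def mixed_set_report(entities: list) -> list:
    """Return the sorted IDs of all mixed sets."""
    return sorted(e.entity_id for e in entities if is_mixed_set(e))


uniform = Entity("set-1", "Protein", [
    Entity("p1", "Protein"), Entity("p2", "Protein"),
])
mixed = Entity("set-2", "PhysicalEntity", [
    Entity("p3", "Protein"), Entity("c1", "Complex"),
])
single = Entity("p4", "Protein")  # no members, so never mixed

report = mixed_set_report([uniform, mixed, single])
```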

dustine32 commented 1 year ago

@ukemi I can push the updated full load to noctua-dev for testing. We currently have Reactome 80 release GO-CAMs loaded but the 81 release BioPAX is available. Should I generate and push based off the 81 release file?

ukemi commented 1 year ago

Hmmmm. I have the next release as 82. @deustp01?

deustp01 commented 1 year ago

What's now publicly visible at www.reactome.org is version 81. Version 82 is scheduled for release next Wednesday, 9/14, or shortly after that.

ukemi commented 1 year ago

Thanks @deustp01. I thought that 81 was the latest one we loaded into GO-CAMs, but I guess I was mistaken. So should we just wait and go with 82? That's the one I have in the project tasks.

deustp01 commented 1 year ago

Go with 82 - 81 will be obsolete within the next two weeks.

deustp01 commented 1 year ago

Here's the mixed set report for Reactome human release 81:

I will take a look today. A priori, nothing in the Reactome data model keeps us from including different kinds of physical entities (small molecules, protein monomers, complexes) together in one set but we probably shouldn't do this, both as a matter of good biology and as a matter of general logical orderliness.

And except for the special case of annotating heteromeric complexes assembled by taking all possible combinations of a monomer from set A, a monomer from set B, ..., we should not nest sets within sets, which we have done in the first item on Dustin's list (R-ALL-9665127), sigh. (And even for those complexes, it would be better to create all the individual complexes A1:B1:C1, A1:B2:C1, ..., and then make a set of those - not so hard to do, ensures that incorrect combinations of subunits do not get auto-created, and simpler and more reliable to parse.)
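To illustrate the explicit-enumeration alternative, here is a minimal sketch using made-up subunit names. The `enumerate_complexes` helper and its `excluded` parameter are hypothetical; the point is only that each combination becomes a named complex that can be reviewed, and invalid ones dropped, before the set is built.

```python
from itertools import product


def enumerate_complexes(subunit_sets, excluded=frozenset()):
    """List every A:B:C combination, skipping known-incorrect ones."""
    names = (":".join(combo) for combo in product(*subunit_sets))
    return [name for name in names if name not in excluded]


set_a = ["A1", "A2"]
set_b = ["B1", "B2"]
set_c = ["C1"]

all_complexes = enumerate_complexes([set_a, set_b, set_c])
# ['A1:B1:C1', 'A1:B2:C1', 'A2:B1:C1', 'A2:B2:C1']

# Suppose curation shows that A2 never pairs with B2.
curated = enumerate_complexes([set_a, set_b, set_c], excluded={"A2:B2:C1"})
```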

dustine32 commented 1 year ago

@deustp01 @ukemi Great! Do we have the release 82 BioPAX Homo_sapiens.owl file available now? (I'll ask this in the 82 ticket)

deustp01 commented 1 year ago

Here's the mixed set report for Reactome human release 81:

Getting back to the original issue in this ticket, finding and filtering out Reactome reactions that involve drugs: Dustin's list has 284 instances of Reactome sets whose members consist of more than one kind of physical entity, e.g., single protein and complex (probably harmless for now), but also five cases in which a set includes both drugs (molecules that are not part of normal human physiology) and normal human entities as members. These need to be cleaned up at Reactome. There are also other odd sets, e.g., sets nested within sets, that look like they should be poisonous but are also outside the original scope of this ticket.
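A check for the drug/non-drug mixtures could look roughly like the sketch below. It is only an illustration: the `Member` and `EntitySet` classes, the per-member `is_drug` flag, and all IDs are hypothetical, and how drug status is actually determined for a member is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    member_id: str
    is_drug: bool  # hypothetical flag; derivation is out of scope here


@dataclass
class EntitySet:
    set_id: str
    members: list = field(default_factory=list)


def mixes_drugs_and_non_drugs(entity_set: EntitySet) -> bool:
    """True only if the set has at least one drug and one non-drug."""
    return {m.is_drug for m in entity_set.members} == {True, False}


all_drugs = EntitySet("set-drugs", [Member("d1", True), Member("d2", True)])
no_drugs = EntitySet("set-normal", [Member("n1", False)])
both = EntitySet("set-both", [Member("d3", True), Member("n2", False)])

needs_cleanup = [
    s.set_id for s in (all_drugs, no_drugs, both)
    if mixes_drugs_and_non_drugs(s)
]
```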

deustp01 commented 1 year ago

Last sanity check: are drugs excluded from the version 82 GO-CAM build? If so, OK to close.

ukemi commented 1 year ago

Checked two pathways: TNF signaling and Neurotransmitter clearance. There is now some disconnected stuff in the latter, but the drugs are gone. Closing this ticket.