Open thomaspd opened 1 year ago
Yes! We have treated case 2 as obviously true for a long time, but I'm not sure the code to enable it has ever been implemented in the GO-CAM conversion process. And when it is implemented, it should also generate a report of all the catalystActivity instances that it fixed in this way, to be fed back to Reactome to patch the Reactome annotations that are the single source of truth here. @dustine32 ?
@deustp01 Yes, we can log out the number 2 cases where the resulting "complex" only has a single has_part GP.
@thomaspd If you can find an example complex with the active unit specified that would help. I'll keep looking too.
Also, asking @huaiyumi for any examples of active subunit annotation in Reactome that I can look for in the BioPAX.
There are 1784 catalystActivity instances in our central database whose activeUnit slot is not null. Let me figure out who to ask here to get you a table of the subset of these instances that has actually been released. We should be able to generate a table of the dbID of each instance, its physicalEntity (the complex), its activeUnit (the individual EWAS gene product), and the dbID and name of the reaction in which it occurs. Are there other attributes you'd want in the table?
@dustine32 but meanwhile, here is a short arbitrary list of catalystActivity instances whose physicalEntity is a heteromeric complex and whose activeUnit is a protein monomer, as a starting point to begin to explore the BioPAX to see what can be done on point 2, above, and what a useful format would be for bulk processing.
https://reactome.org/content/schema/instance/browser/1806156 https://reactome.org/content/schema/instance/browser/5676051 https://reactome.org/content/schema/instance/browser/6798176 https://reactome.org/content/schema/instance/browser/1806283 https://reactome.org/content/schema/instance/browser/8868073 https://reactome.org/content/schema/instance/browser/5358378 https://reactome.org/content/schema/instance/browser/109879 https://reactome.org/content/schema/instance/browser/9836928
Each URL points to a page that lists the names and dbIDs of the heteromeric complex, the protein monomer activeUnit, and the reaction that the catalystActivity mediates.
I can also make a list of sample catalystActivity instances where the physicalEntity is a set of heteromeric complexes and the activeUnit is a set of monomers or a set of subcomplexes, and of cases where the heteromeric complex involves both protein and non-protein (RNA, DNA, or small-molecule) subunits, if any of those are of interest.
I hope, from this test material, we can figure out what you need in a comprehensive list.
@dustine32 Does the catalyst activity here help? R-HSA-21271
@deustp01 @ukemi Thank you for these examples! I don't really need the full list of all activeUnits as these few helped me find where in the BioPAX I can expect to find them. An example for reaction R-HSA-1675883:
<bp:Catalysis rdf:ID="Catalysis1698">
<bp:controller rdf:resource="#Complex3671" />
<bp:controlled rdf:resource="#BiochemicalReaction3397" />
<bp:controlType rdf:datatype="http://www.w3.org/2001/XMLSchema#string">ACTIVATION</bp:controlType>
<bp:xref rdf:resource="#RelationshipXref3080" />
<bp:xref rdf:resource="#RelationshipXref3090" />
<bp:dataSource rdf:resource="#Provenance1" />
<bp:comment rdf:datatype="http://www.w3.org/2001/XMLSchema#string">activeUnit: #Protein9680</bp:comment>
</bp:Catalysis>
Here, the activeUnit, which eventually points to PI4KB [Golgi membrane] (Homo sapiens) for complex ARF1/3:GTP:PI4KB, is embedded in a comment field. Not the greatest feeling about this placement but it'll definitely do for now!
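As a starting point for bulk processing, here is a minimal sketch of pulling the activeUnit out of the comment field with Python's standard library. It assumes the Reactome export uses the BioPAX Level 3 namespace and that the comment always has the form `activeUnit: #<ID>` as in the snippet above; the function name `find_active_units` and the trimmed `SAMPLE` document are illustrative only, not part of any existing conversion code.

```python
import xml.etree.ElementTree as ET

# Assumed namespaces: BioPAX Level 3 and RDF, as declared in a typical export.
BP = "http://www.biopax.org/release/biopax-level3.owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ACTIVE_UNIT_PREFIX = "activeUnit:"

# Trimmed copy of the Catalysis element shown above, wrapped in an rdf:RDF root.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:bp="http://www.biopax.org/release/biopax-level3.owl#">
  <bp:Catalysis rdf:ID="Catalysis1698">
    <bp:controller rdf:resource="#Complex3671" />
    <bp:controlled rdf:resource="#BiochemicalReaction3397" />
    <bp:comment rdf:datatype="http://www.w3.org/2001/XMLSchema#string">activeUnit: #Protein9680</bp:comment>
  </bp:Catalysis>
</rdf:RDF>"""


def find_active_units(biopax_xml):
    """Return (catalysis ID, controller ID, activeUnit ID) for each annotated Catalysis."""
    root = ET.fromstring(biopax_xml)
    results = []
    for catalysis in root.iter(f"{{{BP}}}Catalysis"):
        controller = catalysis.find(f"{{{BP}}}controller")
        controller_id = None
        if controller is not None:
            controller_id = controller.get(f"{{{RDF}}}resource", "").lstrip("#")
        for comment in catalysis.findall(f"{{{BP}}}comment"):
            text = (comment.text or "").strip()
            # Only comments of the assumed form "activeUnit: #<ID>" are used.
            if text.startswith(ACTIVE_UNIT_PREFIX):
                unit_id = text[len(ACTIVE_UNIT_PREFIX):].strip().lstrip("#")
                results.append((catalysis.get(f"{{{RDF}}}ID"), controller_id, unit_id))
    return results


print(find_active_units(SAMPLE))
```

Catalysis elements without an activeUnit comment are simply skipped, so the same pass could also be used to collect the controllers that lack one.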
is embedded in a comment field
If I understand what you're saying correctly, yes, if you look at the instance browser view for the EWAS PI4KB [Golgi membrane] (Homo sapiens), its role as the activeUnit of a complex involved in catalysis is shown as a comment. But if you come at the annotation from the other direction - start with the catalystActivity instance 1-phosphatidylinositol 4-kinase activity of ARF1/3:GTP:PI4KB [Golgi membrane] - then its role is shown as an attribute. Or am I misunderstanding the problem?
Also, it makes sense to me to work starting from reactions that have associated catalystActivities, systematically looking at what the physicalEntity of each catalystActivity is, and if that physicalEntity is not an EWAS or a set of EWASs, then proceed further to see if it fits case 2 above.
Also also, if a by-product of this survey were a list of catalystActivities where the physicalEntity is a complex or set of complexes but the activeUnit slot is null, that list would be the starting point for re-curation to fill the empty slots. And if in each case the components of the complex could be checked in the central GO annotation file to see if any have been assigned the same GO molecular function as Reactome has assigned to the whole complex, that would make the re-curation process at Reactome much faster and more reliable. @ukemi I know we talked about something like this with Ben Good; I don't know how close he got to implementing it.
Or does this last part duplicate work you've already done to generate the tables described in #296 (which I haven't looked at yet)?
And if in each case the components of the complex could be checked in the central GO annotation file to see if any have been assigned the same GO molecular function as Reactome has assigned to the whole complex, that would make the re-curation process at Reactome much faster and more reliable. @ukemi I know we talked about something like this with Ben Good; I don't know how close he got to implementing it.
That failed - in many cases the catalystActivity of the whole complex has been assigned to all of its protein components - but perhaps re-doing it with the cleaned-up fly set of complex component functions would yield good results.
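To make the survey idea concrete, here is a rough sketch of how such a check might be organized so that the failure mode just described is reported separately. Everything in it is an assumption for illustration: the record layout, the function name `survey_missing_active_units`, and the placeholder identifiers are hypothetical, and the inputs are assumed to be restricted already to catalystActivities whose physicalEntity is a complex.

```python
def survey_missing_active_units(catalyst_activities, go_annotations):
    """Sort complexes with a null activeUnit by how many components share the complex's GO MF.

    catalyst_activities: list of dicts with keys "db_id", "go_mf",
        "components" (gene product IDs) and "active_unit" (ID or None).
    go_annotations: dict mapping gene product ID -> set of GO molecular function IDs.
    """
    report = {"unique": [], "ambiguous": [], "unmatched": []}
    for ca in catalyst_activities:
        if ca["active_unit"] is not None:
            continue  # already curated, nothing to fill in
        matches = [
            gp for gp in ca["components"]
            if ca["go_mf"] in go_annotations.get(gp, set())
        ]
        if len(matches) == 1:
            # Exactly one component carries the function: a re-curation candidate.
            report["unique"].append((ca["db_id"], matches[0]))
        elif len(matches) > 1:
            # Function assigned to several components: cannot pick one automatically.
            report["ambiguous"].append((ca["db_id"], matches))
        else:
            report["unmatched"].append(ca["db_id"])
    return report


# Hypothetical placeholder data, not real Reactome or GO identifiers.
activities = [
    {"db_id": "CA1", "go_mf": "GO:MF_A", "components": ["GP1", "GP2"], "active_unit": None},
    {"db_id": "CA2", "go_mf": "GO:MF_A", "components": ["GP3", "GP4"], "active_unit": None},
    {"db_id": "CA3", "go_mf": "GO:MF_A", "components": ["GP1", "GP2"], "active_unit": "GP1"},
    {"db_id": "CA4", "go_mf": "GO:MF_A", "components": ["GP5", "GP6"], "active_unit": None},
]
annotations = {"GP1": {"GO:MF_A"}, "GP2": {"GO:MF_B"}, "GP3": {"GO:MF_A"}, "GP4": {"GO:MF_A"}}

report = survey_missing_active_units(activities, annotations)
print(report)
```

Only the "unique" bucket would be a plausible input for bulk edits; the "ambiguous" bucket corresponds to the cases where the complex's activity has been assigned to all of its protein components and still needs a curator's judgment.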
Summarizing the discussion so far as a to-do list.
@deustp01 Yes, we can log out the number 2 cases where the resulting "complex" only has a single has_part GP.
@dustine32 @ukemi just to document the current state / need, here is an active unit in the first reaction of the "carnitine biosynthesis" pathway and in the derived GO-CAM. The physical entity is a complex involving one copy of one gene product and one copy each of a couple of chemical entities. Can the GO-CAM generation script be re-done to identify the gene product and make it the enabler?
Or if that is hard or dangerous, can we plan to generate a list (partial is OK to start) of the number 2 cases, that we can use to figure out how to bulk-edit the Reactome annotations to add the missing activeUnit annotations, so that the existing GO-CAM generation script can use them? (A practical issue here is whether David and I, as we (re)curate pathways, should add this information manually as part of our work, or leave it out because a script will soon be available to do it automatically?)
@dustine32 I think this proposal is directly in line with what we had talked about on the call today. I think you have already done it for when there is a single GP as the enabler, but we should do it with the complexes too. For these cases, would it be possible to use the UniProt GCRP identifiers instead of a REACTO id? We need to start weeding away the Reacto identifiers.
Hi @dustine32. Note that part of the requirement in the second point of this post is to ignore small molecules.
If Reactome does not specify the catalytic subunit, the enabler of the reaction should be a GO protein-containing complex instance (GO:0032991) with has_part links to each protein subunit. Small molecule components of complexes should be ignored for now. So if a Reactome complex is composed of just one gene product and one or more small molecules, then it should be treated the same as case 1 above and connect the activity directly to the gene product without a protein-containing complex individual.
This is consistent with the discussion in #327.
For Reactome complexes that are controllers of enzymatic reactions, these should be aligned with GO-CAM to specify gene product IDs and not PRO IDs. To do this, we will need to handle two different cases:
1) If Reactome specifies the catalytic subunit, the enabler of the reaction should be the catalytic subunit and not the complex. The rest of the complex should be ignored for now.
2) If Reactome does not specify the catalytic subunit, the enabler of the reaction should be a GO protein-containing complex instance (GO:0032991) with has_part links to each protein subunit. Small molecule components of complexes should be ignored for now. So if a Reactome complex is composed of just one gene product and one or more small molecules, then it should be treated the same as case 1 above and connect the activity directly to the gene product without a protein-containing complex individual.
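The two cases can be sketched as a small decision function. This is only an illustration of the rule as stated, not the actual GO-CAM conversion code: the `Component` and `Enabler` types, the `choose_enabler` name, and the identifiers in the usage lines are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

PROTEIN_CONTAINING_COMPLEX = "GO:0032991"


@dataclass
class Component:
    identifier: str        # gene product or chemical ID (placeholder values below)
    is_gene_product: bool  # False for small molecules, which are ignored for now


@dataclass
class Enabler:
    kind: str                           # "gene_product" or "complex"
    gene_product: Optional[str] = None  # set when kind == "gene_product"
    go_class: Optional[str] = None      # set when kind == "complex"
    has_part: List[str] = field(default_factory=list)


def choose_enabler(components, active_unit=None):
    """Pick the enabler for a reaction controlled by a Reactome complex."""
    # Case 1: Reactome names the catalytic subunit; use it and ignore the rest.
    if active_unit is not None:
        return Enabler(kind="gene_product", gene_product=active_unit)

    # Case 2: no catalytic subunit given; drop small molecules first.
    proteins = [c.identifier for c in components if c.is_gene_product]
    if not proteins:
        raise ValueError("complex has no gene product components")
    if len(proteins) == 1:
        # One gene product plus small molecules: treat like case 1.
        return Enabler(kind="gene_product", gene_product=proteins[0])
    return Enabler(kind="complex", go_class=PROTEIN_CONTAINING_COMPLEX, has_part=proteins)


# Placeholder identifiers for illustration only.
single = choose_enabler([Component("GP:A", True), Component("CHEM:X", False)])
multi = choose_enabler([Component("GP:A", True), Component("GP:B", True), Component("CHEM:X", False)])
named = choose_enabler([Component("GP:A", True), Component("GP:B", True)], active_unit="GP:B")
print(single.kind, multi.kind, named.gene_product)
```

Returning the single has_part gene product directly, rather than a one-member complex, is what would let the converter also log those cases for feedback to Reactome.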