IgorRodchenkov closed this issue 6 years ago.
Attn: @cannin @emekdemir @ozgunbabur @gbader @armish
<bp:Interaction rdf:ID="Interaction_3ef7debf0a3fc71a964bdd35d6011dc3">
<bp:displayName rdf:datatype = "http://www.w3.org/2001/XMLSchema#string">SubPathwayInteraction767</bp:displayName>
<bp:name rdf:datatype = "http://www.w3.org/2001/XMLSchema#string">SubPathwayReaction</bp:name>
<bp:name rdf:datatype = "http://www.w3.org/2001/XMLSchema#string">SubPathway767Reaction</bp:name>
<bp:comment rdf:datatype = "http://www.w3.org/2001/XMLSchema#string">REPLACED http://smpdb.ca/pathways/#SubPathwayInteractions/767</bp:comment>
<bp:dataSource rdf:resource="#smpdb" />
<bp:participant rdf:resource="#SmallMolecule_1f23eb7807566d005690e5eff016fd8b" />
</bp:Interaction>
This interaction is also used elsewhere, e.g.:
<bp:PathwayStep rdf:ID="PathwayStep_744c82ea84ea65b2aad68080fe5d6ff4">
<bp:comment rdf:datatype = "http://www.w3.org/2001/XMLSchema#string">REPLACED http://smpdb.ca/pathways/#SubPathwayInteractionSteps/SubPathway767</bp:comment>
<bp:stepProcess rdf:resource="#Interaction_3ef7debf0a3fc71a964bdd35d6011dc3" />
</bp:PathwayStep>
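To check which steps point at a given interaction, the two objects quoted above can be parsed with the standard library. This is a minimal sketch: the BioPAX Level 3 namespace URI is an assumption (the snippet does not show its namespace declarations), and the `steps_using` and `participants` helpers are hypothetical names used only for illustration.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BP = "http://www.biopax.org/release/biopax-level3.owl#"  # assumed BioPAX L3 namespace

# Trimmed copy of the Interaction and PathwayStep quoted above, wrapped in an rdf:RDF root.
SNIPPET = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:bp="{BP}">
<bp:Interaction rdf:ID="Interaction_3ef7debf0a3fc71a964bdd35d6011dc3">
  <bp:displayName>SubPathwayInteraction767</bp:displayName>
  <bp:participant rdf:resource="#SmallMolecule_1f23eb7807566d005690e5eff016fd8b" />
</bp:Interaction>
<bp:PathwayStep rdf:ID="PathwayStep_744c82ea84ea65b2aad68080fe5d6ff4">
  <bp:stepProcess rdf:resource="#Interaction_3ef7debf0a3fc71a964bdd35d6011dc3" />
</bp:PathwayStep>
</rdf:RDF>"""


def steps_using(root, interaction_id):
    """Return rdf:IDs of PathwaySteps whose stepProcess references interaction_id."""
    ref = "#" + interaction_id
    hits = []
    for step in root.iter(f"{{{BP}}}PathwayStep"):
        for proc in step.findall(f"{{{BP}}}stepProcess"):
            if proc.get(f"{{{RDF}}}resource") == ref:
                hits.append(step.get(f"{{{RDF}}}ID"))
    return hits


def participants(root, interaction_id):
    """Return the rdf:resource references of an Interaction's participants."""
    for inter in root.iter(f"{{{BP}}}Interaction"):
        if inter.get(f"{{{RDF}}}ID") == interaction_id:
            return [p.get(f"{{{RDF}}}resource") for p in inter.findall(f"{{{BP}}}participant")]
    return []


root = ET.fromstring(SNIPPET)
print(steps_using(root, "Interaction_3ef7debf0a3fc71a964bdd35d6011dc3"))
# ['PathwayStep_744c82ea84ea65b2aad68080fe5d6ff4']
print(participants(root, "Interaction_3ef7debf0a3fc71a964bdd35d6011dc3"))
# ['#SmallMolecule_1f23eb7807566d005690e5eff016fd8b']
```

The interaction has a single participant (one small molecule), which is part of why it looks like a placeholder rather than a real reaction.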
and one (from the original PW000149 data file) has 44 pathway components, which seems to be the "true" pathway definition. Each of those 49 weird (sub-)pathways in fact originally had the same URI, "http://smpdb.ca/pathways/#SubPathways/767"; after merging into PC2, these became 49 different URIs. (This is done for data consistency and integrity: some providers are known to attach the same URI to different, and even different-type, BioPAX objects in different input files.)
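The effect described above (one provider URI turning into many PC2 URIs) can be illustrated with a file-scoped re-minting step. This is a hypothetical sketch, not PC2's documented algorithm: `remint_local_id`, the `file_A.owl`/`file_B.owl` names and the MD5-over-"file|URI" key are all assumptions. The only hint is that local IDs such as `Interaction_3ef7debf0a3fc71a964bdd35d6011dc3` carry 32 hex characters, which looks like an MD5 digest, but that is a guess.

```python
import hashlib


def remint_local_id(original_uri: str, source_file: str, prefix: str = "Pathway") -> str:
    """Hypothetical: mint a file-scoped ID so that one URI reused across input
    files yields distinct objects after merging (not PC2's actual scheme)."""
    key = f"{source_file}|{original_uri}".encode("utf-8")
    return f"{prefix}_{hashlib.md5(key).hexdigest()}"


uri = "http://smpdb.ca/pathways/#SubPathways/767"
a = remint_local_id(uri, "file_A.owl")  # hypothetical input file names
b = remint_local_id(uri, "file_B.owl")
print(a != b)  # True: same provider URI, different input files
print(a == remint_local_id(uri, "file_A.owl"))  # True: deterministic per file
```

Keying on the source file keeps unrelated objects apart when a provider reuses a URI, at the cost of also splitting objects that really are the same; those then need a separate merge based on stable identifiers.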
We had a similar issue (#205) with KEGG pathways; it was solved by merging them based on the presence of a standard KEGG pathway identifier (via UnificationXref). The SMPDB BioPAX also contains standard (MIRIAM) stable pathway IDs, so we could safely merge these pathways the same way. E.g., all those 50 pathways contain the same UnificationXrefs, with ids SMP00016 and PW000149 (pathwhiz; no idea what that means; it is not in MIRIAM).
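The xref-based merge proposed above amounts to grouping pathways that share at least one trusted unification xref. A minimal union-find sketch, assuming pathways are given as a dict from URI to a set of `(db, id)` pairs; the db labels `"smpdb"` and `"pathwhiz"`, the `Pathway_*` URIs and `merge_by_xref` itself are hypothetical. Only the grouping is shown, not the actual merging of components.

```python
from collections import defaultdict


def merge_by_xref(pathways, trusted_dbs):
    """Group pathway URIs sharing a UnificationXref (db, id) from a trusted db.

    pathways: dict mapping pathway URI -> set of (db, id) unification xrefs.
    Returns sorted groups of URIs; pathways without a trusted xref stay alone.
    """
    parent = {uri: uri for uri in pathways}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    first_seen = {}  # (db, id) -> first pathway URI carrying that xref
    for uri, xrefs in pathways.items():
        for db, ident in xrefs:
            if db not in trusted_dbs:
                continue  # e.g. skip ids that are not registered in MIRIAM
            key = (db, ident)
            if key in first_seen:
                union(first_seen[key], uri)
            else:
                first_seen[key] = uri

    groups = defaultdict(list)
    for uri in pathways:
        groups[find(uri)].append(uri)
    return sorted(sorted(g) for g in groups.values())


example = {
    "Pathway_1": {("smpdb", "SMP00016"), ("pathwhiz", "PW000149")},
    "Pathway_2": {("smpdb", "SMP00016"), ("pathwhiz", "PW000149")},
    "Pathway_3": {("smpdb", "SMP00016")},
    "Pathway_noxref": set(),
}
print(merge_by_xref(example, {"smpdb"}))
# [['Pathway_1', 'Pathway_2', 'Pathway_3'], ['Pathway_noxref']]
```

Union-find makes the grouping transitive: if A and B share one id and B and C share another, all three end up together. Whether that is desirable for SMPDB data is a judgment call.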
(The remaining question is why those 49 pathways contain that weird interaction at all.)
Also, there are many (sub-)pathways that have only a name (no components and no xrefs at all), such as "G-protein signalling cascade" in SMP00327.owl. E.g., this search query returns three hits, all of them empty pathways without xrefs. These were not merged automatically and are still hanging around.
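Finding such name-only pathways is a simple filter. A sketch, assuming each pathway is available as a dict with the hypothetical keys `uri`, `name`, `components` and `xrefs`; `empty_pathways_by_name` is an illustrative helper, not part of PC2.

```python
from collections import defaultdict


def empty_pathways_by_name(pathways):
    """Return {name: sorted URIs} for pathways with no components and no xrefs.

    pathways: iterable of dicts with hypothetical keys
    'uri', 'name', 'components' (list) and 'xrefs' (list).
    """
    by_name = defaultdict(list)
    for p in pathways:
        if not p["components"] and not p["xrefs"]:
            by_name[p["name"]].append(p["uri"])
    return {name: sorted(uris) for name, uris in by_name.items()}
```

Grouping by name shows how many empty copies exist, but since name is the only thing they share, it gives no safe key for merging them automatically.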
Done.
There are still too many simple sub-pathways, often sharing the same name and having only two pathway components. Some of them use http://identifiers.org/smpdb/ URIs, while others use the "http://smpdb.ca/pathways/#" base and have no xrefs (we are unable to normalize and merge these pathways).
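The two URI bases mentioned above suggest a quick triage of sub-pathways. A sketch under assumptions: the category labels, `classify_uri` and `repeated_names` are hypothetical, and the only inputs taken from the thread are the two base URIs.

```python
from collections import Counter

# URI bases quoted in the comment above.
NORMALIZED_BASE = "http://identifiers.org/smpdb/"
PROVIDER_BASE = "http://smpdb.ca/pathways/#"


def classify_uri(uri: str, xref_count: int) -> str:
    """Rough triage of a sub-pathway URI (hypothetical category labels)."""
    if uri.startswith(NORMALIZED_BASE):
        return "normalized"
    if uri.startswith(PROVIDER_BASE):
        # Without xrefs there is no stable identifier to normalize or merge on.
        return "mergeable_by_xref" if xref_count > 0 else "unmergeable"
    return "other"


def repeated_names(names, min_count=2):
    """Return {name: count} for names used by at least min_count sub-pathways."""
    return {n: c for n, c in Counter(names).items() if c >= min_count}
```

Counting the "unmergeable" bucket per repeated name would show how much of the duplication cannot be fixed on the PC2 side.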
@cannin et al., please see/try:
http://beta.pathwaycommons.org/pc2/search?q=name:%22Propanoate%20metabolism%22&type=pathway&datasource=smpdb
50 pathways have the same name. This is just one example. Shall we still import these data into PC2?
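To repeat this check for other pathway names, the query URL can be rebuilt from its parts. A sketch assuming the beta endpoint quoted above (it may no longer be live) and the same three parameters; `pc2_search_url` is a hypothetical helper.

```python
from urllib.parse import quote, urlencode

SEARCH_ENDPOINT = "http://beta.pathwaycommons.org/pc2/search"  # endpoint from the URL above


def pc2_search_url(name: str, biopax_type: str = "pathway", datasource: str = "smpdb") -> str:
    """Build a PC2 search URL for an exact-name query, filtered by type and data source."""
    params = {"q": f'name:"{name}"', "type": biopax_type, "datasource": datasource}
    # quote (not the default quote_plus) so spaces become %20; safe=":" keeps the field separator literal
    return SEARCH_ENDPOINT + "?" + urlencode(params, quote_via=quote, safe=":")


print(pc2_search_url("Propanoate metabolism"))
# http://beta.pathwaycommons.org/pc2/search?q=name:%22Propanoate%20metabolism%22&type=pathway&datasource=smpdb
```

The resulting URL can then be fetched with `urllib.request.urlopen`; parsing the response is left out because its format is not shown in this thread.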