IgorRodchenkov closed this 5 years ago
This, I bet, also makes #243 recur; need to check and test again (after rebuilding the beta PC9 instance from scratch).
Also, the strings "SubPathway", "SubPathwayOutput", and "SubPathwayInput" should not be values of the BioPAX property 'name' of a Pathway, SmallMolecule, etc. This is another BioPAX hack, probably not so useful, or even misleading, for data analysts. Better to put these values in the BioPAX 'comment' property.
SMPDB has already addressed the URI and xref.id issues above. Great.
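A minimal sketch of that clean-up, assuming each BioPAX entity has already been loaded into a plain dict with 'name' and 'comment' lists. That dict layout and the function name `move_placeholder_names` are hypothetical, for illustration only; this is not the actual data cleaner code.

```python
# Placeholder strings named in the comment above.
PLACEHOLDER_NAMES = {"SubPathway", "SubPathwayOutput", "SubPathwayInput"}


def move_placeholder_names(entity):
    """Return a copy of `entity` with placeholder names moved to 'comment'.

    `entity` is a hypothetical dict such as
    {"name": [...], "comment": [...]}; the input is not modified.
    """
    names = entity.get("name", [])
    kept = [n for n in names if n not in PLACEHOLDER_NAMES]
    moved = [n for n in names if n in PLACEHOLDER_NAMES]

    result = dict(entity)
    result["name"] = kept
    # Preserve existing comments and append the moved placeholders.
    result["comment"] = list(entity.get("comment", [])) + moved
    return result
```

Real names stay in 'name', so analysts who filter or group by name no longer see the placeholder values, while the information is still kept in 'comment'.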
More SMPDB observations and ideas:
pathwayOrder property, remove PathwaySteps (having no nextStep, usually one dummy interaction as stepProcess); it's ok. I could suggest a better way to model pathway steps (using stepConversion, stepProcess and nextStep BioPAX properties; similar to what Reactome does); let's chat with the SMPDB team.
(the following comment is from pathway-commons-dev emails, 29 Aug - 7 Sep, 2018)
... here is the list of problem pathways and their number of instances:
Cardiolipin Biosynthesis (S. cerevisiae) 8958
Cardiolipin Biosynthesis Pathways (H. sapiens) 3277
Cardiolipin Biosynthesis (Barth Syndrome) (H. sapiens) 20016
De Novo Triacylglycerol Biosynthesis Pathways (H. sapiens) 22656
Phosphatidylcholine Biosynthesis Pathways (H. sapiens) 922
Phosphatidylcholine Biosynthesis (S. cerevisiae) 162
Phosphatidylethanolamine Biosynthesis (H. sapiens) 922
Phospholipid Biosynthesis (E. coli) 910
Triacylglycerol Degradation (A. thaliana) 1728
Triacylglycerol Metabolism (S. cerevisiae) 322
So.. let's just remove SMPDB altogether from PC11...
That would be my vote.
The new version of SMPDB BioPAX data (we downloaded the 05-Jun-2016 release BioPAX archive and imported it into beta PC9) contains UnificationXrefs (of a Pathway) like:
Using a URI in an Xref.id property is a mistake. URIs like "http://identifiers.org/smpdb/SMP00001" should instead be the URIs of the corresponding Pathway BioPAX objects (recommended; this also helps to avoid duplicate pathways when integrating multiple SMPDB BioPAX files into one model).
For example, instead of:
much better would be to use (for pathway definitions and references - where you have official standard URIs and IDs):
This would make SMPDB BioPAX more useful for everyone (currently, we at Pathway Commons have to make these fixes ourselves, replacing URIs and IDs, in order to integrate and use SMPDB data).
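The kind of fix-up described above could look roughly like the sketch below. It assumes a pathway is represented as a plain dict with a 'uri' and a list of 'xrefs' (each with 'db' and 'id'); that layout, the function names, and the URI pattern (inferred from the single example "http://identifiers.org/smpdb/SMP00001") are assumptions for illustration, not the real Pathway Commons cleaner.

```python
import re

# Assumed shape of SMPDB pathway URIs, inferred from the one example
# quoted in the comment above.
SMPDB_URI = re.compile(r"^https?://identifiers\.org/smpdb/(SMP\d+)$")


def split_xref_id(xref_id):
    """Return (uri, bare_id) for a URI-valued xref id.

    If `xref_id` is not an SMPDB identifiers.org URI, return
    (None, xref_id) and leave the id unchanged.
    """
    candidate = xref_id.strip()
    match = SMPDB_URI.match(candidate)
    if match is None:
        return None, xref_id
    return candidate, match.group(1)


def clean_pathway(pathway):
    """Return a copy of a hypothetical pathway dict with fixed xrefs.

    A URI found in an xref id becomes the pathway's own URI, and the
    xref id is replaced by the bare identifier (e.g. "SMP00001").
    """
    cleaned = dict(pathway)
    cleaned_xrefs = []
    for xref in pathway.get("xrefs", []):
        uri, bare_id = split_xref_id(xref["id"])
        if uri is not None:
            cleaned["uri"] = uri
        cleaned_xrefs.append({**xref, "id": bare_id})
    cleaned["xrefs"] = cleaned_xrefs
    return cleaned
```

Because every file that describes the same pathway ends up with the same pathway URI, merging several SMPDB files into one model would collapse the copies instead of duplicating them.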
(Let's contact SMPDB authors and also update/fix our data cleaner code.)
@cannin @gbader @emekdemir @ozgunbabur @jvwong