PathwayCommons / cpath2

Biological pathway data integration and access platform (Pathway Commons)
http://www.pathwaycommons.org/pc2/
MIT License

INOH Cleaner #218

Closed IgorRodchenkov closed 9 years ago

IgorRodchenkov commented 9 years ago

http://inoh.hgc.jp (www.inoh.org no longer resolves properly; the domain name has probably been lost...)

INOH BioPAX L3 looks good in general (there are warnings reported by the biopax-validator); let's fix the following in the cleaner:

IgorRodchenkov commented 9 years ago

Also discovered that INOH BioPAX has the following issues:

IgorRodchenkov commented 9 years ago

Another type of problem: many TemplateReactions have a Complex as the value of the 'template' property, which is wrong. Only objects of type Dna, Rna, RnaRegion or DnaRegion are allowed there; values of other types get ignored by the Paxtools parser (and another BioPAX/OWL parser might simply fail).

(Note: the INOH BioPAX RDF/XML files were somehow created despite such critical errors; data files that contain illegal use of OWL properties are simply impossible to write with the Paxtools Java library.) Example INOH files that have this sort of error include BMP2_signaling_TGF-beta_MV.owl ("id1354461388_Transcription") and FGF8_Mouse.owl ("id814152498_Transcription" and several more).

This cannot be fixed with Paxtools; perhaps the Validator could do it (with some new code added there), or it could be fixed with a custom script based on regex or RDF tools... Ideally, this must be addressed by the data provider, INOH.

Addition: after some analysis (with Gary B.), we've decided to remove from the model all such illegal TemplateReaction objects that have a "gene" Complex (of two DNAs: coding and responsive element) as 'template'. These are hardly useful to the public, because there are no standard gene/sequence identifiers (there are xrefs to INOH and IGS ontology that cannot be found anymore, and the DnaReferences are trivial, with name 'Dna' and no xrefs...).
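
For illustration only, here is a minimal sketch of how such a removal could be done directly on the RDF/XML, outside Paxtools. It is not the actual cpath2 cleaner code. It assumes the file is "flat" (every BioPAX object is a direct child of rdf:RDF), that objects are identified by rdf:ID or rdf:about, and that references use rdf:resource with a `#fragment`; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

BP = "http://www.biopax.org/release/biopax-level3.owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RES = f"{{{RDF}}}resource"


def fragment(uri):
    """Return the part after the last '#', or the whole string if there is none."""
    return uri.rsplit("#", 1)[-1]


def local_id(elem):
    """Return the rdf:ID of an element, or the fragment of its rdf:about."""
    rid = elem.get(f"{{{RDF}}}ID")
    if rid is None:
        rid = fragment(elem.get(f"{{{RDF}}}about", ""))
    return rid


def remove_complex_template_reactions(root):
    """Drop TemplateReactions whose 'template' points to a Complex.

    Assumes a flat RDF/XML layout (objects are direct children of rdf:RDF).
    Returns the ids of the removed TemplateReaction elements.
    """
    complex_ids = {local_id(e) for e in root.findall(f"{{{BP}}}Complex")}
    complex_ids.discard("")  # ignore elements without a usable id

    removed = []
    for reaction in root.findall(f"{{{BP}}}TemplateReaction"):
        targets = [fragment(t.get(RES, ""))
                   for t in reaction.findall(f"{{{BP}}}template")]
        if any(t in complex_ids for t in targets):
            removed.append(local_id(reaction))
            root.remove(reaction)

    # Also drop properties that still point at a removed reaction
    # (e.g. stepProcess, pathwayComponent, controlled), so no reference dangles.
    dead = set(removed)
    for obj in root:
        for prop in list(obj):
            if fragment(prop.get(RES, "")) in dead:
                obj.remove(prop)
    return removed
```

A file would be processed with something like `tree = ET.parse(path)`, then `remove_complex_template_reactions(tree.getroot())`, then `tree.write(out_path)`. The "gene" Complex objects themselves are left in place by this sketch.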

IgorRodchenkov commented 9 years ago

Also, e.g., in BMP2_signaling_TGF-beta_MV.owl, Pathway has pathwayOrder (steps) but no pathwayComponent values. Shall we copy reactions from the listed steps to pathwayComponent?
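
To make the question concrete, a rough sketch of such a copy on the raw RDF/XML might look like the following. It is only an illustration under assumptions, not what the cleaner does: it assumes a flat file layout (objects are direct children of rdf:RDF), rdf:ID identifiers, and `#fragment` references, and it treats both stepProcess and stepConversion (on BiochemicalPathwayStep) as sources. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

BP = "http://www.biopax.org/release/biopax-level3.owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RES = f"{{{RDF}}}resource"


def fill_pathway_components(root):
    """Copy processes from a Pathway's steps into pathwayComponent.

    Only pathways that have no pathwayComponent at all are touched.
    Returns the number of pathwayComponent properties added.
    """
    # Index all steps by their rdf:ID (assumption: rdf:ID is used throughout).
    steps = {}
    for tag in ("PathwayStep", "BiochemicalPathwayStep"):
        for step in root.findall(f"{{{BP}}}{tag}"):
            steps[step.get(f"{{{RDF}}}ID")] = step

    added = 0
    for pathway in root.findall(f"{{{BP}}}Pathway"):
        if pathway.find(f"{{{BP}}}pathwayComponent") is not None:
            continue  # components already present; leave as-is
        seen = set()
        for order in pathway.findall(f"{{{BP}}}pathwayOrder"):
            step = steps.get(order.get(RES, "").rsplit("#", 1)[-1])
            if step is None:
                continue  # step defined elsewhere or missing
            for prop in ("stepProcess", "stepConversion"):
                for ref in step.findall(f"{{{BP}}}{prop}"):
                    res = ref.get(RES)
                    if res and res not in seen:
                        seen.add(res)
                        ET.SubElement(pathway, f"{{{BP}}}pathwayComponent",
                                      {RES: res})
                        added += 1
    return added
```

Each process is added once per pathway even if several steps list it, and existing pathwayComponent values are never overwritten.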

IgorRodchenkov commented 9 years ago

OK, we've done what we could. Closing the issue for now (let's re-open it if we need to do more later).

IgorRodchenkov commented 9 years ago

We downloaded and saved most (if not all) of the INOH v4 data for future use/fix here.