PathwayCommons / cpath2

Biological pathway data integration and access platform (Pathway Commons)
http://www.pathwaycommons.org/pc2/
MIT License

PID: NCI curated pathway data, BioPAX L3 issues #200

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Data source: PID (NCI Pathway Interaction Database: Pathway), BioPAX L3 format
Date downloaded: 07 March 2014 (it's exactly the same data I last downloaded on 
09 February 2015)
Version, if available:
File name downloaded: NCI-Nature_Curated.bp3.owl.gz
File location: 
ftp://ftp1.nci.nih.gov/pub/PID/BioPAX_Level_3/NCI-Nature_Curated.bp3.owl.gz

Data source issues:

The BioPAX Validator (non-strict profile/mode, run inside the cPath2 pre-merge 
pipeline) reported 18 different types of errors/warnings; about 27K of the 44K 
cases could not be fixed automatically.

For details, please see the following Google document: 
https://docs.google.com/document/d/18_I0NMR1wXz73fKl5e-IYQj6w7TOxAxrr-LQeqnDJoU/edit?usp=sharing

The BioPAX validation report (in HTML and XML formats) is attached here.

Is the data source aware of the issue?
Not yet, or only partially; we are going to share it with them shortly.

It seems that NCI/PID is once again about to discontinue this data, the BioPAX 
files in particular (and probably freeze the web site for good). So either the 
Pathway Commons or the NDEx team will have to take over supporting some of these 
data in the future. This is probably the last chance to fix/improve the BioPAX 
L3 data together, with the PID team's help.

Original issue reported on code.google.com by rod...@gmail.com on 12 Feb 2015 at 7:29


IgorRodchenkov commented 9 years ago

I finished summarizing the errors/warnings in the Google document.

IgorRodchenkov commented 9 years ago

Today, we're discussing this with NCI PID (Mervi).

IgorRodchenkov commented 9 years ago

Cu has made good progress in fixing the PID BioPAX issues. Fixing some of the remaining issues would require manual curation, which is no longer an option. Some cases cannot be fixed for the following reasons:

  1. Gene removed from databases without a new version
  2. Not in UniProt / PubChem
  3. Term description is not specific
  4. Not in DB

Now we're waiting for the fixed BioPAX data so that we can try it.

IgorRodchenkov commented 9 years ago

Got the final data from Cu (he will soon make it publicly available for download from GitHub) and validated it. It looks better, though the 'shared.unification.xref' and 'cloned.utility.class' (mostly Stoichiometry) warnings still worry me somewhat... We will import this NCI version into PC2 v8! Good job.
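
For anyone who wants to spot-check the 'shared.unification.xref' cases outside the validator, here is a minimal sketch. It assumes the warning means that one UnificationXref is referenced by more than one object, and that the file uses the flat BioPAX L3 RDF/XML layout (top-level elements with rdf:ID or rdf:about, linked through bp:xref rdf:resource). The function name and the sample IDs are hypothetical; this is not how the BioPAX Validator itself implements the rule.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

BP = "http://www.biopax.org/release/biopax-level3.owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def _local(uri):
    """Reduce '#id' or 'base#id' to 'id'; URIs without '#' are kept whole."""
    return uri.rsplit("#", 1)[-1]


def _element_id(el):
    """Return the rdf:ID / rdf:about of an element, or None if it has neither."""
    raw = el.get(f"{{{RDF}}}ID") or el.get(f"{{{RDF}}}about")
    return _local(raw) if raw else None


def shared_unification_xrefs(owl_text):
    """Map each UnificationXref used by more than one object to its owners.

    Assumption: objects are top-level children of rdf:RDF (no nesting).
    """
    root = ET.fromstring(owl_text)
    unification_ids = {
        _element_id(el) for el in root.iter(f"{{{BP}}}UnificationXref")
    }
    unification_ids.discard(None)

    owners = defaultdict(set)
    for obj in root:
        obj_id = _element_id(obj)
        if obj_id is None:
            continue
        for ref in obj.findall(f"{{{BP}}}xref"):
            target = _local(ref.get(f"{{{RDF}}}resource", ""))
            if target in unification_ids:
                owners[target].add(obj_id)
    return {x: sorted(o) for x, o in owners.items() if len(o) > 1}


# Tiny made-up document: pr1 and pr2 share ux1, pr3 has its own xref.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:bp="http://www.biopax.org/release/biopax-level3.owl#">
  <bp:ProteinReference rdf:ID="pr1"><bp:xref rdf:resource="#ux1"/></bp:ProteinReference>
  <bp:ProteinReference rdf:ID="pr2"><bp:xref rdf:resource="#ux1"/></bp:ProteinReference>
  <bp:ProteinReference rdf:ID="pr3"><bp:xref rdf:resource="#ux2"/></bp:ProteinReference>
  <bp:UnificationXref rdf:ID="ux1"><bp:db>UniProt</bp:db><bp:id>P04637</bp:id></bp:UnificationXref>
  <bp:UnificationXref rdf:ID="ux2"><bp:db>UniProt</bp:db><bp:id>P38398</bp:id></bp:UnificationXref>
</rdf:RDF>"""

print(shared_unification_xrefs(SAMPLE))
```

For the real file, the decompressed contents of the .owl.gz could be passed in as text instead of SAMPLE. A file that nests objects inside properties would need a recursive walk rather than the top-level loop used here.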