PathwayCommons / cpath2

Biological pathway data integration and access platform (Pathway Commons)
http://www.pathwaycommons.org/pc2/
MIT License

Non-equivalent BioPAX objects in different files have the same absolute URI #192

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Data source: PANTHER Pathway
Date downloaded: before 12 November 2014
Version, if available: 3.3
File name downloaded: it's an inter-file issue (affects multiple files, 
multiple objects), e.g., B_cell_activation.owl and T_cell_activation.owl
File location: ftp://ftp.pantherdb.org/pathway/3.3/BioPAX.tar.gz (or 
../current_release/)

Data source issue:

For example, Catalysis objects with 
rdf:ID="_CATALYSIS___CD45_r4m1_r4" exist in both B_cell_activation.owl and 
T_cell_activation.owl, but they control different BiochemicalReaction objects 
(with different URIs), etc. This becomes a big problem when one merges these 
files with Paxtools or loads them into an RDF triple store. There are probably 
more such examples (note: all those PANTHER files use the same 
xml:base="http://www.pantherdb.org/pathways/biopax#", so the full URIs are 
the same if and only if the rdf:ID values are).
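
To illustrate why the full URIs coincide, here is a minimal Python sketch of 
how an rdf:ID is resolved against xml:base under the usual RDF/XML rule (the 
base without its fragment, then "#", then the ID). The helper name 
absolute_uri is made up for this example; it is not part of Paxtools or 
cPath2.

```python
def absolute_uri(xml_base: str, rdf_id: str) -> str:
    """Resolve an rdf:ID against xml:base: drop any fragment, then add '#ID'."""
    base = xml_base.split("#", 1)[0]
    return f"{base}#{rdf_id}"


# Both PANTHER files declare the same xml:base and reuse the same rdf:ID.
shared_base = "http://www.pantherdb.org/pathways/biopax#"
rdf_id = "_CATALYSIS___CD45_r4m1_r4"

uri_in_b_cell_file = absolute_uri(shared_base, rdf_id)
uri_in_t_cell_file = absolute_uri(shared_base, rdf_id)

# Identical URIs, so a merge treats the two Catalysis objects as one.
clash = uri_in_b_cell_file == uri_in_t_cell_file
print(uri_in_b_cell_file, clash)
```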

I would suggest checking, before each release and across all the files, that 
only equivalent (same) things have the same URIs. The other way is to use a 
unique xml:base for each file, and perhaps each release (like Reactome does), 
e.g.:  
"http://www.pantherdb.org/pathways/biopax/1/", 
"http://www.pantherdb.org/pathways/biopax/2/", etc. (you could use your 
internal pathway IDs, if there are any, instead of 1, 2, ...). In this case, 
though, there can be many equivalent (duplicate) utility class objects, which 
is a lesser evil than the problem we're discussing. This won't affect standard 
absolute URIs, i.e., rdf:about="http://identifiers.org/*".
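
A rough sketch of such a pre-release check, in Python with only the standard 
library, might look like the following. It is an illustration under stated 
assumptions, not how PANTHER or cPath2 actually do it: the two tiny documents 
are made up (the reaction IDs reaction_B and reaction_T are hypothetical), 
and "equivalent" is approximated by comparing each element's tag and direct 
child properties as raw strings.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def fingerprint(elem):
    """Order-independent summary of an element's tag and direct properties."""
    props = sorted(
        (child.tag, child.get(RDF + "resource", ""), (child.text or "").strip())
        for child in elem
    )
    return (elem.tag, tuple(props))


def find_clashes(files):
    """Return absolute URIs that carry different content in different files."""
    seen = defaultdict(dict)  # absolute URI -> {file name: fingerprint}
    for name, xml_text in files.items():
        root = ET.fromstring(xml_text)
        base = root.get(XML_BASE, "").split("#", 1)[0]
        for elem in root.iter():
            rdf_id = elem.get(RDF + "ID")
            if rdf_id is not None:
                seen[f"{base}#{rdf_id}"][name] = fingerprint(elem)
    return sorted(uri for uri, by_file in seen.items()
                  if len(set(by_file.values())) > 1)


# Made-up miniature of the reported situation (reaction IDs are hypothetical).
TEMPLATE = """<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:bp="http://www.biopax.org/release/biopax-level3.owl#"
  xml:base="{base}">
  <bp:Catalysis rdf:ID="_CATALYSIS___CD45_r4m1_r4">
    <bp:controlled rdf:resource="#{reaction}"/>
  </bp:Catalysis>
</rdf:RDF>"""

shared = "http://www.pantherdb.org/pathways/biopax#"
same_base = {
    "B_cell_activation.owl": TEMPLATE.format(base=shared, reaction="reaction_B"),
    "T_cell_activation.owl": TEMPLATE.format(base=shared, reaction="reaction_T"),
}
per_file_base = {
    "B_cell_activation.owl": TEMPLATE.format(
        base="http://www.pantherdb.org/pathways/biopax/1/", reaction="reaction_B"),
    "T_cell_activation.owl": TEMPLATE.format(
        base="http://www.pantherdb.org/pathways/biopax/2/", reaction="reaction_T"),
}

print(find_clashes(same_base))      # one clashing URI is reported
print(find_clashes(per_file_base))  # nothing is reported
```

With a shared xml:base the Catalysis URI is flagged because its controlled 
reaction differs between the files; with a per-file xml:base the two objects 
get distinct URIs and nothing is flagged. A real check would need a proper 
BioPAX equivalence test rather than this string comparison.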

Is the data source aware of the issue?
Yes (recently reported)

PS:
This is in part a cPath2 software issue too (PC2 v5 release): the Merger could 
do better at working around such URI clashes. However, fixing it in cPath2 
won't help other, independent users of the PANTHER BioPAX data.

Original issue reported on code.google.com by rod...@gmail.com on 17 Nov 2014 at 7:30

GoogleCodeExporter commented 9 years ago
Data fix: an updated BioPAX archive became available at 
ftp://ftp.pantherdb.org/pathway/current_release/ (the same fixed archive 
is also available at ftp://ftp.pantherdb.org/pathway/3.3/) on 12 November 2014.
Each pathway file now has a unique xml:base (URI), e.g., 
"http://www.pantherdb.org/pathways/biopax/P00010#" in 
B_cell_activation.owl, etc.

Thanks to Anushya, Huaiyu!

The cPath2 Merger code was also updated (a workaround was added). In any case, 
the new PANTHER Pathway BioPAX data were used for the PC2 v6 BioPAX model and 
web service (PC2 v5 won't be re-released, though).

Original comment by rod...@gmail.com on 17 Nov 2014 at 7:42