PathwayCommons / cpath2

Biological pathway data integration and access platform (Pathway Commons)
http://www.pathwaycommons.org/pc2/
MIT License

Could improve use of ambiguous id-mapping and get more UniProt, HGNC, ChEBI Xrefs on not-merged ERs #226

Closed by IgorRodchenkov 7 years ago

IgorRodchenkov commented 8 years ago

For example, a ProteinReference from Humancyc: http://webservice.baderlab.org:48080/get?uri=http://purl.org/pc2/8/ProteinReference_be51b7c85831444c5010ad571d981e8c

Its ID, the obsolete P30712, maps to (is a secondary ID of) P0CG29 and P0CG30; the ProteinReference has no HGNC xrefs and no primary UniProt xrefs.

It belongs to the protein: http://webservice.baderlab.org:48080/get?uri=http://purl.org/pc2/8/Protein_4c1ac8c2cd2ce46713bc25194507aeb9

PC8 did not merge it with a warehouse PR (that's OK!), but it also failed to map it to P0CG29 and P0CG30 and did not add any canonical xrefs. Why? Because http://webservice.baderlab.org:48080/idmapping?id=P30712 (internal id-mapping) returns an empty result: we excluded all ambiguous mappings when we created the PC2 warehouse and id-mapping repository.

We probably should NOT have excluded ambiguous mapping results from the db (in Premerger.buildIdMappingFromWarehouse(model)) if we wanted extra canonical UniProt, ChEBI and HGNC Symbol xrefs to be added to some of the not-merged ERs. Keeping them would be safe, as the Merger does not replace an original ER with a warehouse one if the id-mapping result is ambiguous; in some cases it could simply auto-generate the primary ID xrefs that we really want, so that search and graph queries have greater data coverage.
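To make the idea concrete, here is a minimal sketch of an id-mapping table that keeps one-to-many results, allows a merge only when the result is unambiguous, and otherwise offers all mapped primary IDs as extra xrefs. The class and method names (`AmbiguousIdMappingSketch`, `canMerge`, `extraXrefs`) are hypothetical and are not the actual cpath2 Merger or PreMerger API; only the P30712 example comes from this issue.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical illustration only; not the cpath2 id-mapping implementation. */
final class AmbiguousIdMappingSketch {

    /** Source ID (e.g. a secondary accession) -> every primary ID it maps to. */
    private final Map<String, Set<String>> idMap = new HashMap<>();

    /** Records one mapping; one-to-many (ambiguous) results are kept, not dropped. */
    void add(String fromId, String primaryId) {
        idMap.computeIfAbsent(fromId, k -> new TreeSet<>()).add(primaryId);
    }

    /** Returns all primary IDs for the given ID, or an empty set if it is unknown. */
    Set<String> map(String id) {
        Set<String> found = idMap.get(id);
        return found == null
                ? Collections.<String>emptySet()
                : Collections.unmodifiableSet(found);
    }

    /** An original ER may be replaced by a warehouse ER only if the mapping is unambiguous. */
    boolean canMerge(String id) {
        return map(id).size() == 1;
    }

    /** For an ambiguous mapping, the primary IDs to attach as extra xrefs to the not-merged ER. */
    Set<String> extraXrefs(String id) {
        Set<String> primaries = map(id);
        return primaries.size() > 1 ? primaries : Collections.<String>emptySet();
    }

    public static void main(String[] args) {
        AmbiguousIdMappingSketch mapping = new AmbiguousIdMappingSketch();
        // The example from this issue: obsolete P30712 is a secondary ID of two entries.
        for (String primary : Arrays.asList("P0CG29", "P0CG30")) {
            mapping.add("P30712", primary);
        }
        System.out.println("merge allowed: " + mapping.canMerge("P30712"));
        System.out.println("extra xrefs: " + mapping.extraXrefs("P30712"));
    }
}
```

With this shape the merge decision stays as strict as before, while the not-merged ER can still pick up primary-ID xrefs for search and graph queries.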

IgorRodchenkov commented 8 years ago

It seems we could also make use (generate xrefs, include in id-mapping) of other types of ID in the 'DR' records of the UniProt/SwissProt file: PDB (e.g., many BIND interaction participants refer to PDB IDs only), IPI, UniGene, EMBL (GI), GeneCards, etc.
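As a rough sketch of what collecting those IDs could look like, the snippet below reads 'DR' lines of a UniProt flat-file entry (general shape `DR   DATABASE; IDENTIFIER; ...`) and groups the identifiers by database name. The class name `UniProtDrLineSketch`, the method `parseDrLines`, and the sample lines in `main` are invented for illustration; this is not the parser cpath2 actually uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical illustration only; not the cpath2 UniProt converter. */
final class UniProtDrLineSketch {

    /**
     * Collects identifiers from "DR" lines, keyed by database name.
     * Only databases listed in wantedDbs are kept.
     */
    static Map<String, Set<String>> parseDrLines(List<String> lines, Set<String> wantedDbs) {
        Map<String, Set<String>> xrefs = new TreeMap<>();
        for (String line : lines) {
            if (!line.startsWith("DR ")) {
                continue; // not a database cross-reference line
            }
            // Drop the two-letter line code, then split the ';'-separated fields.
            String[] fields = line.substring(2).trim().split(";\\s*");
            if (fields.length < 2) {
                continue; // malformed line: no identifier field
            }
            String db = fields[0].trim();
            String id = fields[1].trim();
            if (id.endsWith(".")) {
                id = id.substring(0, id.length() - 1); // the last field ends with '.'
            }
            if (wantedDbs.contains(db) && !id.isEmpty()) {
                xrefs.computeIfAbsent(db, k -> new TreeSet<>()).add(id);
            }
        }
        return xrefs;
    }

    public static void main(String[] args) {
        // Made-up sample lines; the identifiers are placeholders, not real records.
        List<String> entry = new ArrayList<>(Arrays.asList(
                "ID   EXAMPLE_HUMAN           Reviewed;         100 AA.",
                "DR   EMBL; X00001; CAA00001.1; -; mRNA.",
                "DR   PDB; 1ABC; X-ray; 2.00 A; A=1-100.",
                "DR   GO; GO:0005737; C:cytoplasm; IEA:Example."));
        Set<String> wanted = new HashSet<>(Arrays.asList("PDB", "EMBL"));
        System.out.println(parseDrLines(entry, wanted));
    }
}
```

Each collected (database, identifier) pair could then be stored both as an xref on the warehouse ProteinReference and as an id-mapping entry pointing at the entry's primary accession.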