PathwayCommons / cpath2

Biological pathway data integration and access platform (Pathway Commons)
http://www.pathwaycommons.org/pc2/
MIT License

Simplify, generalize, improve the id-mapping #227

IgorRodchenkov closed this 8 years ago

IgorRodchenkov commented 8 years ago

Currently we build the id-mapping tables from a) UniProt and ChEBI data (from the entity references' (ERs') xrefs, after converting the original text data to the BioPAX warehouse data model) and b) a manually created special data source, unichem_mapping.zip (metabolite id-mapping from ChEMBL, PubChem CID, etc. to ChEBI IDs). The resulting internal id-mapping db is then used during both the data merge and graph queries.

Idea.

We could probably use the BridgeDb (2.2.1) console app/jar to generate all the mapping tables we want from lists of all the UniProt and ChEBI IDs, and simply add the result, e.g. as idmapping.zip, a warehouse-type data source, to the PC2 instance metadata/data configuration. Then there would be no need to keep so many different kinds of xrefs in the warehouse data and main model. We could also use the id-mapping db directly when creating the full-text index (instead of using xrefs for that)...
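To make the idea concrete, here is a rough sketch (in Python, for brevity) of how a standalone mapping table could serve both the merge/query lookups and the full-text index. Everything in it is an assumption for illustration: the four-column TSV layout, the sample rows, and the function names `load_mappings`, `map_id` and `index_keywords` are hypothetical, and it is neither the actual BridgeDb output format nor the cpath2 implementation.

```python
import csv
import io
from collections import defaultdict

# Hypothetical layout: source db, source id, target db, target id (tab-separated).
# The rows are illustrative samples, not the contents of a real idmapping.zip.
SAMPLE_TSV = (
    "PubChem-compound\t2244\tChEBI\tCHEBI:15365\n"
    "ChEMBL\tCHEMBL25\tChEBI\tCHEBI:15365\n"
    "RefSeq\tNP_000537\tUniProt\tP04637\n"
)


def load_mappings(handle):
    """Read mapping rows into {(source db, source id): {(target db, target id), ...}}."""
    table = defaultdict(set)
    for src_db, src_id, dst_db, dst_id in csv.reader(handle, delimiter="\t"):
        # Database names are lower-cased so lookups are case-insensitive.
        table[(src_db.lower(), src_id)].add((dst_db, dst_id))
    return table


def map_id(table, src_db, src_id, dst_db):
    """Return the sorted target-db IDs mapped from one source ID (merge/query use)."""
    targets = table.get((src_db.lower(), src_id), ())
    return sorted(i for db, i in targets if db == dst_db)


def index_keywords(table, dst_db, dst_id):
    """Return all source IDs that map to a primary ID (full-text index use)."""
    return sorted(
        src_id
        for (_, src_id), targets in table.items()
        if (dst_db, dst_id) in targets
    )


table = load_mappings(io.StringIO(SAMPLE_TSV))
print(map_id(table, "ChEMBL", "CHEMBL25", "ChEBI"))
print(index_keywords(table, "ChEBI", "CHEBI:15365"))
```

With a table like this, the index keywords for a ChEBI or UniProt entry would come from the reverse lookup rather than from xrefs stored on the warehouse objects.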

IgorRodchenkov commented 8 years ago

Duplicate of #226