Closed IgorRodchenkov closed 7 years ago
It seems we should make use (generate xrefs, include in id-mapping) of the other ID types found in the UniProt/SwissProt file 'DR' records: PDB (e.g., many BIND interaction participants refer to PDB IDs only), IPI, UniGene, EMBL (GI), GeneCards, etc.
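A minimal sketch of how the extra 'DR' IDs could be collected, assuming the standard UniProt flat-file layout (two-letter line code, three spaces, then `;`-separated fields). The function name `parse_entry`, the `WANTED_DBS` selection and the sample entry (including its PDB, EMBL and GeneCards values) are hypothetical and only for illustration; this is not the actual PC2 warehouse converter.

```python
from collections import defaultdict

# Assumed selection of 'DR' database names to turn into xrefs / id-mapping entries.
WANTED_DBS = {"PDB", "EMBL", "HGNC", "GeneCards"}

def parse_entry(lines):
    """Collect accessions (AC) and selected cross-references (DR) from one entry."""
    accessions = []
    xrefs = defaultdict(set)
    for line in lines:
        # Line code and content are separated by three spaces in the flat file.
        code, _, rest = line.partition("   ")
        if code == "AC":
            accessions += [a.strip() for a in rest.split(";") if a.strip()]
        elif code == "DR":
            fields = [f.strip() for f in rest.rstrip(".").split(";")]
            if len(fields) >= 2 and fields[0] in WANTED_DBS:
                # fields[0] is the database name, fields[1] its primary identifier.
                xrefs[fields[0]].add(fields[1])
    return accessions, dict(xrefs)

# Made-up entry: only the two accessions come from this issue.
SAMPLE_ENTRY = [
    "AC   P0CG29; P30712;",
    "DR   EMBL; X00000; CAA00000.1; -; mRNA.",
    "DR   PDB; 1XYZ; X-ray; 2.00 A; A=1-100.",
    "DR   GeneCards; GC00X000000; -.",
    "DR   GO; GO:0000000; F:example; IEA:Example.",
    "//",
]

accessions, xrefs = parse_entry(SAMPLE_ENTRY)
primary = accessions[0]  # the first AC value is the primary accession
# Each collected ID can then be recorded as "ID -> primary accession".
id_mapping = {(db, x): primary for db, ids in xrefs.items() for x in ids}
```

Here `id_mapping` holds one `(database, id)` key per collected xref; in a real repository the same key may point to several accessions, so the values would need to be sets.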
For example, a ProteinReference from HumanCyc: http://webservice.baderlab.org:48080/get?uri=http://purl.org/pc2/8/ProteinReference_be51b7c85831444c5010ad571d981e8c
It has the obsolete ID P30712, which maps to (is a secondary ID of) both P0CG29 and P0CG30; it has no HGNC xrefs and no primary UniProt xrefs.
It belongs to protein: http://webservice.baderlab.org:48080/get?uri=http://purl.org/pc2/8/Protein_4c1ac8c2cd2ce46713bc25194507aeb9
PC8 did not merge it with a warehouse PR (that’s OK!), but it also failed to map it to P0CG29 and P0CG30 and did not add any canonical xrefs. Why? Because http://webservice.baderlab.org:48080/idmapping?id=P30712 (internal id-mapping) returns an empty result, as we excluded all ambiguous mappings when we created the PC2 warehouse and id-mapping repository.
We probably should NOT have excluded ambiguous mapping results from the db (in Premerger.buildIdMappingFromWarehouse(model)) if we wanted extra canonical UniProt, ChEBI and HGNC Symbol xrefs to be added for some of the non-merged ERs. Keeping them would be safe, as the Merger does not replace an original ER with a warehouse one if the id-mapping result is ambiguous; it would just, in some cases, auto-generate the primary ID xrefs that we really want so that search and graph queries have greater data coverage.
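The proposed behaviour can be sketched roughly as follows. This is an illustration only, not the actual Premerger or Merger code: `build_id_mapping`, `resolve` and the dictionary-based `WAREHOUSE` are hypothetical names, and the only data used is the P30712 case described above.

```python
from collections import defaultdict

def build_id_mapping(warehouse):
    """Map each primary and secondary accession to ALL primaries it resolves to.

    Ambiguous entries (one ID, several primaries) are kept, not dropped.
    """
    mapping = defaultdict(set)
    for primary, secondaries in warehouse.items():
        mapping[primary].add(primary)
        for secondary in secondaries:
            mapping[secondary].add(primary)
    return mapping

def resolve(xref_ids, mapping):
    """Decide what to do with an entity reference given its xref IDs."""
    primaries = set()
    for xref_id in xref_ids:
        primaries |= mapping.get(xref_id, set())
    if len(primaries) == 1:
        # Unambiguous: safe to replace with the warehouse entity reference.
        return "merge", sorted(primaries)
    if primaries:
        # Ambiguous: keep the original ER, only add canonical xrefs.
        return "add_xrefs", sorted(primaries)
    return "skip", []

# Warehouse content assumed from the example in this issue:
# P30712 is a secondary accession of both P0CG29 and P0CG30.
WAREHOUSE = {"P0CG29": ["P30712"], "P0CG30": ["P30712"]}
mapping = build_id_mapping(WAREHOUSE)

print(resolve(["P30712"], mapping))  # ('add_xrefs', ['P0CG29', 'P0CG30'])
print(resolve(["P0CG29"], mapping))  # ('merge', ['P0CG29'])
```

With the ambiguous entries kept, the P30712 reference is still not merged, but it gains xrefs to both P0CG29 and P0CG30 instead of getting nothing.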