ozgunbabur closed this issue 8 years ago
Good catch. This is not a trivial thing, and it's unclear how to fix it perfectly.
The Protein (originally from ReconX) has no xrefs, but the corresponding ProteinReference (PR), http://webservice.baderlab.org:48080/get?uri=http://identifiers.org/uniprot/Q9BZ23, has multiple xrefs (they come from each original PR that was mapped to this primary canonical UniProt PR).
And indeed, the PR (Q9BZ23) also has the HGNC:8598 and HGNC:19365 xrefs. These originate from the translated and normalized KEGG data (RelationshipXref_kegg_hsa_hsa00770_translatedHGNC_19365_2, RelationshipXref_kegg_hsa_hsa00770_translatedHGNC_8598_2) and map to different primary UniProt accession numbers (AC); see http://webservice.baderlab.org:48080/idmapping?id=HGNC:8598&id=HGNC:19365&id=HGNC:15894.
When the Merger "decides" to replace an original PR with the one from the warehouse, it also copies all the xrefs from the original PR to the canonical one. So, even though we take care not to replace a PR unless it maps uniquely to a single UniProt AC, in the above case we still ended up with conflicting xrefs on the canonical PR. On the other hand, if we did not copy the xrefs, we would always lose all the original ones.
More details. What happens is that, for example, KEGG hsa_00770 has a PR like the one below (the corresponding Protein also has the same xrefs, by the way), and it is hard to tell what it means:
Now fixing (in the biopax-validator Normalizer, KeggCleanerImpl, and cPath2 Merger)...
Oops, it looks like I found the bug in the PC2 Merger (in the idMappingByXrefsIntersection method). It is the true reason why, for example, the "hsa53354_hsa55229_hsa79646_hsa80025.eref" PR merges into the canonical "Q9BZ23" even though id-mapping by xrefs is ambiguous (it must not merge!). Fixing now...
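The intended rule (replace a PR only when its xrefs resolve to exactly one primary UniProt AC) can be illustrated with a small sketch. This is not the cPath2 code and does not reproduce idMappingByXrefsIntersection; it is written in Python for brevity, `resolve_canonical_ac` and `ID_MAPPING` are hypothetical names, and `AC_PANK1` / `AC_PANK3` are placeholder accessions, not real UniProt IDs.

```python
from typing import Dict, Iterable, Optional, Set


def resolve_canonical_ac(
    xref_ids: Iterable[str],
    id_mapping: Dict[str, Set[str]],
) -> Optional[str]:
    """Return the one primary AC that the xrefs resolve to, else None."""
    # Collect every primary accession reachable from any of the xrefs.
    accessions: Set[str] = set()
    for xref_id in xref_ids:
        accessions |= id_mapping.get(xref_id, set())
    # Exactly one accession: safe to merge into that canonical PR.
    # Zero or several: mapping is missing or ambiguous, keep the original PR.
    if len(accessions) == 1:
        return next(iter(accessions))
    return None


# Hypothetical mapping table (assumption for illustration only).
ID_MAPPING: Dict[str, Set[str]] = {
    "HGNC:8598": {"AC_PANK1"},   # placeholder accession
    "HGNC:19365": {"AC_PANK3"},  # placeholder accession
    "PANK2": {"Q9BZ23"},
}
```

Under these assumptions, `resolve_canonical_ac(["PANK2"], ID_MAPPING)` yields `"Q9BZ23"`, while a PR carrying both `HGNC:8598` and `HGNC:19365` yields `None` and would be left unmerged.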
Found a bug in the xrefs of some ProteinReferences in v8. Look at the PR with this ID:
http://purl.org/pc2/8/Protein_84cb92e17a9f567f5456a46088aa57e1
It is related to the gene PANK2, and it has the xref [db:HGNC Symbol, id: PANK2], just as expected.
But it also has the xref [db: HGNC, id: HGNC:8598] and the xref [db: HGNC, id: HGNC:19365]. The first one belongs to PANK1 and the second one belongs to PANK3.
That is not new to v8. It existed in v7 too.
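A check along the following lines could flag such cases. It is only a sketch in Python: `foreign_gene_xrefs` and the lookup table are hypothetical, and the table holds just the two HGNC ids mentioned in this report.

```python
from typing import Dict, List, Tuple

Xref = Tuple[str, str]  # (db, id)


def foreign_gene_xrefs(
    expected_symbol: str,
    xrefs: List[Xref],
    hgnc_id_to_symbol: Dict[str, str],
) -> List[Tuple[str, str]]:
    """Return (HGNC id, symbol) pairs whose symbol differs from the expected gene."""
    mismatches: List[Tuple[str, str]] = []
    for db, xref_id in xrefs:
        if db != "HGNC":
            continue  # only HGNC id xrefs are checked in this sketch
        symbol = hgnc_id_to_symbol.get(xref_id)
        # Unknown ids are skipped rather than reported.
        if symbol is not None and symbol != expected_symbol:
            mismatches.append((xref_id, symbol))
    return mismatches


# Lookup limited to the ids from the report (assumption for illustration).
HGNC_ID_TO_SYMBOL = {"HGNC:8598": "PANK1", "HGNC:19365": "PANK3"}

# The xrefs of the PR described above.
XREFS: List[Xref] = [
    ("HGNC Symbol", "PANK2"),
    ("HGNC", "HGNC:8598"),
    ("HGNC", "HGNC:19365"),
]
```

Calling `foreign_gene_xrefs("PANK2", XREFS, HGNC_ID_TO_SYMBOL)` returns both HGNC id xrefs, since they resolve to PANK1 and PANK3 rather than PANK2.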