PathwayCommons / cpath2

Biological pathway data integration and access platform (Pathway Commons)
http://www.pathwaycommons.org/pc2/
MIT License

Missing binary (SIF) interactions and gene names. #292

Closed IgorRodchenkov closed 6 years ago

IgorRodchenkov commented 6 years ago

We received the following feedback/question:

I've been poking around with Pathway Commons SIF and extended SIF files and had a question about the BioGrid data. There are a lot of interactions that I can see on the BioGrid website that I can't find in the biogrid SIF (or the "All" SIF). For example, KRAS doesn't appear anywhere in the biogrid SIF, and interactions between KRAS and BRAF and ARAF aren't present in the "All" SIF (even though they are on the BioGrid website at https://thebiogrid.org/110043/summary/homo-sapiens/kras.html).

Is the BioGrid data filtered in some way during the import into PC? If so, what is the filter?

Thanks for any help! John
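The check described in the question can be reproduced with a short script. This is a minimal sketch, assuming the basic SIF layout of three tab-separated columns (participant A, interaction type, participant B); the function name `interactions_for_gene` and the sample rows are made up for illustration and are not taken from a real Pathway Commons file.

```python
def interactions_for_gene(sif_lines, gene):
    """Return (a, relation, b) tuples from SIF lines that involve `gene`."""
    hits = []
    for line in sif_lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        a, relation, b = parts[0], parts[1], parts[2]
        if gene in (a, b):
            hits.append((a, relation, b))
    return hits


# Illustrative rows only (not real data from the BioGRID SIF).
sample = [
    "BRAF\tinteracts-with\tMAP2K1\n",
    "KRAS\tinteracts-with\tBRAF\n",
]
print(interactions_for_gene(sample, "KRAS"))
```

With a real file, passing the open file object (for example `open(path, encoding="utf-8")`) in place of `sample` should work the same way, since the function only iterates over lines.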

I see that some entity references, such as the KRAS ProteinReference (converted from BioGRID PSI-MI, uri=http://identifiers.org/refseq/XP_016874782), have no "HGNC Symbol" xrefs, although they do have UniProt, RefSeq and HGNC ID xrefs as well as gene names (defined via the PSI-MI properties names/alias and names/shortLabel).

By the way, one can find that entity reference in PC2 using the very specific query string xrefid:"KRAS". This means id-mapping and indexing were done properly, but the additional xrefs were not saved in the model during the PC data merge stage.
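For anyone who wants to repeat that lookup from a script, the query string has to be URL-encoded first. The sketch below only builds the URL and sends nothing; the base URL is the one listed for this project, while the `search` path and the `q` parameter name are assumptions about the PC2 web service and should be checked against its documentation.

```python
from urllib.parse import urlencode

PC2_BASE = "http://www.pathwaycommons.org/pc2/"  # project URL from this page


def build_search_url(query, endpoint="search"):
    """Build a search URL; `endpoint` and the `q` parameter are assumptions."""
    return PC2_BASE + endpoint + "?" + urlencode({"q": query})


url = build_search_url('xrefid:"KRAS"')
print(url)
```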

It looks like the problem is a bug in our Merger: the presence of the "HGNC" (ID) xref prevented auto-generation of the "HGNC Symbol" xrefs, which we use in the SIF and GSEA files.
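To illustrate this kind of failure, here is a hypothetical sketch; it is not the actual cpath2 Merger code. It assumes the faulty check treated any xref whose database name starts with "HGNC" as evidence that a symbol xref already existed. All function names are invented, and the UniProt and HGNC ID values are illustrative placeholders.

```python
def has_symbol_buggy(xrefs):
    # Assumed faulty check: an "HGNC" (ID) xref is mistaken for a symbol xref.
    return any(db.upper().startswith("HGNC") for db, _ in xrefs)


def has_symbol_fixed(xrefs):
    # Only an exact "HGNC Symbol" database name counts.
    return any(db.upper() == "HGNC SYMBOL" for db, _ in xrefs)


def add_symbol_xref(xrefs, symbol, has_symbol):
    """Return a new xref list with an HGNC Symbol xref added if it is missing."""
    if has_symbol(xrefs):
        return list(xrefs)
    return list(xrefs) + [("HGNC Symbol", symbol)]


# Xrefs resembling the KRAS ProteinReference described above (placeholder IDs).
kras = [("UniProt", "P01116"), ("RefSeq", "XP_016874782"), ("HGNC", "HGNC:6407")]

print(add_symbol_xref(kras, "KRAS", has_symbol_buggy))  # symbol xref is skipped
print(add_symbol_xref(kras, "KRAS", has_symbol_fixed))  # symbol xref is added
```

Under these assumptions, the buggy variant leaves the record without a gene symbol, so exports that rely on "HGNC Symbol" xrefs, such as SIF and GSEA, would omit the gene.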

IgorRodchenkov commented 6 years ago

Fixed in PC10 (beta)