PathwayCommons / cpath2

Biological pathway data integration and access platform (Pathway Commons)
http://www.pathwaycommons.org/pc2/
MIT License

Remove unwanted EntityReferences (e.g., INOH) #228

Closed IgorRodchenkov closed 8 years ago

IgorRodchenkov commented 8 years ago

(Ozgun:) Querying paths from HRAS to BRAF with the ChiBE software returns a hairball component, which comes from INOH. There, all Ras and Raf proteins are defined with a generic entity reference that has many member entity references, each with no name and a single UniProt ID as its xref. When I check them, these UniProt IDs belong to Ras and Raf proteins from all kinds of organisms.

(Igor:) Such no-name, single-ID generic ERs and their member ERs were auto-generated by InohCleanerImpl, which cannot filter UniProt IDs by organism at that point. They come from the original pseudo-generic entities and entity references that had multiple UniProt ID xrefs (from different organisms, even though the data is supposed to be human). That was actually a fix, so that one or more of those member ERs could match and merge with a canonical ER from the warehouse. No surprise, most of them did not merge with anything human and just hang around, not being useful...

So, let's clean up: before finally merging into the main model, delete such original unmatched sequence EntityReferences that do not belong to any physical entity (entityReferenceOf is empty) and are memberEntityReferenceOf a generic ER (the latter check may be unnecessary, because dangling ERs are auto-removed anyway). We could hack this only for INOH data, but it is better (and easy) to generalize for all data providers.
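
The cleanup rule above can be illustrated with a minimal sketch. It is written in Python against a toy in-memory model, not against the Paxtools API or the actual cpath2 merger code. The `EntityReference` class, the `merged` flag (standing in for "matched a canonical warehouse ER"), the `remove_unmatched_member_ers` function, and all URIs are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False keeps identity-based hashing, so instances can live in sets
class EntityReference:
    """Toy stand-in for a BioPAX sequence entity reference (hypothetical)."""
    uri: str
    entity_reference_of: set = field(default_factory=set)         # physical entities using this ER
    member_entity_reference_of: set = field(default_factory=set)  # generic ERs listing this ER as a member
    member_entity_references: set = field(default_factory=set)    # members, if this ER is generic
    merged: bool = False  # assumption: True if it matched a canonical warehouse ER


def remove_unmatched_member_ers(ers, require_generic_parent=True):
    """Remove unmatched ERs that no physical entity uses; return the removed URIs.

    With require_generic_parent=True, only members of some generic ER are
    removed, which mirrors the optional second check described in the issue.
    """
    removed = []
    for er in list(ers):  # iterate over a copy because ers is modified in the loop
        if er.merged or er.entity_reference_of:
            continue  # matched, or still used by a physical entity: keep it
        if require_generic_parent and not er.member_entity_reference_of:
            continue  # not a member of any generic ER: leave it to other cleanup
        for parent in er.member_entity_reference_of:
            parent.member_entity_references.discard(er)  # detach from each generic parent
        er.member_entity_reference_of.clear()
        ers.remove(er)
        removed.append(er.uri)
    return removed


# Example: one generic Ras ER with a merged human member and two unmatched members.
generic = EntityReference("inoh:Ras_generic", entity_reference_of={"inoh:Ras_protein"})
human = EntityReference("uniprot:human_member", merged=True)
other_a = EntityReference("inoh:unmatched_member_a")
other_b = EntityReference("inoh:unmatched_member_b")
for member in (human, other_a, other_b):
    generic.member_entity_references.add(member)
    member.member_entity_reference_of.add(generic)

model = [generic, human, other_a, other_b]
removed_uris = remove_unmatched_member_ers(model)
```

In this toy example only the two unmatched members are dropped; the generic ER stays because a physical entity still refers to it, and the merged member stays because it matched a warehouse ER. Applying the rule to every provider's model rather than only to INOH corresponds to the generalization suggested above.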