Closed GoogleCodeExporter closed 5 years ago
FYI (came across this recently): the NDEx project uses some, if not all, PC2 v6 data in SIF format (e.g., the normalized/merged NCI PID, IntAct, PANTHER, etc.). See, e.g., http://www.ndexbio.org/#/network/6f259e96-c4ad-11e4-bcc4-000c29cb28fb or search for something (like "brca2") in the top bar. You can then pick a source network, run queries on it (neighborhood queries, I suppose; e.g., try "brca2" with depth=1), and visualize the result, perhaps using Cytoscape.js.
Currently, a PC2 graph query accepts different ID types and URIs, and uses full-text search and id-mapping to find the physical entities and genes to use as seeds in the BioPAX graph query; the resulting sub-network is then converted to SIF format if required.
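To make the seed-finding step concrete, here is a rough sketch of how mixed input identifiers could converge on the same seeds. It is not the cpath2 implementation: `resolve_seeds`, the `id_map` and `search_index` tables, and every identifier in them are hypothetical stand-ins for the real id-mapping and full-text search.

```python
def resolve_seeds(query_ids, id_map, search_index):
    """Map each input identifier to seed URIs.

    Exact id-mapping is tried first; anything not found there falls back
    to a (toy) case-insensitive name search.
    """
    seeds = set()
    for query_id in query_ids:
        if query_id in id_map:
            seeds.update(id_map[query_id])
        else:
            seeds.update(search_index.get(query_id.lower(), ()))
    return seeds

# Hypothetical tables: a symbol and an accession both map to the same entity URI.
id_map = {
    "EX_SYMBOL": {"uri:er1"},
    "EX_ACC": {"uri:er1"},
    "uri:er2": {"uri:er2"},
}
search_index = {"example protein": {"uri:er3"}}

seeds = resolve_seeds(["EX_SYMBOL", "EX_ACC", "Example Protein"], id_map, search_index)
print(sorted(seeds))
```

The point is only that several ID types collapse to one seed set before the graph query runs, which is the part that is hard to reproduce on a SIF graph.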
I do not see an easy and general solution for SIF based graph queries (in Paxtools) that would allow us to support several input ID types and URIs at the same time in PC2...
@ozgunbabur please have a look. See also BioPAX/Paxtools#21
If PC stores the big SIF where identifiers are URIs of EntityReferences, then the same id-mapping can be used for SIF queries as well. But this means 3 steps instead of 1 step:
I just noticed that my comment above is badly mistaken. Yes, full-text search and ID mapping do not go well with SIF graphs. In fact, the SIF structure can differ depending on which ID type is used: a SIF graph with gene symbols can have a different structure than the SIF graph with UniProt IDs, and the same goes for the SIF graph with URIs. Those different structures can produce different graph query results.
For instance, let's say there is a path of length 2 from A to B in the SIF with gene symbols (A -> X -> B). This path may be missing in the SIF graph with URIs because there can be two different URIs corresponding to X (let's say URI-X1 and URI-X2). The edges URI-A -> URI-X1 and URI-X2 -> URI-B are then two disconnected paths in this graph, and the query won't find a path from URI-A to URI-B. It is better not to use URIs in SIF graphs.
On the other hand, I don't consider that a big problem. It's OK if full-text search and ID mapping do not work for SIF queries. We can say that we support either gene symbols or UniProt IDs for SIF queries. We can also tell users that they may get different query results with different ID options, because the gene symbol to UniProt mapping is not one-to-one. That is just a fact of life.
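The A -> X -> B scenario can be reproduced with a toy example. This is a minimal sketch rather than Paxtools code: the two edge lists, the choice of `controls-state-change-of` as the relation, and the `build_graph` / `has_path` helpers are all made up for illustration.

```python
from collections import defaultdict, deque

def build_graph(sif_lines):
    """Build a directed adjacency map from 'source<TAB>relation<TAB>target' lines."""
    graph = defaultdict(set)
    for line in sif_lines:
        source, _relation, target = line.split("\t")
        graph[source].add(target)
    return graph

def has_path(graph, start, goal, max_len):
    """Breadth-first search: True if goal is reachable from start within max_len edges."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return True
        if dist == max_len:
            continue  # do not expand beyond the length limit
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return False

# Same two interactions, once keyed by gene symbol and once by URI,
# where X is represented by two different URIs.
symbol_sif = [
    "A\tcontrols-state-change-of\tX",
    "X\tcontrols-state-change-of\tB",
]
uri_sif = [
    "URI-A\tcontrols-state-change-of\tURI-X1",
    "URI-X2\tcontrols-state-change-of\tURI-B",
]

found_with_symbols = has_path(build_graph(symbol_sif), "A", "B", max_len=2)
found_with_uris = has_path(build_graph(uri_sif), "URI-A", "URI-B", max_len=2)
print(found_with_symbols, found_with_uris)  # True False
```

The symbol graph finds the path because both edges meet at the single node X; the URI graph does not, because nothing links URI-X1 to URI-X2.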
Also, how about querying our (PC8) SIF that uses UniProt/ChEBI IDs, rather than the HGNC- or URI-based SIF data archives? Mapping the query input IDs to UniProt is much easier and faster (by cpath2/PC2 design). Then, if a user requests SIF with HGNC symbols as the output, we would map (expand) each SIF entry (interaction) to multiple lines that carry the corresponding HGNC symbols (I would even simply print them separated with ';' instead of making many new lines, one per gene name...).
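The ';'-separated variant of that output step might look roughly like this. It is a sketch under assumptions: `to_symbol_sif` is a hypothetical helper, the accessions and symbols in `uniprot_to_symbols` are placeholders rather than real mappings, and IDs without a mapping (such as ChEBI IDs) are assumed to pass through unchanged.

```python
def to_symbol_sif(sif_lines, uniprot_to_symbols, sep=";"):
    """Rewrite each SIF line, replacing UniProt IDs with their HGNC symbols.

    IDs that map to several symbols are joined with `sep` on a single line;
    IDs without a mapping (e.g. ChEBI IDs) are kept as they are.
    """
    def label(identifier):
        return sep.join(uniprot_to_symbols.get(identifier, [identifier]))

    converted = []
    for line in sif_lines:
        source, relation, target = line.split("\t")
        converted.append(f"{label(source)}\t{relation}\t{label(target)}")
    return converted

# Placeholder mapping: the second accession maps to two gene symbols.
uniprot_to_symbols = {
    "P00001": ["GENE1"],
    "P00002": ["GENE2A", "GENE2B"],
}
uniprot_sif = [
    "P00001\tinteracts-with\tP00002",
    "CHEBI:00000\tchemical-affects\tP00002",
]

symbol_sif = to_symbol_sif(uniprot_sif, uniprot_to_symbols)
for line in symbol_sif:
    print(line)
```

Keeping one output line per interaction means the line count matches the UniProt SIF; expanding to one line per symbol pair would instead multiply lines wherever the mapping is one-to-many.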
Original issue reported on code.google.com by ozgunba...@gmail.com on 4 Mar 2015 at 9:10