dondi / GRNsight

Web app and service for modeling and visualizing gene regulatory networks.
http://dondi.github.io/GRNsight
BSD 3-Clause "New" or "Revised" License

Cytoscape-exported GraphML imported into GRNsight #312

Closed kdahlquist closed 8 years ago

kdahlquist commented 8 years ago

Manually created Cytoscape network exported to GraphML and imported into GRNsight

more later...

kdahlquist commented 8 years ago

So, I think this is going to be a little tricky. I think we are going to end up making accommodations specific for Cytoscape (or yED) and there will be no guarantee that the format we end up with will be able to be read by other programs. I don't know what else we can do, though.

So what I have done is make a simple, unweighted 4-node network in Cytoscape manually using their graphical interface. The edges are: Gene1 > Gene2, Gene2 > Gene3, Gene3 > Gene4, Gene4 > Gene4.

Exporting this as a SIF file and importing it into GRNsight works great. (Side note, their default relationship type is "interacts with", which defies using spaces as a delimiter. It can be changed within the UI, but I didn't bother.)
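The delimiter problem in the side note can be sketched in a few lines of Python. This is only an illustration, not GRNsight's actual SIF reader: `parse_sif` is a hypothetical helper, and the rule it applies (if the file contains any tab, tabs are the delimiter and spaces belong to the names) is an assumption about how Cytoscape-style SIF is meant to be read.

```python
def parse_sif(text):
    """Return a list of (source, relationship, target) tuples from SIF text."""
    # Assumption: a tab anywhere in the file means tabs delimit fields,
    # so a relationship such as "interacts with" stays in one piece.
    use_tabs = "\t" in text
    edges = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.strip().split("\t") if use_tabs else line.split()
        if len(fields) < 3:
            continue  # an isolated node (or malformed line) has no edge
        source, relationship, *targets = fields
        # One SIF line may list several targets for the same source.
        edges.extend((source, relationship, target) for target in targets)
    return edges


if __name__ == "__main__":
    print(parse_sif("Gene1\tinteracts with\tGene2\nGene4\tinteracts with\tGene4\n"))
    print(parse_sif("Gene1 interacts with Gene2"))
```

With tabs, "interacts with" survives as a single relationship. With spaces only, the same text splits into the relationship "interacts" and two targets, "with" and "Gene2", which is the failure mode the side note describes.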

Exporting this network as GraphML then reveals the differences between the encoding that Cytoscape uses and the one we have implemented and can read with GRNsight. I'm attaching the SIF, the Cytoscape-exported GraphML, and the GRNsight-exported GraphML for comparison (the latter derived from importing the SIF into GRNsight and exporting it as GraphML).

Cytoscape-to-GraphML_tests.zip

Upon visual inspection (used firstObject for this), what leaps out at me is that the Cytoscape graphml file has a bunch of key tags at the top defining terms it uses later for the node and edge tags. GRNsight doesn't have this. We might need to have it if we start using some of the tags (I'm not sure). I'm going to explain what I think these various tags mean.
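To make the role of the key tags concrete, here is a minimal Python sketch that lists the key declarations at the top of a GraphML file. The sample document is hand-written for illustration and is not the actual Cytoscape export from the attached zip; `read_key_declarations` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

# Hand-written sample, NOT real Cytoscape output: two key declarations,
# then nodes and an edge whose <data> elements refer back to those keys.
SAMPLE = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="name" for="node" attr.name="name" attr.type="string"/>
  <key id="interaction" for="edge" attr.name="interaction" attr.type="string"/>
  <graph id="example" edgedefault="directed">
    <node id="62"><data key="name">Gene1</data></node>
    <node id="64"><data key="name">Gene2</data></node>
    <edge source="62" target="64"><data key="interaction">interacts with</data></edge>
  </graph>
</graphml>
"""


def read_key_declarations(graphml_text):
    """Map each key id to the scope, name, and type it declares."""
    root = ET.fromstring(graphml_text)
    declarations = {}
    for key in root.findall("g:key", GRAPHML_NS):
        declarations[key.get("id")] = {
            "for": key.get("for"),          # element kind the key applies to
            "name": key.get("attr.name"),   # human-readable attribute name
            "type": key.get("attr.type"),   # declared data type
        }
    return declarations


if __name__ == "__main__":
    for key_id, info in read_key_declarations(SAMPLE).items():
        print(key_id, info)
```

Each data element inside a node or edge points at one of these declarations through its key attribute, which is why a reader that skips the key tags cannot tell what a given data value means.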

So, to sum up, for us to read an unweighted network from Cytoscape-exported GraphML, we need to read the "name" field instead of the "node id" field for the nodes because the session ID isn't going to help the user at all.

Now this obviously has implications for being compatible with GraphML coming from other sources. Maybe we need some type of decision tree saying that if a "name" is found, use that, if not, use "id"? Further thoughts?

For what it's worth, yED does not deal with this issue very well, either. When I take a Cytoscape-exported graphml and import it into yED, no label shows up at all. However, all the keys show up in a properties window, so either it knows how to read Cytoscape or it just slurps it up for display from any graphml (probably the latter).

kdahlquist commented 8 years ago

OK. Here is what is happening with weighted networks exported from Cytoscape as graphml and imported into GRNsight.

I created a weighted network manually in Cytoscape by adding the column "weight" to the edge table. I'm pretty sure this is how to do it, but it's actually hard to find in the manual. I also haven't yet figured out how to color the edges based on the weight, so I can't be certain that I've done it right, but anyway, I think I have.

When I export this to graphml, something expected and unexpected is happening.

At least, that is what I infer from looking at our own GraphML output.

The upshot of all of this is that GRNsight is not reading the weights and is just displaying the network as an unweighted network.

Update: I tested my own hypothesis by editing the XML directly, changing the "for" attribute of the weight key from "node" to "edge", and then importing the file into GRNsight. After I did that, the weighted edges displayed properly.

So, I don't know if the issue is a genuine bug in how Cytoscape exports GraphML or whether I made a mistake in the way that I assigned weights in Cytoscape. I'm inclined to think it is a bug in Cytoscape because when it defines other keys, like "interaction" which should apply to the edges, it assigns them to nodes instead. I also did a GRNsight > GraphML > Cytoscape > GraphML conversion and it made the same mistake.

I don't know what to do about this one. We could, of course, modify our parser to allow the mistake to be read correctly, but the proper course of action is probably to report the bug to Cytoscape. Thoughts?

I probably need to track down some other program that can export GraphML to see what it looks like.

Minor side note: we might consider calling "edge-value" "weight" instead as Cytoscape does, unless we shouldn't actually call the key id the same thing as the attr.name.

kdahlquist commented 8 years ago

Forgot to post the test files. GraphML_weight_tests.zip

dondi commented 8 years ago

I'm leaning toward applying a priority system like the one you mentioned, where we first look for the presence of label keys, then fall back to the id. For export, we can write both the labels and the ids but keep the same names in both, so they will look the same regardless of whether the importing software reads node ids or node data keys.
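The export idea can be sketched as follows in Python. This is an assumption-laden illustration rather than GRNsight's exporter: `export_graphml` is a hypothetical function, and the choice of "name" as the label key id is assumed here for the sake of the example.

```python
import xml.etree.ElementTree as ET

NS = "http://graphml.graphdrawing.org/xmlns"


def export_graphml(genes, edges):
    """Write each gene name both as the node id and as a "name" data value."""
    ET.register_namespace("", NS)  # serialize with GraphML as the default namespace
    root = ET.Element(f"{{{NS}}}graphml")
    # Assumed label key; an importer that reads data keys will find it here.
    ET.SubElement(root, f"{{{NS}}}key", {
        "id": "name", "for": "node", "attr.name": "name", "attr.type": "string",
    })
    graph = ET.SubElement(root, f"{{{NS}}}graph", {"edgedefault": "directed"})
    for gene in genes:
        # The same string goes in the id attribute and in the data element.
        node = ET.SubElement(graph, f"{{{NS}}}node", {"id": gene})
        data = ET.SubElement(node, f"{{{NS}}}data", {"key": "name"})
        data.text = gene
    for source, target in edges:
        ET.SubElement(graph, f"{{{NS}}}edge", {"source": source, "target": target})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(export_graphml(["Gene1", "Gene2"], [("Gene1", "Gene2")]))
```

Because the id and the label carry the same text, an importer that only looks at node ids and one that only looks at data keys should end up showing the same gene names.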

I also think that the Cytoscape weight export issue is a bug—the official GraphML tutorial that I used as a basis for the initial implementation definitely defines the attr.name="weight" key with for="edge". It also is conceptually ambiguous to associate a weight with just a node anyway. I'll see if we can [temporarily] ignore the for attribute of the key.

As for the key’s standard ID, I went with edge-value because the GRNsight label for this property is value. But admittedly it didn’t even occur to me to use the name of the attribute also (i.e., weight). The GraphML tutorial used arbitrary names like d0 and d1 so that somewhat threw me off. No problem with changing this to weight.

kdahlquist commented 8 years ago

Yeah, I'm about to work on the flip side, GRNsight-exported GraphML imported into Cytoscape, #311.

There is a bug report form for Cytoscape here: http://chianti.ucsd.edu/cyto_web/bugreport/bugreport.php

I can report the bug, but they ask that it be reproduced on another computer. It might be helpful for you to see if it reproduces on a Mac; I'm using v3.4.0 of Cytoscape.

dondi commented 8 years ago

OK I can do that. So the steps are:

kdahlquist commented 8 years ago

Yes. The way you do the weights is to click on the "edge table" tab at the bottom of the Cytoscape window, then click on the + at the top of that panel to add a column. I used floating point as the data type and called the column "weight". Then when the column appears, you can type in the weight values.

dondi commented 8 years ago

The bug has been reported. We’ll see what happens.

Meanwhile, I verified that our prior code was indeed requiring for="edge" in order to detect the weight attribute. I’ve removed that condition for now. We can consider this an accommodation for Cytoscape users because it isn’t their fault that Cytoscape has this bug.
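The effect of dropping that condition can be sketched in Python. This is not the GRNsight code; `find_weight_key_ids` and `read_edge_weights` are hypothetical names, and the sample is a hand-written file that imitates the reported problem (a weight key declared with for="node" while the values sit on edges).

```python
import xml.etree.ElementTree as ET

NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

# Hand-written sample imitating the reported export: the weight key is
# scoped to nodes even though the weight value is attached to an edge.
SAMPLE = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="weight" for="node" attr.name="weight" attr.type="double"/>
  <graph edgedefault="directed">
    <node id="a"/>
    <node id="b"/>
    <edge source="a" target="b"><data key="weight">-0.75</data></edge>
  </graph>
</graphml>
"""


def find_weight_key_ids(root, require_edge_scope=False):
    """Collect ids of keys named "weight", optionally insisting on for="edge"."""
    ids = set()
    for key in root.findall("g:key", NS):
        if key.get("attr.name") != "weight":
            continue
        if require_edge_scope and key.get("for") != "edge":
            continue  # strict mode: skip keys not scoped to edges
        ids.add(key.get("id"))
    return ids


def read_edge_weights(graphml_text, require_edge_scope=False):
    """Return {(source, target): weight} for edges carrying a weight value."""
    root = ET.fromstring(graphml_text)
    weight_ids = find_weight_key_ids(root, require_edge_scope)
    weights = {}
    for edge in root.iterfind(".//g:edge", NS):
        for data in edge.findall("g:data", NS):
            if data.get("key") in weight_ids:
                weights[(edge.get("source"), edge.get("target"))] = float(data.text)
    return weights


if __name__ == "__main__":
    print(read_edge_weights(SAMPLE, require_edge_scope=True))
    print(read_edge_weights(SAMPLE))
```

In strict mode the sample yields no weights, so the network would display as unweighted. In the lenient mode the weight on the edge is picked up despite the mislabeled key.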

I’ve also updated the node labeling logic to use cascading conditions for choosing the node label. The deployed beta now prioritizes a name key, then shared name, using the id only in their absence.
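A minimal Python sketch of that cascade, for illustration only: `node_label` is a hypothetical helper, the sample is hand-written, and it assumes the data key ids are literally "name" and "shared name" (a fuller reader would resolve them through the key declarations).

```python
import xml.etree.ElementTree as ET

NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

# Priority order described above: "name" first, then "shared name".
LABEL_KEY_PRIORITY = ("name", "shared name")

# Hand-written sample covering all three branches of the cascade.
SAMPLE = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph edgedefault="directed">
    <node id="62">
      <data key="shared name">Gene1 (shared)</data>
      <data key="name">Gene1</data>
    </node>
    <node id="64"><data key="shared name">Gene2</data></node>
    <node id="Gene3"/>
  </graph>
</graphml>
"""


def node_label(node):
    """Pick a display label: name, then shared name, then the node id."""
    values = {
        data.get("key"): (data.text or "").strip()
        for data in node.findall("g:data", NS)
    }
    for key in LABEL_KEY_PRIORITY:
        if values.get(key):  # skip missing or empty values
            return values[key]
    return node.get("id")  # last resort: the raw id


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    print([node_label(node) for node in root.iterfind(".//g:node", NS)])
```

The first node resolves through its name, the second through its shared name, and the third falls back to its id, so files from sources that only set node ids still import with usable labels.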

And, as a smaller note, I’ve switched our exported key ID to weight.

kdahlquist commented 8 years ago

Thanks for submitting the bug. I haven't had a chance to test this yet. I will get to it tomorrow.

kdahlquist commented 8 years ago

Verified that these changes result in successful import of both unweighted and weighted networks from Cytoscape graphml. Closing now.