eppic-team / eppic

:white_check_mark::x:Evolutionary protein-protein interface classifier
http://eppic-web.org

Drawing the Assembly Diagram #104

Closed lafita closed 7 years ago

lafita commented 8 years ago

We need the following representations of the diagram (graph):

The layout of the nodes in the diagram is calculated by the StereographicLayout method, so we only need a program or library that draws the nodes at the calculated coordinates (custom layout) and with the defined style (size of the nodes and labels).

The main issue is that the jgraph library does not support custom layouts well (it changes node positions when the image is resized), the size of the nodes cannot be customized, and it only outputs PNG (a vector graphics format would be lighter).

lafita commented 8 years ago

I have been investigating the igraph library for R a bit and it has exactly what we are looking for. There are also Python and C implementations (no Java).

The plot function takes a graph object (that specifies the edges) and a layout object (that specifies the position of each vertex), so the two are factored out. The layout is already calculated by Spencer in the StereographicLayout class of eppic, so it is only a matter of outputting it in a parsable text format.

It is very simple and works as expected (no resizing modifications). In addition it has all the other features of a graph visualization (resize the vertices, color and label edges, etc).

The only drawbacks may be the installation of igraph (we do not even need root), the use of an additional language (R), and that the Java code would have to output two text files, one to define the edges and one to specify the layout.
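The thread does not show the actual format of the two text files, so the following is only a minimal sketch of what an edge file and a layout file could contain. It is written in JavaScript for consistency with the later examples (the real exporter would live in the Java code). The CSV column names and the `exampleGraph` object are assumptions made for illustration.

```javascript
// Hypothetical in-memory graph: vertex positions plus labelled edges.
// Coordinates are borrowed from the 2hda example later in this thread.
const exampleGraph = {
  nodes: [
    { id: 'A0', x: 8, y: 0 },
    { id: 'A1', x: 4, y: 4 },
    { id: 'A2', x: 0, y: 0 }
  ],
  edges: [
    { source: 'A1', target: 'A0', label: '3' },
    { source: 'A2', target: 'A1', label: '3' },
    { source: 'A0', target: 'A2', label: '3' }
  ]
};

// One line per edge: which vertices it joins and its label.
function toEdgeCsv(graph) {
  const rows = graph.edges.map(e => [e.source, e.target, e.label].join(','));
  return ['source,target,label'].concat(rows).join('\n');
}

// One line per vertex: its identifier and precomputed coordinates.
function toLayoutCsv(graph) {
  const rows = graph.nodes.map(n => [n.id, n.x, n.y].join(','));
  return ['id,x,y'].concat(rows).join('\n');
}

console.log(toEdgeCsv(exampleGraph));
console.log(toLayoutCsv(exampleGraph));
```

Keeping edges and positions in separate files mirrors the graph/layout split that the plot function expects.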

I have a little R script that generates diagrams from these two file formats. Here are some examples generated with it for the diagram of assembly 11 of 2hda (D3):

Thumbnail 2hda_11_small

Detailed 2hda_11_big

lafita commented 8 years ago

I quote Jose's reply to the e-mail:

To me the output looks fantastic. The one issue I see is that we need to create more files (csv data files and svg files, especially bad for precomputation) and also interface to R somehow. Not enormous problems but better if avoided.

To me the ideal solution would be a javascript library that would push all the visualisation to the client, without us needing to do much more server-side. You guys did mention something about some JS library, didn't you?

visjs looks very good: http://visjs.org/network_examples.html Also Alex tells me that D3.js is what a lot of people use: https://d3js.org/ No idea if we can provide our own layouts to them.

lafita commented 8 years ago

Guido also mentioned the interesting thread in ResearchGate about this topic: https://www.researchgate.net/post/What_is_the_best_JAVA-based_graph_drawing_toolkit_for_graphs_with_weighted_edges_and_a_varying_number_of_nodes_during_runtime

lafita commented 8 years ago

I have been investigating the different alternatives (updated 11.04). Here is a summary:

| Library | Language | Custom Layout? | Output Formats | Calculation | Advantages | Disadvantages |
| --- | --- | --- | --- | --- | --- | --- |
| JUNG | Java | Yes | PNG, JPG, or EPS | Server | Java. | Poor documentation and basic style. |
| jGraphX | Java | Difficult | PNG | Server | Java. | Basic style. Bad experience so far with layout. |
| sigma.js | JavaScript | Yes | SVG, JS or WebGL | User | Intuitive, specific and documented. Fancy style and interactive graphs possible. | Less customizable than other JS libraries. |
| D3.js | JavaScript | Not Found | SVG, WebGL | User | Widely used. Fancy style and interactive graphs possible. | Very general, not only graphs, and data driven (using JSON files). |
| vis.js | JavaScript | Yes | SVG, WebGL | User | Simple to use and powerful. | |
| Graphviz | DOT | Yes | PNG, SVG, PDF | Server | Intuitive, standard and customizable format, specially designed for graph description. Sophisticated style. Single file required to store graph, layout and style. | Requires external software. |
| igraph | R, Python | Yes | SVG, PNG, PDF | Server | Intuitive and simple. Sophisticated style possible. | Requires different files for layout, graph and style. Requires external software. |
| viz.js | DOT, JavaScript | Yes in theory | SVG, PNG | User | A hack to put Graphviz on the web. Very simple once the DOT file is generated. | Not very well documented or maintained, but very simple (maybe no need to). Seems to ignore the pos lines (layout). |

lafita commented 8 years ago

I quote Spencer's reply to the e-mail:

igraph does look nice. I was experimenting with graphviz, which is quite powerful and has lots of options e.g. for curved edges, svg/png/pdf, and for automatic layout of some nodes. You can even do nested networks (which made me think about the heteromer vertex merging approach). Of course, like igraph it requires installing additional software.

d4

The graphviz file format allows the style of the nodes to be specified in the file itself, an advantage over igraph (where those are parameters of the plot function). The file that generates the figure above is:

digraph D4 {
    splines="spline"
    scale="0.5"
    node [
        style="filled"
        size="10,10"
        color="#1f78b4"
        fillcolor="#a6cee3"
        width="1.0"
        height="1.0"
    ]
    A0 [ pos="100,100"]
    A1 [ pos="100,1000" ]
    A2 [ pos="1000,1000" ]
    A3 [ pos="1000,100" ]
    A4 [ pos="300,300" ]
    A5 [ pos="300,700" ]
    A6 [ pos="700,700" ]
    A7 [ pos="700,300" ]

    A0 -> A1 [label="1"]
    A1 -> A2 [label="1"]
    A2 -> A3 [label="1"]
    A3 -> A0 [label="1"]
    A4 -> A5 [label="1"]
    A5 -> A6 [label="1"]
    A6 -> A7 [label="1"]
    A7 -> A4 [label="1"]
    A0 -> A4 [label="2"]
    A1 -> A5 [label="2"]
    A2 -> A6 [label="2"]
    A3 -> A7 [label="2"]
    A0 -> A5 [label="3"]
    A1 -> A6 [label="3"]
    A2 -> A7 [label="3"]
    A3 -> A4 [label="3"]
    A4 -> A3 [label="~3"]
    A6 -> A3
}
lafita commented 8 years ago

There are some threads about JavaScript rendering of the DOT format, so it is possible that we do not need external software and that the calculations (rendering of the graph) happen on the user-side. We would need to convert the internal graph representation and layout to a DOT String and render it with JavaScript. See the attached links:

http://stackoverflow.com/questions/6344318/pure-javascript-graphviz-equivalent

http://stackoverflow.com/questions/22595493/reading-dot-files-in-javascript-d3

http://stackoverflow.com/questions/4366511/is-there-a-jquery-plugin-for-dot-language-file-visualization

The consensus solution seems to be viz.js
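As a rough illustration of the conversion step, here is a minimal sketch that turns a list of nodes and edges into a DOT string in the same shape as the file shown earlier in this thread. The function name `toDot` and the node/edge object fields are assumptions made for this example, and it assumes ids and labels contain no quotes or other characters that need escaping. Rendering the resulting string (for example with viz.js) is deliberately left out.

```javascript
// Build a DOT description with explicit positions, one line per node and edge.
function toDot(name, nodes, edges) {
  const lines = ['digraph ' + name + ' {'];
  nodes.forEach(n => {
    // pos carries the precomputed layout coordinates for this node.
    lines.push('    ' + n.id + ' [ pos="' + n.x + ',' + n.y + '" ]');
  });
  edges.forEach(e => {
    lines.push('    ' + e.source + ' -> ' + e.target + ' [label="' + e.label + '"]');
  });
  lines.push('}');
  return lines.join('\n');
}

// Hypothetical input: two nodes and one edge taken from the D4 example.
const dotExample = toDot(
  'D4',
  [{ id: 'A0', x: 100, y: 100 }, { id: 'A1', x: 100, y: 1000 }],
  [{ source: 'A0', target: 'A1', label: '1' }]
);
console.log(dotExample);
```

A template engine would do the same job declaratively; the string concatenation here just keeps the sketch dependency-free.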

sbliven commented 8 years ago

So it looks like a DOT mustache template for lattice graphs would be ideal. I can refactor the StereographicLayout to position nodes directly rather than going through the jgraphx intermediate.

This vis.js example might be useful. It shows how to make positions correspond directly to pixel coordinates for full control of the output.
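The linked example is not preserved here, so the following is only a sketch of one way to hand precomputed positions to vis.js: give each node explicit `x` and `y` values, mark it as `fixed`, and disable the physics simulation so nothing is moved afterwards. The helper name `buildVisInput` and the input field names are assumptions for illustration. Whether the coordinates map one-to-one onto pixels also depends on the view scale, which this sketch does not control.

```javascript
// Convert a generic node/edge list into the data and options objects
// that vis.Network takes as its second and third constructor arguments.
function buildVisInput(nodes, edges) {
  return {
    data: {
      // fixed: true keeps each node at the supplied coordinates.
      nodes: nodes.map(n => ({ id: n.id, label: n.label, x: n.x, y: n.y, fixed: true })),
      edges: edges.map(e => ({ from: e.source, to: e.target, label: e.label }))
    },
    // With physics disabled, vis.js does not run its own layout.
    options: { physics: { enabled: false } }
  };
}

const visInput = buildVisInput(
  [
    { id: '1', label: 'A1', x: 400, y: 400 },
    { id: '0', label: 'A0', x: 800, y: 0 }
  ],
  [{ source: '1', target: '0', label: '3' }]
);

// Only draw when running in a page that has loaded vis.js.
if (typeof vis !== 'undefined' && typeof document !== 'undefined') {
  const container = document.getElementById('graph-container');
  new vis.Network(container, visInput.data, visInput.options);
}
```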

lafita commented 8 years ago

I have been playing around with sigma.js and I have had a good experience. As with vis.js, it is very simple to input the position of each node. Look at this basic example they provide:

/**
 * This is a basic example on how to instantiate sigma. A random graph is
 * generated and stored in the "graph" variable, and then sigma is instantiated
 * directly with the graph.
 *
 * The simple instance of sigma is enough to make it render the graph on
 * the screen, since the graph is given directly to the constructor.
 */
var i,
    s,
    N = 100,
    E = 500,
    g = {
      nodes: [],
      edges: []
    };

// Generate a random graph:
for (i = 0; i < N; i++)
  g.nodes.push({
    id: 'n' + i,
    label: 'Node ' + i,
    x: Math.random(),
    y: Math.random(),
    size: Math.random(),
    color: '#666'
  });

for (i = 0; i < E; i++)
  g.edges.push({
    id: 'e' + i,
    source: 'n' + (Math.random() * N | 0),
    target: 'n' + (Math.random() * N | 0),
    size: Math.random(),
    color: '#ccc'
  });

// Instantiate sigma:
s = new sigma({
  graph: g,
  container: 'graph-container'
});

The variables x and y control the position of each node and are specified when the node is created.

lafita commented 8 years ago

I could reproduce the graph for one of the D3 assemblies of 2hda. The amount of code needed (that has to be generated automatically with eppic) is the following:

// Generate the graph based on the assembly information (generated code)
var g = { nodes: [], edges: [] };
g.nodes.push({id: '1', label: 'A1', x: 4, y: 4, size: 50});
g.nodes.push({id: '2', label: 'A2', x: 0, y: 0, size: 50});
g.nodes.push({id: '0', label: 'A0', x: 8, y: 0, size: 50});
g.nodes.push({id: '10', label: 'A10', x: 5, y: 1, size: 50});
g.nodes.push({id: '11', label: 'A11', x: 3, y: 1, size: 50});
g.nodes.push({id: '9', label: 'A9', x: 4, y: 2, size: 50});
g.edges.push({id: '101', source: '1', target: '0', label: '-3(3)-'});
g.edges.push({id: '102', source: '2', target: '1', label: '-3(3)-'});
g.edges.push({id: '103', source: '0', target: '2', label: '-3(3)-'});
g.edges.push({id: '104', source: '10', target: '0', label: '-4(4)-'});
g.edges.push({id: '105', source: '1', target: '9', label: '-4(4)-'});
g.edges.push({id: '106', source: '2', target: '11', label: '-4(4)-'});
g.edges.push({id: '107', source: '9', target: '11', label: '-3(3)-'});
g.edges.push({id: '108', source: '11', target: '10', label: '-3(3)-'});
g.edges.push({id: '109', source: '10', target: '9', label: '-3(3)-'});

// Instantiate sigma:
s = new sigma({
  graph: g,
  container: 'graph-container'
});

// Some settings for global graph style
s.settings({
  sideMargin: 1,
  maxNodeSize: 50,
  labelSize: 'proportional',
  labelSizeRatio: 0.5,
  defaultNodeColor: '#9999ff',
  edgeColor: 'default',
  defaultEdgeColor: '#ccc' 
});

// Refresh the graph to see the changes
s.refresh();

This means a single line for each node and edge, which can be implemented in two functions, one to print a node as a JS line and one to print an edge as a JS line.
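The two functions could look roughly like the sketch below. In eppic they would be written in Java; they are shown in JavaScript here only to stay in the language of the surrounding examples. The function names and the shape of the input objects are assumptions.

```javascript
// Print one node as a line of generated JS, in the format used above.
function nodeToJsLine(node) {
  return "g.nodes.push({id: '" + node.id + "', label: '" + node.label +
    "', x: " + node.x + ", y: " + node.y + ", size: " + node.size + "});";
}

// Print one edge as a line of generated JS, in the format used above.
function edgeToJsLine(edge) {
  return "g.edges.push({id: '" + edge.id + "', source: '" + edge.source +
    "', target: '" + edge.target + "', label: '" + edge.label + "'});";
}

console.log(nodeToJsLine({ id: '1', label: 'A1', x: 4, y: 4, size: 50 }));
console.log(edgeToJsLine({ id: '101', source: '1', target: '0', label: '-3(3)-' }));
```

The sketch assumes ids and labels never contain a single quote; a real generator would need to escape them.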

2hda_11_sigma-js

The overall appearance is good, but some features are missing (or I could not find the appropriate settings), such as placing the labels inside the nodes or showing the edge labels. However, keep in mind that the graph is interactive: labels appear when mousing over the elements, and the graph can be moved and zoomed very intuitively with the mouse.

lafita commented 8 years ago

In conclusion, we need to decide between converting the graph into a DOT format and rendering it into a vector graphics image (using viz.js) or showing the graph with JavaScript as an interactive object (sigma.js). What do you prefer/comment @sbliven, @josemduarte and @gcapitani?

josemduarte commented 8 years ago

I'd be happiest with a client-based solution (i.e. javascript). It's the easiest for us to implement and it also has the advantage that it is a lot easier to change something on it. With a server-side solution a change in the code would require a new precalculation, whilst with a client-side solution it's simply a matter of pushing a new web-app version.

Regarding interactivity of the JS solutions, I don't see that as an essential feature for us. Things like zooming would be nice but things like dragging nodes around are not very useful in our case. Anyway I think all that should be configurable in the JS code (by choosing whether or not to implement listeners for the events).

lafita commented 8 years ago

vis.js has nice options that can be useful to us, like highlighting the edges of a node when it is selected or showing a message when mousing over the node. In addition, the interactive features (like dragging nodes) can be easily disabled. The style of the graphs is very nice with the default settings, and coloring the nodes differently (by chain ID) is easy. The documentation is similar to or even better than that of sigma.js.

Here you can see an example of a thumbnail D3 graph: http://jsbin.com/solumil/edit?html,output Here is an example of a detailed D3 graph: http://jsbin.com/wocima/1/edit?html,output
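The jsbin examples themselves are not reproduced in this thread. As a sketch of the kind of configuration described above, the options below use the vis.js `interaction` settings to switch off dragging and zooming while keeping hover, node `title` values for the mouse-over message, and `groups` to color nodes by chain. The chain IDs, colors, and helper name are assumptions for illustration.

```javascript
// Attach a group (for per-chain styling) and a title (hover text) to each node.
function decorateNodes(nodes) {
  return nodes.map(n => ({
    id: n.id,
    label: n.label,
    group: n.chain,            // nodes in the same group share a style
    title: 'Chain ' + n.chain  // shown when the mouse hovers over the node
  }));
}

const staticOptions = {
  physics: { enabled: false },
  interaction: {
    dragNodes: false,          // nodes cannot be moved
    dragView: false,           // the view cannot be panned
    zoomView: false,           // the view cannot be zoomed
    hover: true,               // keep hover highlighting
    selectConnectedEdges: true // highlight a node's edges when it is selected
  },
  groups: {
    A: { color: { background: '#a6cee3', border: '#1f78b4' } },
    B: { color: { background: '#b2df8a', border: '#33a02c' } }
  }
};

const decorated = decorateNodes([
  { id: '0', label: 'A0', chain: 'A' },
  { id: '1', label: 'B0', chain: 'B' }
]);
console.log(JSON.stringify(decorated));
```

These objects would be passed as the node list and the options argument of `vis.Network`.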

gcapitani commented 8 years ago

The above D3 graph examples look very good to me; the possibility of showing text when the user mouses over a node, though not essential, offers an intuitive way to convey additional info.

lafita commented 8 years ago

vis.js uses only HTML canvas for rendering, while sigma.js can use HTML canvas and WebGL if enabled. We do not need the WebGL capabilities for the 2D graph, because the graphs are small and we do not want most of the interactive capabilities.

I tried showing 20 thumbnails of the D3 vis.js graph in the same page and it loads fast and responds well to scrolling and interaction: http://output.jsbin.com/wapoxu

It is thus feasible to show the thumbnails in JS as HTML canvas. We need to evaluate how difficult it is to insert those instead of images and how much computation it requires to calculate them on the user side (it may be a lot for a large number of assemblies if we show them all on the same page).

josemduarte commented 8 years ago

We need to evaluate how difficult it is to insert those instead of images and how much computation it requires to calculate them on the user side (it may be a lot for a large number of assemblies if we show them all on the same page).

Yes that's the big question, the whole GXT thing is already very heavy and this would only add to it. It would also require some additional data from db to be transferred to the client (the graph info is not transferred at the moment). I suppose we can only answer all that if we try it out...

If we decided to generate pngs to avoid all that, we'd then need to use some other software, wouldn't we?

sbliven commented 8 years ago

If we want to create PNG images server-side from vis or any of the other javascript implementations, we will need to install a javascript interpreter on the server (or on Merlin for precomputation). I've mostly seen Node.js and Rhino discussed.

Somewhat related: LinkedIn had an interesting blog post on their decision to use a JavaScript templating engine (dust.js), which they use both server-side and client-side depending on the use case and the client's capabilities. They performed an extensive comparison of 26 templating libraries (I was happy to see that our current choice, Mustache, is in the top tier).

sbliven commented 8 years ago

Partially solved by #115, but still needs some work:

sbliven commented 8 years ago

Performance problems mitigated by #116

josemduarte commented 7 years ago

What remains to be done (optional): generating the json files in the CLI run, so that the files can be precomputed.

We should consider implementing the precomputing feature as low priority. At the end of the day, with this caching solution things work, and it only affects the speed of the first run. A possible issue is that the on-the-fly computation is quite heavy, and if we had many users it could become a drain on the server's resources. Until now the server was very lightweight, but with this caching solution we introduce a not-so-light process.
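The actual caching code is not shown in this thread, so the following is only a generic sketch of the behavior described: the expensive computation runs on the first request for an assembly and later requests reuse the stored result. All names here (`makeCachedProvider`, `getDiagramJson`, the JSON shape) are hypothetical and not taken from eppic.

```javascript
// Wrap an expensive function so each key is computed at most once.
function makeCachedProvider(computeFn) {
  const cache = new Map();
  return function get(key) {
    if (!cache.has(key)) {
      cache.set(key, computeFn(key)); // first run: pay the full cost
    }
    return cache.get(key);            // later runs: served from the cache
  };
}

// Stand-in for the heavy on-the-fly diagram computation.
let computeCount = 0;
const getDiagramJson = makeCachedProvider(function (assemblyId) {
  computeCount++;
  return JSON.stringify({ assembly: assemblyId, nodes: [], edges: [] });
});

getDiagramJson('2hda_11');
getDiagramJson('2hda_11');
console.log(computeCount); // prints 1: the second call hit the cache
```

Precomputing the files in the CLI run would amount to filling such a cache ahead of time, so that no request ever pays the first-run cost.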

josemduarte commented 7 years ago

I'm working on the json files generation from CLI.

josemduarte commented 7 years ago

JSON files are now generated in the CLI. I'll close this.