Open horaciobam opened 4 years ago
Using subgraphs
I managed to get this (awful) output for the Coronaviridae family:
I tried using the fdp engine and I got:
:(
Edges look very crowded (especially the thick ones representing clusters with more connections) and this engine won't allow me to use arrows. @ArtPoon, do you have any recommendations on how to improve it?
Let's see how this looks with edges filtered, i.e., with low-weight edges removed. Thanks!
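A minimal sketch of that kind of filtering, assuming each edge is a (source, target, weight) tuple. The function name `filter_edges`, the `min_count` parameter, and the inclusive threshold are illustrative assumptions, not taken from the project code; the sample values are a few of the cluster 1 edges posted later in this thread.

```python
def filter_edges(edges, min_count):
    """Keep only edges whose weight is at least min_count (inclusive; an assumption)."""
    return [(src, dst, weight) for src, dst, weight in edges if weight >= min_count]


# (current_cluster, adjacent_cluster, number_of_proteins)
sample_edges = [("1", "2", 91), ("1", "1", 2), ("1", "6", 16), ("1", "4", 4), ("1", "3", 1)]

kept = filter_edges(sample_edges, min_count=5)
print(kept)
```

Raising `min_count` trades detail for legibility: fewer thin edges are drawn, so the thick ones are easier to follow.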
Same dataset (Coronaviridae) with an edge_count of 5:
Can you send me your current DOT file please?
Dot file for this graph:
// Cluster plot
graph G {
compound=true
subgraph cluster_1 {
node [color=white style=filled]
color="#f77189" fontname="Courier-Bold" label=cluster_1 style=filled
start1 -- end1
}
start1 -- end2 [arrowsize=0.01098901098901099 color=grey76 len=8 penwidth=91]
start1 -- end6 [arrowsize=0.0625 color=grey76 len=8 penwidth=16]
start1 -- end8 [arrowsize=0.125 color=grey76 len=8 penwidth=8]
start1 -- end7 [arrowsize=0.16666666666666666 color=grey76 len=8 penwidth=6]
start1 -- end9 [arrowsize=0.14285714285714285 color=grey76 len=8 penwidth=7]
start1 -- end10 [arrowsize=0.08333333333333333 color=grey76 len=8 penwidth=12]
start1 -- end11 [arrowsize=0.16666666666666666 color=grey76 len=8 penwidth=6]
start1 -- end1 [arrowsize=0.009174311926605505 color="#143D59" len=8 penwidth=109]
start1 -- end2 [arrowsize=0.024390243902439025 color="#143D59" len=8 penwidth=41]
subgraph cluster_2 {
node [color=white style=filled]
color="#e18632" fontname="Courier-Bold" label=cluster_2 style=filled
start2 -- end2
}
start2 -- end3 [arrowsize=0.16666666666666666 color=grey76 len=8 penwidth=6]
start2 -- end8 [arrowsize=0.041666666666666664 color=grey76 len=8 penwidth=24]
start2 -- end9 [arrowsize=0.16666666666666666 color=grey76 len=8 penwidth=6]
start2 -- end6 [arrowsize=0.125 color=grey76 len=8 penwidth=8]
start2 -- end7 [arrowsize=0.1 color=grey76 len=8 penwidth=10]
start2 -- end11 [arrowsize=0.16666666666666666 color=grey76 len=8 penwidth=6]
start2 -- end6 [arrowsize=0.05 color="#143D59" len=8 penwidth=20]
start2 -- end8 [arrowsize=0.16666666666666666 color="#143D59" len=8 penwidth=6]
start2 -- end10 [arrowsize=0.1 color="#143D59" len=8 penwidth=10]
subgraph cluster_3 {
node [color=white style=filled]
color="#b59a32" fontname="Courier-Bold" label=cluster_3 style=filled
start3 -- end3
}
start3 -- end4 [arrowsize=0.1 color=grey76 len=8 penwidth=10]
start3 -- end8 [arrowsize=0.2 color=grey76 len=8 penwidth=5]
start3 -- end5 [arrowsize=0.07692307692307693 color=grey76 len=8 penwidth=13]
start3 -- end5 [arrowsize=0.2 color="#143D59" len=8 penwidth=5]
subgraph cluster_4 {
node [color=white style=filled]
color="#8ba731" fontname="Courier-Bold" label=cluster_4 style=filled
start4 -- end4
}
start4 -- end5 [arrowsize=0.022222222222222223 color=grey76 len=8 penwidth=45]
start4 -- end8 [arrowsize=0.125 color=grey76 len=8 penwidth=8]
start4 -- end2 [arrowsize=0.125 color=grey76 len=8 penwidth=8]
subgraph cluster_5 {
node [color=white style=filled]
color="#32b258" fontname="Courier-Bold" label=cluster_5 style=filled
start5 -- end5
}
start5 -- end3 [arrowsize=0.07692307692307693 color=grey76 len=8 penwidth=13]
start5 -- end7 [arrowsize=0.1111111111111111 color=grey76 len=8 penwidth=9]
subgraph cluster_6 {
node [color=white style=filled]
color="#35ae95" fontname="Courier-Bold" label=cluster_6 style=filled
start6 -- end6
}
start6 -- end4 [arrowsize=0.058823529411764705 color=grey76 len=8 penwidth=17]
start6 -- end8 [arrowsize=0.2 color=grey76 len=8 penwidth=5]
start6 -- end3 [arrowsize=0.16666666666666666 color=grey76 len=8 penwidth=6]
start6 -- end10 [arrowsize=0.16666666666666666 color=grey76 len=8 penwidth=6]
start6 -- end8 [arrowsize=0.07692307692307693 color="#143D59" len=8 penwidth=13]
subgraph cluster_7 {
node [color=white style=filled]
color="#37abb2" fontname="Courier-Bold" label=cluster_7 style=filled
start7 -- end7
}
start7 -- end5 [arrowsize=0.09090909090909091 color=grey76 len=8 penwidth=11]
start7 -- end6 [arrowsize=0.16666666666666666 color=grey76 len=8 penwidth=6]
start7 -- end3 [arrowsize=0.1111111111111111 color="#143D59" len=8 penwidth=9]
subgraph cluster_8 {
node [color=white style=filled]
color="#39a7d6" fontname="Courier-Bold" label=cluster_8 style=filled
start8 -- end8
}
start8 -- end9 [arrowsize=0.14285714285714285 color=grey76 len=8 penwidth=7]
start8 -- end4 [arrowsize=0.03571428571428571 color=grey76 len=8 penwidth=28]
start8 -- end5 [arrowsize=0.16666666666666666 color=grey76 len=8 penwidth=6]
start8 -- end8 [arrowsize=0.2 color=grey76 len=8 penwidth=5]
start8 -- end3 [arrowsize=0.16666666666666666 color=grey76 len=8 penwidth=6]
start8 -- end9 [arrowsize=0.16666666666666666 color="#143D59" len=8 penwidth=6]
start8 -- end3 [arrowsize=0.16666666666666666 color="#143D59" len=8 penwidth=6]
subgraph cluster_9 {
node [color=white style=filled]
color="#8f93f4" fontname="Courier-Bold" label=cluster_9 style=filled
start9 -- end9
}
start9 -- end4 [arrowsize=0.07692307692307693 color=grey76 len=8 penwidth=13]
start9 -- end10 [arrowsize=0.16666666666666666 color="#143D59" len=8 penwidth=6]
subgraph cluster_10 {
node [color=white style=filled]
color="#db70f4" fontname="Courier-Bold" label=cluster_10 style=filled
start10 -- end10
}
start10 -- end4 [arrowsize=0.08333333333333333 color=grey76 len=8 penwidth=12]
start10 -- end3 [arrowsize=0.1111111111111111 color=grey76 len=8 penwidth=9]
start10 -- end7 [arrowsize=0.09090909090909091 color="#143D59" len=8 penwidth=11]
subgraph cluster_11 {
node [color=white style=filled]
color="#f667c6" fontname="Courier-Bold" label=cluster_11 style=filled
start11 -- end11
}
start11 -- end3 [arrowsize=0.2 color=grey76 len=8 penwidth=5]
start11 -- end11 [arrowsize=0.2 color=grey76 len=8 penwidth=5]
start11 -- end6 [arrowsize=0.2 color=grey76 len=8 penwidth=5]
start11 -- end7 [arrowsize=0.2 color="#143D59" len=8 penwidth=5]
overlap=false
}
Thanks. What do the arrow sizes correspond to? And penwidth?
Actually if you can give me node and edge lists before they are converted into a DOT file that would be ideal
arrowsize is used for the visualization without subgraphs; penwidth is used to determine the width of the edges.
By node and edge lists, do you mean something like this?
Nodes: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
Adjacent edges: [(1, 2), (1, 1), (1, 6), (1, 8), (1, 7), (1, 4), (1, 9), (1, 10), (1, 11), (1, 3), (2, 3), (2, 4), (2, 8), (2, 9), (2, 10), (2, 6), (2, 7), (2, 11), (2, 2), (3, 4), (3, 8), (3, 3), (3, 5), (3, 10), (3, 7), (3, 9), (4, 5), (4, 8), (4, 2), (4, 9), (4, 6), (5, 6), (5, 3), (5, 10), (5, 11), (5, 7), (5, 8), (5, 9), (6, 4), (6, 8), (6, 3), (6, 10), (6, 5), (7, 4), (7, 5), (7, 6), (7, 8), (7, 11), (7, 9), (8, 9), (8, 4), (8, 5), (8, 8), (8, 3), (8, 6), (8, 11), (8, 10), (9, 8), (9, 3), (9, 10), (9, 4), (9, 6), (9, 5), (9, 9), (10, 4), (10, 3), (10, 8), (10, 5), (10, 7), (11, 3), (11, 11), (11, 6), (11, 7), (11, 8)]
Overlapping edges:[(1, 1), (1, 2), (2, 6), (2, 8), (2, 10), (2, 9), (2, 11), (3, 5), (3, 9), (3, 8), (3, 3), (3, 10), (3, 4), (4, 3), (4, 9), (4, 7), (5, 11), (5, 3), (5, 8), (5, 7), (6, 7), (6, 8), (6, 3), (6, 6), (7, 3), (7, 5), (7, 9), (7, 4), (8, 8), (8, 9), (8, 3), (8, 11), (9, 4), (9, 9), (9, 10), (9, 7), (9, 8), (10, 6), (10, 7), (10, 4), (10, 8), (11, 7), (11, 9), (11, 10)]
No, I mean what are you using arrowsize and penwidth to represent?
i.e., I'd like the cluster sizes to go with the node list, and any other node attributes
Oh, sorry, I didn't understand.
I use penwidth to represent the number of proteins that form an edge between those clusters. The same goes for arrowsize, to make it proportional to the width of the line.
This is the list with node and edge information. I am colouring the clusters similar to the output from the t-SNE clustering plot.
# Nodes: (name_of_node, size_of_node, color)
Nodes: [('1', 153, '#f77189'), ('2', 66, '#e18632'), ('3', 55, '#b59a32'), ('4', 65, '#8ba731'), ('5', 66, '#32b258'), ('6', 37, '#35ae95'), ('7', 36, '#37abb2'), ('8', 59, '#39a7d6'), ('9', 32, '#8f93f4'), ('10', 34, '#db70f4'), ('11', 20, '#f667c6')]
# Adjacent edges (current_cluster, adj_cluster, number_of_proteins)
Adjacent edges: [('1', '2', 91), ('1', '1', 2), ('1', '6', 16), ('1', '8', 8), ('1', '7', 6), ('1', '4', 4), ('1', '9', 7), ('1', '10', 12), ('1', '11', 6), ('1', '3', 1), ('2', '3', 6), ('2', '4', 2), ('2', '8', 24), ('2', '9', 6), ('2', '10', 3), ('2', '6', 8), ('2', '7', 10), ('2', '11', 6), ('2', '2', 1), ('3', '4', 10), ('3', '8', 5), ('3', '3', 4), ('3', '5', 13), ('3', '10', 3), ('3', '7', 3), ('3', '9', 1), ('4', '5', 45), ('4', '8', 8), ('4', '2', 8), ('4', '9', 2), ('4', '6', 1), ('5', '6', 1), ('5', '3', 13), ('5', '10', 2), ('5', '11', 4), ('5', '7', 9), ('5', '8', 4), ('5', '9', 1), ('6', '4', 17), ('6', '8', 5), ('6', '3', 6), ('6', '10', 6), ('6', '5', 1), ('7', '4', 3), ('7', '5', 11), ('7', '6', 6), ('7', '8', 1), ('7', '11', 1), ('7', '9', 2), ('8', '9', 7), ('8', '4', 28), ('8', '5', 6), ('8', '8', 5), ('8', '3', 6), ('8', '6', 1), ('8', '11', 1), ('8', '10', 1), ('9', '8', 2), ('9', '3', 4), ('9', '10', 3), ('9', '4', 13), ('9', '6', 2), ('9', '5', 4), ('9', '9', 2), ('10', '4', 12), ('10', '3', 9), ('10', '8', 2), ('10', '5', 2), ('10', '7', 1), ('11', '3', 5), ('11', '11', 5), ('11', '6', 5), ('11', '7', 2), ('11', '8', 1)]
# Overlapping edges (current_cluster, ovp_cluster, number_of_proteins)
Overlapping edges:[('1', '1', 109), ('1', '2', 41), ('2', '6', 20), ('2', '8', 6), ('2', '10', 10), ('2', '9', 1), ('2', '11', 1), ('3', '5', 5), ('3', '9', 1), ('3', '8', 4), ('3', '3', 2), ('3', '10', 3), ('3', '4', 2), ('4', '3', 3), ('4', '9', 1), ('4', '7', 1), ('5', '11', 1), ('5', '3', 1), ('5', '8', 1), ('5', '7', 1), ('6', '7', 1), ('6', '8', 13), ('6', '3', 4), ('6', '6', 1), ('7', '3', 9), ('7', '5', 1), ('7', '9', 1), ('7', '4', 1), ('8', '8', 4), ('8', '9', 6), ('8', '3', 6), ('8', '11', 2), ('9', '4', 4), ('9', '9', 3), ('9', '10', 6), ('9', '7', 3), ('9', '8', 1), ('10', '6', 1), ('10', '7', 11), ('10', '4', 2), ('10', '8', 1), ('11', '7', 5), ('11', '9', 2), ('11', '10', 1)]
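For reference, the mapping from these edge lists to the DOT edge lines posted above can be sketched as follows. This is a reconstruction inferred from the posted file (penwidth equal to the protein count, arrowsize equal to 1 / count, grey76 for adjacent edges, #143D59 for overlapping edges, counts below 5 dropped), not the actual project script; all function names are hypothetical.

```python
ADJACENT_COLOR = "grey76"    # colour of adjacent edges in the posted DOT file
OVERLAP_COLOR = "#143D59"    # colour of overlapping edges in the posted DOT file


def quote(value):
    """Quote a DOT attribute value unless it is a plain alphanumeric token."""
    return value if value.isalnum() else f'"{value}"'


def edge_line(src, dst, count, color):
    """Format one undirected DOT edge: penwidth = count, arrowsize = 1 / count."""
    return (f"start{src} -- end{dst} "
            f"[arrowsize={1 / count} color={quote(color)} len=8 penwidth={count}]")


def edge_lines(adjacent, overlapping, min_count=5):
    """Return DOT lines for all edges with at least min_count proteins."""
    lines = []
    for edges, color in ((adjacent, ADJACENT_COLOR), (overlapping, OVERLAP_COLOR)):
        for src, dst, count in edges:
            if count >= min_count:
                lines.append(edge_line(src, dst, count, color))
    return lines


adjacent = [("1", "6", 16), ("1", "4", 4)]
overlapping = [("2", "6", 20), ("2", "9", 1)]
for line in edge_lines(adjacent, overlapping):
    print(line)
```

Unlike the posted file, this sketch emits all adjacent edges before the overlapping ones instead of grouping them per cluster, and it leaves out the `subgraph cluster_N` blocks.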
Ok thanks!
Wrote some JavaScript to generate the following with d3:
Please find the code in test.html and test.json in commit f6f0c275090f227e6c8cd66ca108fa82ce30768c
Some issues:
Oh, and you should write a Python or R script to convert your data into the JSON format for this animation :-)
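A rough Python sketch of such a converter, starting from the node and edge tuples listed earlier in this thread. The exact layout of test.json is not shown here, so the top-level `nodes` and `links` keys and the `id`, `cluster`, `size`, `color`, `count`, and `type` fields are assumptions; only the `source`/`target` naming with `startN` and `endN` follows what appears in the thread.

```python
import json


def to_d3_json(nodes, adjacent, overlapping):
    """Build a nodes/links dictionary from cluster and edge tuples.

    nodes: (name, size, color) tuples
    adjacent, overlapping: (current_cluster, other_cluster, number_of_proteins) tuples
    """
    node_records = []
    for name, size, color in nodes:
        # Each cluster contributes a start node and an end node.
        for prefix in ("start", "end"):
            node_records.append(
                {"id": f"{prefix}{name}", "cluster": name, "size": size, "color": color}
            )

    links = []
    for kind, edges in (("adjacent", adjacent), ("overlap", overlapping)):
        for src, dst, count in edges:
            links.append(
                {"source": f"start{src}", "target": f"end{dst}", "count": count, "type": kind}
            )
    return {"nodes": node_records, "links": links}


nodes = [("1", 153, "#f77189"), ("2", 66, "#e18632")]
adjacent = [("1", "2", 91)]
overlapping = [("1", "2", 41)]

data = to_d3_json(nodes, adjacent, overlapping)
print(json.dumps(data, indent=2))
```

Keeping a `type` field on each link would let the page colour or filter adjacent and overlapping links separately, if that turns out to be useful.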
Understood, working on it. Thanks!
@ArtPoon I have two questions:
1. The start of each cluster is connected with the end of the adjacent one (for example: {"source": "start2", "target": "end3"}). Shouldn't it be end2 -> start3?
2. [...] links and color them differently? (The downside is that if I do, the visualization gets more crowded, but maybe I can relax the force between links.)
Thank you!
With the last commit, colors for links and nodes now look like this:
Still trying to get the slider that filters links with low count to work.
The slider is working properly, but for some reason it overlaps the genome plot.
Progress update:
I think I need to create a panel to display the currently selected family, and I would like to make the page resize according to the window size. Should we also display information about the group (i.e., number of species, total number of clusters, size of clusters)?
Yes, that info would be helpful for users. You should also label the slider to indicate what the numbers represent.
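As a rough illustration, the cluster-level part of that summary could be computed from the node list posted earlier. The function name and the returned keys are hypothetical, and the number of species is not derivable from these tuples, so it is left out.

```python
def summarize_clusters(nodes):
    """Summarize (name, size, color) node tuples for an information panel."""
    sizes = [size for _name, size, _color in nodes]
    return {
        "n_clusters": len(sizes),
        "total_size": sum(sizes),
        "min_size": min(sizes),
        "max_size": max(sizes),
    }


# First three clusters from the node list above
nodes = [("1", 153, "#f77189"), ("2", 66, "#e18632"), ("3", 55, "#b59a32")]
summary = summarize_clusters(nodes)
print(summary)
```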
- start and end labels
- CDS
- Export SVG button
Cluster subgraphs: What would they look like?