PoonLab / ovrf-viz

Review article on overlapping reading frames in viruses
MIT License

Improving adjacency graphs #20

Open horaciobam opened 4 years ago

horaciobam commented 4 years ago

Cluster subgraphs: what would they look like?

ArtPoon commented 4 years ago

https://www.graphviz.org/Gallery/directed/cluster.html

ArtPoon commented 4 years ago

Whiteboard 1 -01

horaciobam commented 4 years ago

Using subgraphs I managed to get this (awful) output for the Coronaviridae family:

image

I tried using the fdp engine and I got:

image

:(

Edges look very crowded (especially the thick ones representing clusters with more connections) and this engine won't allow me to use arrows. @ArtPoon, do you have any recommendations on how to improve it?

ArtPoon commented 4 years ago

Let's see how this looks with edges filtered, i.e., removing edges with low weights. Thanks!
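The filtering suggested here could be sketched in Python as below. This is only an illustration: `filter_edges` and `min_count` are hypothetical names, the tuples are a handful of adjacency edges copied from the Coronaviridae lists posted later in this thread, and the cutoff is assumed to be inclusive (the DOT file posted for a threshold of 5 still contains edges with a count of 5).

```python
def filter_edges(edges, min_count):
    """Return the (source, target, count) tuples whose count is >= min_count."""
    return [(src, dst, n) for src, dst, n in edges if n >= min_count]


# A few adjacency edges copied from the lists posted later in this thread.
adjacent = [('1', '2', 91), ('1', '6', 16), ('1', '4', 4), ('1', '3', 1), ('2', '8', 24)]

kept = filter_edges(adjacent, min_count=5)
print(kept)  # [('1', '2', 91), ('1', '6', 16), ('2', '8', 24)]
```

Filtering before the layout step, rather than hiding edges afterwards, also means the layout engine never has to route the dropped edges.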

horaciobam commented 4 years ago

Same dataset (Coronaviridae) with an edge_count of 5:

image

ArtPoon commented 4 years ago

Can you send me your current DOT file please?

horaciobam commented 4 years ago

Dot file for this graph:

// Cluster plot
graph G {
    compound=true
    subgraph cluster_1 {
        node [color=white style=filled]
        color="#f77189" fontname="Courier-Bold" label=cluster_1 style=filled
        start1 -- end1
    }
    start1 -- end2 [arrowsize=0.01098901098901099 color=grey76 len=8 penwidth=91]
    start1 -- end6 [arrowsize=0.0625 color=grey76 len=8 penwidth=16]
    start1 -- end8 [arrowsize=0.125 color=grey76 len=8 penwidth=8]
    start1 -- end7 [arrowsize=0.16666666666666666 color=grey76 len=8 penwidth=6]
    start1 -- end9 [arrowsize=0.14285714285714285 color=grey76 len=8 penwidth=7]
    start1 -- end10 [arrowsize=0.08333333333333333 color=grey76 len=8 penwidth=12]
    start1 -- end11 [arrowsize=0.16666666666666666 color=grey76 len=8 penwidth=6]
    start1 -- end1 [arrowsize=0.009174311926605505 color="#143D59" len=8 penwidth=109]
    start1 -- end2 [arrowsize=0.024390243902439025 color="#143D59" len=8 penwidth=41]
    subgraph cluster_2 {
        node [color=white style=filled]
        color="#e18632" fontname="Courier-Bold" label=cluster_2 style=filled
        start2 -- end2
    }
    start2 -- end3 [arrowsize=0.16666666666666666 color=grey76 len=8 penwidth=6]
    start2 -- end8 [arrowsize=0.041666666666666664 color=grey76 len=8 penwidth=24]
    start2 -- end9 [arrowsize=0.16666666666666666 color=grey76 len=8 penwidth=6]
    start2 -- end6 [arrowsize=0.125 color=grey76 len=8 penwidth=8]
    start2 -- end7 [arrowsize=0.1 color=grey76 len=8 penwidth=10]
    start2 -- end11 [arrowsize=0.16666666666666666 color=grey76 len=8 penwidth=6]
    start2 -- end6 [arrowsize=0.05 color="#143D59" len=8 penwidth=20]
    start2 -- end8 [arrowsize=0.16666666666666666 color="#143D59" len=8 penwidth=6]
    start2 -- end10 [arrowsize=0.1 color="#143D59" len=8 penwidth=10]
    subgraph cluster_3 {
        node [color=white style=filled]
        color="#b59a32" fontname="Courier-Bold" label=cluster_3 style=filled
        start3 -- end3
    }
    start3 -- end4 [arrowsize=0.1 color=grey76 len=8 penwidth=10]
    start3 -- end8 [arrowsize=0.2 color=grey76 len=8 penwidth=5]
    start3 -- end5 [arrowsize=0.07692307692307693 color=grey76 len=8 penwidth=13]
    start3 -- end5 [arrowsize=0.2 color="#143D59" len=8 penwidth=5]
    subgraph cluster_4 {
        node [color=white style=filled]
        color="#8ba731" fontname="Courier-Bold" label=cluster_4 style=filled
        start4 -- end4
    }
    start4 -- end5 [arrowsize=0.022222222222222223 color=grey76 len=8 penwidth=45]
    start4 -- end8 [arrowsize=0.125 color=grey76 len=8 penwidth=8]
    start4 -- end2 [arrowsize=0.125 color=grey76 len=8 penwidth=8]
    subgraph cluster_5 {
        node [color=white style=filled]
        color="#32b258" fontname="Courier-Bold" label=cluster_5 style=filled
        start5 -- end5
    }
    start5 -- end3 [arrowsize=0.07692307692307693 color=grey76 len=8 penwidth=13]
    start5 -- end7 [arrowsize=0.1111111111111111 color=grey76 len=8 penwidth=9]
    subgraph cluster_6 {
        node [color=white style=filled]
        color="#35ae95" fontname="Courier-Bold" label=cluster_6 style=filled
        start6 -- end6
    }
    start6 -- end4 [arrowsize=0.058823529411764705 color=grey76 len=8 penwidth=17]
    start6 -- end8 [arrowsize=0.2 color=grey76 len=8 penwidth=5]
    start6 -- end3 [arrowsize=0.16666666666666666 color=grey76 len=8 penwidth=6]
    start6 -- end10 [arrowsize=0.16666666666666666 color=grey76 len=8 penwidth=6]
    start6 -- end8 [arrowsize=0.07692307692307693 color="#143D59" len=8 penwidth=13]
    subgraph cluster_7 {
        node [color=white style=filled]
        color="#37abb2" fontname="Courier-Bold" label=cluster_7 style=filled
        start7 -- end7
    }
    start7 -- end5 [arrowsize=0.09090909090909091 color=grey76 len=8 penwidth=11]
    start7 -- end6 [arrowsize=0.16666666666666666 color=grey76 len=8 penwidth=6]
    start7 -- end3 [arrowsize=0.1111111111111111 color="#143D59" len=8 penwidth=9]
    subgraph cluster_8 {
        node [color=white style=filled]
        color="#39a7d6" fontname="Courier-Bold" label=cluster_8 style=filled
        start8 -- end8
    }
    start8 -- end9 [arrowsize=0.14285714285714285 color=grey76 len=8 penwidth=7]
    start8 -- end4 [arrowsize=0.03571428571428571 color=grey76 len=8 penwidth=28]
    start8 -- end5 [arrowsize=0.16666666666666666 color=grey76 len=8 penwidth=6]
    start8 -- end8 [arrowsize=0.2 color=grey76 len=8 penwidth=5]
    start8 -- end3 [arrowsize=0.16666666666666666 color=grey76 len=8 penwidth=6]
    start8 -- end9 [arrowsize=0.16666666666666666 color="#143D59" len=8 penwidth=6]
    start8 -- end3 [arrowsize=0.16666666666666666 color="#143D59" len=8 penwidth=6]
    subgraph cluster_9 {
        node [color=white style=filled]
        color="#8f93f4" fontname="Courier-Bold" label=cluster_9 style=filled
        start9 -- end9
    }
    start9 -- end4 [arrowsize=0.07692307692307693 color=grey76 len=8 penwidth=13]
    start9 -- end10 [arrowsize=0.16666666666666666 color="#143D59" len=8 penwidth=6]
    subgraph cluster_10 {
        node [color=white style=filled]
        color="#db70f4" fontname="Courier-Bold" label=cluster_10 style=filled
        start10 -- end10
    }
    start10 -- end4 [arrowsize=0.08333333333333333 color=grey76 len=8 penwidth=12]
    start10 -- end3 [arrowsize=0.1111111111111111 color=grey76 len=8 penwidth=9]
    start10 -- end7 [arrowsize=0.09090909090909091 color="#143D59" len=8 penwidth=11]
    subgraph cluster_11 {
        node [color=white style=filled]
        color="#f667c6" fontname="Courier-Bold" label=cluster_11 style=filled
        start11 -- end11
    }
    start11 -- end3 [arrowsize=0.2 color=grey76 len=8 penwidth=5]
    start11 -- end11 [arrowsize=0.2 color=grey76 len=8 penwidth=5]
    start11 -- end6 [arrowsize=0.2 color=grey76 len=8 penwidth=5]
    start11 -- end7 [arrowsize=0.2 color="#143D59" len=8 penwidth=5]
    overlap=false
}

ArtPoon commented 4 years ago

Thanks - what do the arrow sizes correspond to? And penwidth?

ArtPoon commented 4 years ago

Actually, if you can give me the node and edge lists before they are converted into a DOT file, that would be ideal.

horaciobam commented 4 years ago

arrowsize is used for the visualization without subgraphs; penwidth is used to determine the width of the edges.

By node and edge lists, do you mean something like this?

Nodes: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
Adjacent edges: [(1, 2), (1, 1), (1, 6), (1, 8), (1, 7), (1, 4), (1, 9), (1, 10), (1, 11), (1, 3), (2, 3), (2, 4), (2, 8), (2, 9), (2, 10), (2, 6), (2, 7), (2, 11), (2, 2), (3, 4), (3, 8), (3, 3), (3, 5), (3, 10), (3, 7), (3, 9), (4, 5), (4, 8), (4, 2), (4, 9), (4, 6), (5, 6), (5, 3), (5, 10), (5, 11), (5, 7), (5, 8), (5, 9), (6, 4), (6, 8), (6, 3), (6, 10), (6, 5), (7, 4), (7, 5), (7, 6), (7, 8), (7, 11), (7, 9), (8, 9), (8, 4), (8, 5), (8, 8), (8, 3), (8, 6), (8, 11), (8, 10), (9, 8), (9, 3), (9, 10), (9, 4), (9, 6), (9, 5), (9, 9), (10, 4), (10, 3), (10, 8), (10, 5), (10, 7), (11, 3), (11, 11), (11, 6), (11, 7), (11, 8)]
Overlapping edges: [(1, 1), (1, 2), (2, 6), (2, 8), (2, 10), (2, 9), (2, 11), (3, 5), (3, 9), (3, 8), (3, 3), (3, 10), (3, 4), (4, 3), (4, 9), (4, 7), (5, 11), (5, 3), (5, 8), (5, 7), (6, 7), (6, 8), (6, 3), (6, 6), (7, 3), (7, 5), (7, 9), (7, 4), (8, 8), (8, 9), (8, 3), (8, 11), (9, 4), (9, 9), (9, 10), (9, 7), (9, 8), (10, 6), (10, 7), (10, 4), (10, 8), (11, 7), (11, 9), (11, 10)]

ArtPoon commented 4 years ago

No, I mean: what are you using arrowsize and penwidth to represent?

ArtPoon commented 4 years ago

i.e., I'd like the cluster sizes to go with the node list, and any other node attributes

horaciobam commented 4 years ago

Oh, sorry I didn't understand.

I use penwidth to represent the number of proteins that form an edge between those clusters. arrowsize is set from the same count, to keep the arrowhead proportional to the width of the line.
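Judging from the DOT file earlier in this thread, the mapping appears to be penwidth = protein count and arrowsize = 1 / count (for example, penwidth=16 sits next to arrowsize=0.0625). A minimal Python sketch under that assumption follows; `edge_statement` is a hypothetical helper for illustration, not the script that actually produced the file.

```python
def edge_statement(src, dst, count, color="grey76"):
    """Format one undirected DOT edge from cluster src's start node to cluster dst's end node.

    Assumed mapping (inferred from the posted DOT file):
    penwidth = protein count, arrowsize = 1 / count.
    """
    # Hex colors must be quoted in DOT; named colors such as grey76 need no quotes.
    color_attr = f'"{color}"' if color.startswith("#") else color
    return (f"start{src} -- end{dst} "
            f"[arrowsize={1 / count} color={color_attr} len=8 penwidth={count}]")


# Adjacency edge ('1', '6', 16) and overlapping edge ('2', '6', 20) from this thread.
print(edge_statement('1', '6', 16))
print(edge_statement('2', '6', 20, color="#143D59"))
```

Note that `--` declares an undirected edge, so Graphviz draws no arrowheads for these statements and arrowsize has no visible effect here; it would only matter in a `digraph` using `->`.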

This is the list with node and edge information. I am colouring the clusters similar to the output from the t-SNE clustering plot.

# Nodes: (name_of_node, size_of_node, color)
Nodes: [('1', 153, '#f77189'), ('2', 66, '#e18632'), ('3', 55, '#b59a32'), ('4', 65, '#8ba731'), ('5', 66, '#32b258'), ('6', 37, '#35ae95'), ('7', 36, '#37abb2'), ('8', 59, '#39a7d6'), ('9', 32, '#8f93f4'), ('10', 34, '#db70f4'), ('11', 20, '#f667c6')]

# Adjacent edges (current_cluster, adj_cluster, number_of_proteins)
Adjacent edges: [('1', '2', 91), ('1', '1', 2), ('1', '6', 16), ('1', '8', 8), ('1', '7', 6), ('1', '4', 4), ('1', '9', 7), ('1', '10', 12), ('1', '11', 6), ('1', '3', 1), ('2', '3', 6), ('2', '4', 2), ('2', '8', 24), ('2', '9', 6), ('2', '10', 3), ('2', '6', 8), ('2', '7', 10), ('2', '11', 6), ('2', '2', 1), ('3', '4', 10), ('3', '8', 5), ('3', '3', 4), ('3', '5', 13), ('3', '10', 3), ('3', '7', 3), ('3', '9', 1), ('4', '5', 45), ('4', '8', 8), ('4', '2', 8), ('4', '9', 2), ('4', '6', 1), ('5', '6', 1), ('5', '3', 13), ('5', '10', 2), ('5', '11', 4), ('5', '7', 9), ('5', '8', 4), ('5', '9', 1), ('6', '4', 17), ('6', '8', 5), ('6', '3', 6), ('6', '10', 6), ('6', '5', 1), ('7', '4', 3), ('7', '5', 11), ('7', '6', 6), ('7', '8', 1), ('7', '11', 1), ('7', '9', 2), ('8', '9', 7), ('8', '4', 28), ('8', '5', 6), ('8', '8', 5), ('8', '3', 6), ('8', '6', 1), ('8', '11', 1), ('8', '10', 1), ('9', '8', 2), ('9', '3', 4), ('9', '10', 3), ('9', '4', 13), ('9', '6', 2), ('9', '5', 4), ('9', '9', 2), ('10', '4', 12), ('10', '3', 9), ('10', '8', 2), ('10', '5', 2), ('10', '7', 1), ('11', '3', 5), ('11', '11', 5), ('11', '6', 5), ('11', '7', 2), ('11', '8', 1)]

# Overlapping edges (current_cluster, ovp_cluster, number_of_proteins)
Overlapping edges: [('1', '1', 109), ('1', '2', 41), ('2', '6', 20), ('2', '8', 6), ('2', '10', 10), ('2', '9', 1), ('2', '11', 1), ('3', '5', 5), ('3', '9', 1), ('3', '8', 4), ('3', '3', 2), ('3', '10', 3), ('3', '4', 2), ('4', '3', 3), ('4', '9', 1), ('4', '7', 1), ('5', '11', 1), ('5', '3', 1), ('5', '8', 1), ('5', '7', 1), ('6', '7', 1), ('6', '8', 13), ('6', '3', 4), ('6', '6', 1), ('7', '3', 9), ('7', '5', 1), ('7', '9', 1), ('7', '4', 1), ('8', '8', 4), ('8', '9', 6), ('8', '3', 6), ('8', '11', 2), ('9', '4', 4), ('9', '9', 3), ('9', '10', 6), ('9', '7', 3), ('9', '8', 1), ('10', '6', 1), ('10', '7', 11), ('10', '4', 2), ('10', '8', 1), ('11', '7', 5), ('11', '9', 2), ('11', '10', 1)]

ArtPoon commented 4 years ago

Ok thanks!

ArtPoon commented 4 years ago

Wrote some JavaScript to generate the following with d3:

Screen Shot 2020-10-07 at 9 45 04 PM

Please find code in test.html and test.json with commit f6f0c275090f227e6c8cd66ca108fa82ce30768c

ArtPoon commented 4 years ago

Some issues:

ArtPoon commented 4 years ago

Oh, and you should write a Python or R script to convert your data into the JSON format for this animation :-)

horaciobam commented 4 years ago

Understood, working on it. Thanks!

horaciobam commented 4 years ago

@ArtPoon I have two questions:

  1. In the JSON file that you generated from the edge list, the start of each cluster is connected to the end of the adjacent one (for example: {"source": "start2", "target": "end3"}). Shouldn't it be end2 -> start3?
  2. I think you did not include the information of the overlapping proteins. Should I also include those as links and color them differently? (The downside is that if I do, the visualization gets more crowded, but maybe I can relax the force between links):

image

Thank you!

ArtPoon commented 4 years ago

  1. Oh whoops, you're right - start is the 5' end, so it should be the terminus of the edge
  2. I did not include overlap information - I thought we could draw those in later as different kinds of edges (color?), and have a button in the user interface toggle the overlap edges on and off.
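
A conversion script along the lines discussed in this thread could look like the Python sketch below. It assumes the node and edge tuple layouts posted earlier, and it follows the direction agreed in point 1 (a link runs from the end of the current cluster to the start of the adjacent one). Only the {"source": ..., "target": ...} link form comes from this thread; the other field names (`id`, `cluster`, `size`, `color`, `count`, `type`) and the choice to give overlap links the same endpoints are assumptions, and the real test.json may be laid out differently.

```python
import json


def to_d3_json(nodes, adjacent, overlapping):
    """Convert (name, size, color) nodes and (src, dst, count) edges into a dict.

    The field names used here are assumptions, not the schema of test.json.
    """
    data = {"nodes": [], "links": []}
    for name, size, color in nodes:
        # Each cluster contributes a start (5' end) node and an end node.
        for side in ("start", "end"):
            data["nodes"].append(
                {"id": f"{side}{name}", "cluster": name, "size": size, "color": color}
            )
    # Tag each link with its kind so the interface can toggle overlap edges.
    for kind, edges in (("adjacent", adjacent), ("overlap", overlapping)):
        for src, dst, count in edges:
            data["links"].append(
                {"source": f"end{src}", "target": f"start{dst}",
                 "count": count, "type": kind}
            )
    return data


# Two nodes and one edge of each kind, copied from the lists in this thread.
nodes = [('1', 153, '#f77189'), ('2', 66, '#e18632')]
data = to_d3_json(nodes, adjacent=[('1', '2', 91)], overlapping=[('1', '2', 41)])
print(json.dumps(data, indent=2))
```

Keeping the count on every link lets the same file drive a threshold filter in the browser without regenerating the JSON.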
horaciobam commented 4 years ago

With the last commit, colors for links and nodes now look like this:

image

Still trying to get the slider that filters out low-count links to work.

horaciobam commented 4 years ago

image

The slider is working properly, but for some reason it overlaps the genome plot.

horaciobam commented 4 years ago

Progress update:

image

I think I need to create a panel to display the currently selected family, and I would like to make the page resize according to the window size. Should we also display information about the group (i.e., number of species, total number of clusters, size of clusters)?

ArtPoon commented 4 years ago

Yes, that info would be helpful for users. You should also label the slider to indicate what the numbers represent.

horaciobam commented 3 years ago