Closed by horaciobam 3 years ago
More common summary statistics used for Network analysis:
Note: We should measure the number of overlapping edges relative to the number of adjacent edges, both for the whole network and for each node.
From our `.dot` plot we can differentiate adjacent from overlapping edges based on color:
```dot
digraph {
graph [outputorder=edgesfirst]
1 [color="#F8766D" fixedsize=true fontname="Courier-Bold" fontsize=85 height=2.7284509239574835 style=filled width=2.7284509239574835]
1 -> 2 [arrowsize=0.01 color=grey76 len=10 penwidth=22]
1 -> 3 [arrowsize=0.01 color=grey76 len=10 penwidth=6]
1 -> 12 [arrowsize=0.01 color=grey76 len=10 penwidth=5]
1 -> 10 [arrowsize=0.01 color=grey76 len=10 penwidth=4]
1 -> 1 [arrowsize=0.01 color=grey76 len=10 penwidth=3]
1 -> 6 [arrowsize=0.01 color=grey76 len=10 penwidth=2]
1 -> 8 [arrowsize=0.01 color=grey76 len=10 penwidth=2]
1 -> 5 [arrowsize=0.01 color=grey76 len=10 penwidth=12]
1 -> 4 [arrowsize=0.01 color=grey76 len=10 penwidth=3]
1 -> 11 [arrowsize=0.01 color=grey76 len=10 penwidth=1]
1 -> 7 [arrowsize=0.01 color=grey76 len=10 penwidth=2]
1 -> 13 [arrowsize=0.01 color=grey76 len=10 penwidth=1]
1 -> 7 [arrowsize=0.01 color="#143D59" len=10 penwidth=1]
1 -> 1 [arrowsize=0.01 color="#143D59" len=10 penwidth=1]
1 -> 10 [arrowsize=0.01 color="#143D59" len=10 penwidth=1]
2 [color="#E18A00" fixedsize=true fontname="Courier-Bold" fontsize=85 height=3.5433819375782165 style=filled width=3.5433819375782165]
2 -> 1 [arrowsize=0.01 color=grey76 len=10 penwidth=13]
2 -> 7 [arrowsize=0.01 color=grey76 len=10 penwidth=7]
2 -> 3 [arrowsize=0.01 color=grey76 len=10 penwidth=12]
2 -> 8 [arrowsize=0.01 color=grey76 len=10 penwidth=13]
2 -> 10 [arrowsize=0.01 color=grey76 len=10 penwidth=10]
2 -> 4 [arrowsize=0.01 color=grey76 len=10 penwidth=7]
2 -> 12 [arrowsize=0.01 color=grey76 len=10 penwidth=28]
2 -> 13 [arrowsize=0.01 color=grey76 len=10 penwidth=10]
2 -> 11 [arrowsize=0.01 color=grey76 len=10 penwidth=2]
2 -> 2 [arrowsize=0.01 color=grey76 len=10 penwidth=6]
2 -> 9 [arrowsize=0.01 color=grey76 len=10 penwidth=3]
2 -> 6 [arrowsize=0.01 color=grey76 len=10 penwidth=2]
2 -> 1 [arrowsize=0.01 color="#143D59" len=10 penwidth=1]
2 -> 10 [arrowsize=0.01 color="#143D59" len=10 penwidth=1]
```
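As a rough illustration of the measurement proposed in the note above, the edge colors in the DOT text could be tallied for the whole network and per node. This is only a sketch under assumptions: the issue does not say which color marks which edge type, so the mapping in `COLOR_TO_KIND` is a guess that may need to be swapped, per-node counts are taken over outgoing edges only, and the function names are hypothetical (they are not taken from `new_viz_ovrf.py`).

```python
import re
from collections import Counter, defaultdict

# ASSUMPTION: the issue does not state which colour is which. Here grey76 is
# treated as "adjacent" and #143D59 as "overlapping"; swap them if reversed.
COLOR_TO_KIND = {"grey76": "adjacent", "#143D59": "overlapping"}

# Matches DOT edge lines such as: 1 -> 7 [arrowsize=0.01 color="#143D59" ...]
EDGE_RE = re.compile(r'^\s*(\w+)\s*->\s*(\w+)\s*\[.*?\bcolor="?([^"\s\]]+)"?')


def count_edges_by_kind(dot_text):
    """Return (network_totals, per_node_totals), grouping edges by source node."""
    totals = Counter()
    per_node = defaultdict(Counter)
    for line in dot_text.splitlines():
        match = EDGE_RE.match(line)
        if not match:
            continue  # node declarations, graph attributes, braces
        source, _target, color = match.groups()
        kind = COLOR_TO_KIND.get(color, "unknown")
        totals[kind] += 1
        per_node[source][kind] += 1
    return totals, per_node


def overlap_ratio(counts):
    """Overlapping edges per adjacent edge; None if there are no adjacent edges."""
    if counts["adjacent"] == 0:
        return None
    return counts["overlapping"] / counts["adjacent"]


# A few edges copied from the plot above, enough to exercise the parser
sample = '''
digraph {
1 [color="#F8766D" style=filled]
1 -> 2 [arrowsize=0.01 color=grey76 len=10 penwidth=22]
1 -> 7 [arrowsize=0.01 color=grey76 len=10 penwidth=2]
1 -> 7 [arrowsize=0.01 color="#143D59" len=10 penwidth=1]
2 -> 1 [arrowsize=0.01 color=grey76 len=10 penwidth=13]
2 -> 1 [arrowsize=0.01 color="#143D59" len=10 penwidth=1]
}
'''

totals, per_node = count_edges_by_kind(sample)
print("network:", dict(totals), overlap_ratio(totals))
for node, counts in sorted(per_node.items()):
    print("node", node, dict(counts), overlap_ratio(counts))
```

Counting by source node is one possible reading of "for each node"; counting incoming edges as well would only need a second counter keyed on the target.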
I can also get the statistics directly from the `new_viz_ovrf.py` script itself.
Try using NetworkX
Maybe use graph kernels?
Mean kmer distance (`> mean(upper.tri(km))`):
Family | Mean kmer distance | Baltimore group |
---|---|---|
Adenoviridae | 0.4997828 | dsDNA |
Coronaviridae | 0.4991135 | (+) ssRNA |
Mononegavirales | 0.4997619 | (-) ssRNA |
Reoviridae | 0.4994664 | dsRNA |
Retroviridae | 0.4985549 | Retrovirus |
Rhabdoviridae | 0.4995567 | (-) ssRNA |
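Double-check the results. Are the data files correct? Are the values being interpreted as doubles instead of integers? Note that `upper.tri` returns a logical mask, not the values, so its mean is just the fraction of `TRUE` cells, which approaches 0.5 for large matrices regardless of the distances. Use the mask to index the matrix itself and take only that part of it.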
Corrected mean kmer distance by using:
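```r
for (d in All) {
  # logical mask of the cells above the diagonal
  up <- upper.tri(d)
  # mean of the distances selected by the mask
  m <- mean(d[up])
  print(m)
}
```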
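To see why the first table was suspicious, the two computations can be compared on a toy matrix. The sketch below is in plain Python rather than R, purely for illustration; the 3 x 3 matrix is made up and the helper names are hypothetical.

```python
def upper_triangle_mask(n):
    """Boolean n x n mask, True strictly above the diagonal (like R's upper.tri)."""
    return [[col > row for col in range(n)] for row in range(n)]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def mean_of_mask(matrix):
    """Analogue of mean(upper.tri(km)): the fraction of True cells in the mask."""
    mask = upper_triangle_mask(len(matrix))
    return mean(cell for row in mask for cell in row)


def mean_upper_triangle(matrix):
    """Analogue of mean(km[upper.tri(km)]): mean of the values above the diagonal."""
    n = len(matrix)
    return mean(matrix[r][c] for r in range(n) for c in range(r + 1, n))


# Made-up symmetric distance matrix, for illustration only
km = [
    [0.0, 0.2, 0.4],
    [0.2, 0.0, 0.3],
    [0.4, 0.3, 0.0],
]

print(mean_of_mask(km))         # depends only on the size: (n - 1) / (2 * n)
print(mean_upper_triangle(km))  # depends on the distances themselves
```

Because the first quantity is determined by the matrix size alone, every family ends up with nearly the same value, which matches the pattern in the uncorrected table.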
Family | Mean kmer distance | Baltimore group | Host | Number of complete ref genomes |
---|---|---|---|---|
Adenoviridae | 0.2317234 | dsDNA | Human, Non-human vertebrate | 72 |
Coronaviridae | 0.2101119 | (+) ssRNA | Human, Non-human vertebrate | 65 |
Mononegavirales | 0.2559121 | (-) ssRNA | Human, Non-human vertebrate | 327 |
Reoviridae | 0.2877149 | dsRNA | Human, Non-human vertebrate, invertebrate, plants | 887 |
Retroviridae | 0.26761 | Retrovirus | Human, Non-human vertebrate | 93 |
Rhabdoviridae | 0.2489143 | (-) ssRNA | Human, Non-human vertebrate, invertebrate, plants | 82 |
The full table with all evaluated families is in `fam_analysis.csv` in the `data` folder.
Open question: should the network be treated as undirected or directed?