dirmeier / diffusr

Network diffusion algorithms in R.
GNU General Public License v3.0
21 stars 6 forks source link

undirected or directed? weighted or unweighted? #10

Closed duleise closed 6 years ago

duleise commented 6 years ago

great and neat work!

this is definitely a naive question. but i could not find clear answers from the docs. so just to make sure i use it correctly. the input network for random.walk or heat.diffusion, does it have to be undirected or directed? are the edges weighted or unweighted? thanks.

dirmeier commented 6 years ago

Hey, thanks for the feedback, Ill add some info to the vignette. I definitely should make this clearer.

So, you define a graph using an adjacency matrix, for instance like this:

  graph <- matrix(abs(rnorm(n*n)), n, n)

This would correspond to a directed weighted graph. Every entry graph[i, j] in the matrix defines a weight of an edge in the graph.

If you want to make it undirected, you would just set graph[i, j] = graph[j, i].

If you want to make it also unweighted you would use a binary adjacency matrix, where every entry graph[i, j] = 1 if there is an edge between nodes i and j and 0 otherwise.

However, remember that for random walks we need to ensure that the matrix is left stochastic, so I normalize every column to sum to 1 automatically. This means that even if you use an undirected graph, the transition matrix for it could have different edge weights which makes it directed in that sense.

Hope that clears things up. Otherwise, I am glad to help.

Cheers, Simon

duleise commented 6 years ago

Thank you for such a thorough reply. Yes, it clears things up a lot. To put it simply, can I understand that, the input matrix/graph can be unweighted, but can not be undirected? It makes sense to me because the propagation/flow/walk have to follow specific directions instead of two-way streets. Thanks again. /Otto

dirmeier commented 6 years ago

Yeah, in a sense you are right. But let me clear up what I mean, just to make sure, we are on the same page.

In the end, an undirected graph is a graph where graph[i, j] = graph[j, i], right? So if we have two edges between i and j and they have the same weight, we can interpret it as being a single one, i.e. undirected (see wiki).

For example:

 g <-  matrix(c(0, 1, 1, 0), 2, 2)

would be a graph of two nodes. They share two edges with weight one. Since both have equal weight, we can say it is undirected.

Now, in the directed case, the edge weights would be different between two nodes, for example:

 g <- matrix(c(0, 1, 2, 0), 2, 2)

If you imagine it as being costs you need to take to get from one node to the other, then one time the costs are 2 and one time the costs are 1. So since the costs are different in every direction, we would call it directed.

So, in the end an undirected graph is just a special case of a directed one, where edge weights between the same two nodes are equal.

I am glad my package is of help and hope I could answer this well enough.

Cheers, Simon

duleise commented 6 years ago

Thank you. Yes I agree that an undirected graph is just a special case of a directed one. Coming back to your package, will the functions accept an input like this? matrix(c(0, 1, 0, 0), 2, 2) which theoretically is directed.

Since in my practice, Bioinformatics field, a directed graph is difficult to get, or less common than an undirected one.

Cheers, Otto

dirmeier commented 6 years ago

Depends on the method you use, I think. For a random walk this matrix wouldn't be ergodic, so the Markov chain does not have a stationary distribution, if I am not mistaken. But I'd just try it out and see. The heat equation will work here definitely

duleise commented 6 years ago

I had an error in the example in my last comment. A theoretically directed but in fact an undirected graph is a symmetric matrix. I like visualise data when possible. My question is, will random.walk or heat.diffusion in the following codes make sense? Or as you suggested, in this case heat.diffusion is more advised than random.walk? I might be a bit annoying here, but what I can get as input graph is really limited and your package is seemingly the only manageable tool I can find for this kind of analysis.

require(igraph)
# count of nodes
n <- 8

# starting distribution (has to sum to one)
p0    <- as.vector(rmultinom(1, 1, prob = rep(.4, n)))

# generate a symmetric adjacency matrix with random zeros and ones.
graph <- matrix(sample(c(0,0,0,1,1), n*n, replace = T), n, n)
graph <- graph %*% t(graph) # to make it symmetric
diag(graph) <- 0
graph[which(graph > 0)] <- 1 # to make it unweighted, or equally weighted
isSymmetric(graph)

# computation of stationary distribution
pt <- random.walk(p0, graph)
# or heat dissusion
# pt <- heat.diffusion(p0, graph)

g <- graph_from_adjacency_matrix(adjmatrix = graph, mode = "undirected") %>% simplify()
lo <- layout_with_lgl(g)

par(mfrow = c(1,2))
plot(g, layout = lo,
     edge.arrow.size = .6, edge.curved = .1,
     vertex.color = unlist(lapply(p0, function(k){
       rgb(0, 0, 0, k*255, max = 255)}))
)
plot(g, layout = lo,
     edge.arrow.size = .6, edge.curved = .1,
     vertex.color = unlist(lapply(pt, function(k){
       rgb(0, 0, 0, k*255, max = 255)}))
)

Since the matrix is randomly generated, in some cases you may get totally scarce graph.

dirmeier commented 6 years ago

Sorry, I wasn't clear. I meant, you can only guarantee that your MC has a stationary distribution if it is ergodic. This was not guaranteed in the 2 node example. The 8 node example here is fine, you could use any of the two.

"I might be a bit annoying here, but what I can get as input graph is really limited and your package is seemingly the only manageable tool I can find for this kind of analysis."

Don't worry, glad to help :).

duleise commented 6 years ago

Great! Thanks. Now I can go back to my work and use your package to its most. Been a pleasant visit here :)