GuyAllard / markov_clustering

markov clustering in python
MIT License
167 stars 37 forks

How does MCL treat weight of edges? #1

Closed hw449 closed 6 years ago

hw449 commented 6 years ago

Thank you for creating such a wonderful tool for our community!

I have a file where each line contains edge information (for example: nodeX, nodeY, weight). The higher the weight, the shorter the distance between nodeX and nodeY. I converted this file into a matrix using networkx (the add_weighted_edges_from and to_scipy_sparse_matrix functions). I am wondering how MCL treats the weights of the edges?
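For reference, a minimal sketch of the conversion step described above, built directly with scipy (the edge list and node names here are made up, and the MCL calls shown in the trailing comment follow this repo's README):

```python
import numpy as np
from scipy.sparse import coo_matrix

# Hypothetical edge list: (nodeX, nodeY, weight); higher weight = more similar.
edges = [("a", "b", 1.0), ("b", "c", 5.0), ("c", "d", 2.0)]

# Map node names to matrix indices.
nodes = sorted({n for u, v, _ in edges for n in (u, v)})
idx = {n: i for i, n in enumerate(nodes)}

# Build a symmetric (undirected) weighted adjacency matrix.
rows = [idx[u] for u, v, w in edges] + [idx[v] for u, v, w in edges]
cols = [idx[v] for u, v, w in edges] + [idx[u] for u, v, w in edges]
data = [w for _, _, w in edges] * 2

matrix = coo_matrix((data, (rows, cols)), shape=(len(nodes), len(nodes))).tocsr()

# The weights are kept as-is and interpreted as similarities, e.g.:
#   import markov_clustering as mc
#   result = mc.run_mcl(matrix)          # per the repo README
#   clusters = mc.get_clusters(result)
```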

Thank you very much.

Regards, Hai

GuyAllard commented 6 years ago

Hi Hai,

The MCL algorithm treats the edge weights as similarities.
The absolute magnitude of the edge weights is not important, but the relative magnitude is.
If the graph has an edge (a, b) with weight 2, and another edge (c, d) with weight 20, then c and d are considered 10x more similar than a and b.
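This scale invariance comes from MCL's first step, which column-normalizes the matrix into transition probabilities. A quick numpy check (the matrix below is illustrative, not from the thread):

```python
import numpy as np

# A small weighted adjacency matrix.
M = np.array([[0.0,  2.0,  0.0],
              [2.0,  0.0, 20.0],
              [0.0, 20.0,  0.0]])

def column_normalize(m):
    """Divide each column by its sum, turning weights into transition probabilities."""
    return m / m.sum(axis=0, keepdims=True)

# Multiplying every weight by the same constant changes nothing after normalization.
print(np.allclose(column_normalize(M), column_normalize(10 * M)))  # True
```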

Your graph, where a higher edge weight indicates a shorter distance should work.

Hope that helps

Guy

Mataivic commented 5 years ago

Hello, how about a network with both negative and positive weights? My network models ecological interactions between species, but these interactions can be conflicting (negative weights) or beneficial (positive weights).

Moonire commented 5 years ago

The idea behind MCL is to simulate a random walk ad infinitum and then look at where a random walker starting at some node 'i' tends to go most often. This way, groups of strongly connected nodes tend to end up in the same cluster after you run MCL. The weights on the edges can then be thought of (once normalized) as transition probabilities from node 'i' to node 'j', and probabilities can't be negative. So right off the bat, allowing negative values breaks a basic assumption the algorithm makes.

If you input an adjacency matrix with negative values, the normalization step will make the sum of the absolute values of each column add up to one, rather than the values themselves, which again breaks a core assumption the algorithm makes.
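To illustrate with made-up numbers: l1-normalizing a column that contains a negative weight divides by the sum of absolute values, so the resulting "probabilities" no longer sum to one:

```python
import numpy as np

# One column of a hypothetical adjacency matrix with a negative weight.
col = np.array([2.0, -1.0, 1.0])

# l1 normalization divides by the sum of absolute values,
# which is what the commenter describes happening here.
normalized = col / np.abs(col).sum()

print(normalized.sum())  # 0.5 -- not a valid probability distribution
```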

You can give it a try, but the results would either be nonsensical (which is my guess) or require a brand-new interpretation under the new assumptions you've made (and that's assuming you rework the code to handle pruning of negative values and the like).

I'd go for the easiest workaround and softmax the values of your matrix. You just have to exponentiate the weights and the algorithm will normalize them for you, so less work! That way, the negative weights will correspond to very small transition probabilities and MCL will be perfectly usable.
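A sketch of that workaround (exponentiate, then let column normalization produce a softmax; the weights below are made up):

```python
import numpy as np

# Weighted adjacency matrix with negative (conflicting) interactions.
M = np.array([[ 0.0, 2.0, -1.0],
              [ 2.0, 0.0,  0.5],
              [-1.0, 0.5,  0.0]])

# Exponentiating maps every weight to a positive number; negative weights
# become small but non-zero transition probabilities after normalization.
expM = np.exp(M)
P = expM / expM.sum(axis=0, keepdims=True)  # column-wise softmax

print(np.allclose(P.sum(axis=0), 1.0))  # True: valid transition probabilities
print((P > 0).all())                    # True: no edges were lost
```

One caveat with this approach: exp(0) = 1, so entries that were zero (absent edges) also become positive weights and the matrix turns dense; in practice you may want to exponentiate only the existing edges.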

Mataivic commented 5 years ago

@Moonire OK so I have to deal with that.

Moonire commented 5 years ago

@Mataivic Simply adding 1 to all weights would be a bad idea (it deletes a -1 weighted edge from the graph). But if you can work in the positive subset only, then I guess that would solve the problem.
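A tiny check of that point (toy weights): shifting by +1 zeroes out a -1 edge, while exponentiating keeps every edge strictly positive:

```python
import numpy as np

weights = np.array([-1.0, 0.5, 2.0])

shifted = weights + 1        # the -1 edge becomes 0, i.e. it vanishes from the graph
softmaxed = np.exp(weights)  # every edge stays strictly positive

print(shifted)               # [0.  1.5 3. ]
print((softmaxed > 0).all()) # True
```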

Although I'd like to say that having negative weights in the first place is an indicator that MCL isn't really what you're looking for. This has gotten a little far from the topic of this issue, so if you want to discuss it in further depth, I invite you to email me directly.

Mataivic commented 5 years ago

@Moonire Yes, I had some doubts, so I have already started a review of various clustering algorithms to find the one most suited to my case. I'll keep you posted if you want.

Moonire commented 5 years ago

Yeah absolutely ! I'd love to know what solution you find and discuss the problem in further detail.