Closed qiemem closed 10 years ago
I've confirmed that this has been broken since the snapshot days.
Okay, this is a little complicated. Jung is just doing PageRank with alpha = 0 for its eigenvector centrality measure. PageRank normalizes the columns of the adjacency matrix so that they sum to 1. Eigenvector centrality, as far as I know, doesn't. In Gephi, eigenvector centrality really is calculated by just taking the most significant eigenvector of the adjacency matrix, whereas its PageRank does the normalization first.
Jung is definitely wrong for the network 0--1--2. For 0--1--2--3, it gets the same as Gephi's PageRank with p=1 (i.e. alpha = 0). So it's not wrong... it's just different from Gephi.
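To make the difference concrete, here is a minimal sketch (not Jung's or Gephi's actual code) that runs plain power iteration on 0--1--2--3 twice: once on the raw adjacency matrix and once on the column-normalized one. The object and method names are made up for illustration, and rescaling the vector to sum to 1 on every step is my own choice.

```scala
object NormalizationSketch {
  type Matrix = Vector[Vector[Double]]

  // Adjacency matrix of the undirected line 0--1--2--3
  val adjacency: Matrix = Vector(
    Vector(0.0, 1.0, 0.0, 0.0),
    Vector(1.0, 0.0, 1.0, 0.0),
    Vector(0.0, 1.0, 0.0, 1.0),
    Vector(0.0, 0.0, 1.0, 0.0))

  // Divide every column by its sum, as PageRank does before iterating
  def normalizeColumns(m: Matrix): Matrix = {
    val colSums = m.indices.map(j => m.map(row => row(j)).sum)
    m.map(row => row.indices.toVector.map(j => row(j) / colSums(j)))
  }

  // Repeatedly apply x <- M x, rescaling so the entries sum to 1
  def powerIterate(m: Matrix, steps: Int): Vector[Double] = {
    val start = Vector.fill(m.size)(1.0 / m.size)
    (1 to steps).foldLeft(start) { (x, _) =>
      val raw = m.map(row => row.zip(x).map { case (a, b) => a * b }.sum)
      val total = raw.sum
      raw.map(_ / total)
    }
  }
}
```

With the normalized matrix the scores come out proportional to node degree (1/6, 1/3, 1/3, 1/6). With the raw matrix the inner nodes score about 1.618 times the outer ones.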
I can feed it edge weight values that force it to use a non-normalized adjacency matrix. It agrees with Gephi if I do this (up to scaling). The only problem is that the scaling on it is huge: 0--1--2--3 gives [1.432869610034543E20 2.318431730482698E20 2.318431730482698E20 1.432869610034543E20], so I'm definitely worried about overflow for large networks.
Here's code that almost does it, and renormalizes the results:
    object EigenvectorCentrality extends jungalg.scoring.PageRank(this, 0.0) {
      // Force jung to use non-normalized adjacency matrix
      private val weightFunction = (l: Link) => 1.0.asInstanceOf[Number]
      this.setEdgeWeights(weightFunction)
      evaluate()
      private val normalizer = gc.turtles.map((t) => getVertexScore(t).doubleValue()).sum
      def getScore(turtle: Turtle): java.lang.Double = {
        if (!graph.containsVertex(turtle))
          throw new ExtensionException(turtle + " is not a member of the current graph context.")
        getVertexScore(turtle).doubleValue / normalizer
      }
    }
For now, I'm just going to add tests for 0--1--2--3 that use the PageRank values.
Check that, Jung is wrong for 0--1--2--3--4. I'm starting to wonder if it would be best to drop Jung for whatever Gephi is using.
You're saying there's a bug in Jung? Is there a ticket for it in their tracker?
Regarding Jung's behavior with 0--1--2: I just implemented my own power iteration algorithm and I also get uniform values on that graph (though mine agrees with Gephi otherwise).
@SethTisue, No there's not. It's not clear to me what Jung is trying to do, so I'm not sure it's a bug (well, except the 0--1--2 case, I'll definitely report that). It does, however, disagree with other reputable sources.
I couldn't find any relevant bug reports.
I was able to implement my own eigenvector centrality calculation pretty quickly. It matches Gephi on every test I've run, and is almost identical implementation-wise. Still need to figure out what the most desirable answer is. Matching Gephi seems like a pretty good measure to me, since Gephi is so ubiquitous.
Alright, so I believe that Jung is actually computing PageRank. It's using power iteration to do it (which is the normal way of computing both PageRank and eigenvector centrality), but default power iteration fails for certain networks (those that have no unique dominant eigenvalue). For example, any line with an odd number of nodes fails. Thus, 0--1--2 and 0--1--2--3--4 fail. Gephi gets around this by basically assuming implicit self-loops (at least in their eigenvector centrality algorithm; haven't looked at their PageRank yet). In my own experiments, this seems to work well, but does slow down convergence in certain cases.
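Here is a rough sketch of that failure and of the self-loop workaround, assuming a uniform starting vector and rescaling to sum 1 on every step. It is my own illustration, not Gephi's implementation; `PowerIterationSketch`, `line` and the `selfLoops` flag are hypothetical names.

```scala
object PowerIterationSketch {
  // Undirected graph as adjacency lists: g(i) holds the neighbours of node i
  type Graph = Vector[Vector[Int]]

  // The line 0--1--...--(n-1)
  def line(n: Int): Graph =
    Vector.tabulate(n)(i => Vector(i - 1, i + 1).filter(j => j >= 0 && j < n))

  // One step of x <- A x, or x <- (A + I) x when selfLoops is true,
  // rescaled so the scores sum to 1
  def step(g: Graph, x: Vector[Double], selfLoops: Boolean): Vector[Double] = {
    val raw = g.indices.toVector.map { i =>
      g(i).map(j => x(j)).sum + (if (selfLoops) x(i) else 0.0)
    }
    val total = raw.sum
    raw.map(_ / total)
  }

  // Run a fixed number of steps from the uniform vector
  def iterate(g: Graph, steps: Int, selfLoops: Boolean): Vector[Double] = {
    val start = Vector.fill(g.size)(1.0 / g.size)
    (1 to steps).foldLeft(start)((x, _) => step(g, x, selfLoops))
  }
}
```

Without self-loops, `iterate(line(3), n, selfLoops = false)` flips between the uniform vector (even `n`) and (0.25, 0.5, 0.25) (odd `n`), so it never settles. With `selfLoops = true` it converges to scores proportional to (1, √2, 1), which puts the middle node on top.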
Again, I think that matching Gephi is probably the way to go, but I need to discuss it with others.
Just looked at Gephi's PageRank. Their implementation of the random restart probability is a little more nuanced. Not only is there a probability of jumping to a random node in the network; if the hypothetical random walker reaches a dead end, it also jumps immediately. Thus, (with jump probability 0) on a network like 0->1, 0 will get a PR of 1/3 and 1 a PR of 2/3 (remember that the walker can jump from 1 to 1).
Their PageRank does actually fail on 0--1--2; no self-loops are assumed, so convergence is never reached.
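A small sketch of that random-walk model, as I understand it from the description above (this is not Gephi's source; the names and the fixed step count are assumptions for illustration):

```scala
object DeadEndPageRankSketch {
  // outLinks(i) lists the nodes that node i points to.
  // jump is the probability of restarting at a uniformly random node.
  def pageRank(outLinks: Vector[Vector[Int]], jump: Double, steps: Int): Vector[Double] = {
    val n = outLinks.size
    val start = Vector.fill(n)(1.0 / n)
    (1 to steps).foldLeft(start) { (rank, _) =>
      val next = Array.fill(n)(0.0)
      for (i <- 0 until n) {
        val targets = outLinks(i)
        if (targets.isEmpty) {
          // Dead end: the walker jumps to a random node, possibly the same one
          for (j <- 0 until n) next(j) += rank(i) / n
        } else {
          // Random restart with probability jump, otherwise follow an out-link
          for (j <- 0 until n) next(j) += jump * rank(i) / n
          for (j <- targets) next(j) += (1 - jump) * rank(i) / targets.size
        }
      }
      next.toVector
    }
  }
}
```

For 0->1, `pageRank(Vector(Vector(1), Vector()), 0.0, 100)` settles at roughly (1/3, 2/3). For 0--1--2 (every link in both directions) with jump probability 0, the same function flips between the uniform vector and (1/6, 2/3, 1/6) forever, which is the non-convergence described above.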
Jung gives [0.5 0.5] for 0->1. It also does the jumping-out-of-dead-ends thing... I don't understand the discrepancy yet.
After discussing with Uri, he agreed that matching Gephi for eigenvector centrality was good, and that adding nw:page-rank would be nice.
This is fixed by 433d13f1f8390ff6d695868c284398505190fcb6
Here's a simple language test for it (which I'm adding to the extension's language tests). This is just a line of three nodes, 0 -- 1 -- 2. Currently, the extension is giving [0.3333333333333333 0.3333333333333333 0.3333333333333333] as the answer, which is obviously wrong, as the middle node should definitely have a higher eigenvector centrality than the other two. The answer below comes from both my own calculation (by actually taking the largest eigenvalue's eigenvector from the adjacency matrix) and Gephi (renormalized):