NetLogo / NW-Extension

This is the NetLogo Network Extension. For general information about NetLogo, see:
http://ccl.northwestern.edu/netlogo/

Eigenvector centrality wrong #115

Closed: qiemem closed this issue 10 years ago

qiemem commented 10 years ago

Here's a simple language test for it (which I'm adding to the extension's language tests). This is just a line of three nodes, 0 -- 1 -- 2. Currently, the extension gives [0.3333333333333333 0.3333333333333333 0.3333333333333333] as the answer, which is obviously wrong: the middle node should definitely have a higher eigenvector centrality than the other two. The answer below comes both from my own calculation (taking the eigenvector of the largest eigenvalue of the adjacency matrix) and from Gephi (renormalized):

eigenvector-centrality-line
  extensions [ nw ]
  O> crt 3
  O> ask turtle 0 [ create-link-with turtle 1 ]
  O> ask turtle 1 [ create-link-with turtle 2 ]
  map [ [ nw:eigenvector-centrality ] of ? ] sort turtles => [0.29289322 0.41421356 0.29289322]
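As an outside sanity check (a Python sketch of my own, independent of the extension): for this line graph the adjacency matrix's dominant eigenvector can be written down in closed form as (1, √2, 1), and normalizing it to sum to 1 reproduces the expected values above.

```python
import math

# Adjacency matrix of the line 0 -- 1 -- 2.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]

# (1, sqrt(2), 1) is the dominant eigenvector, with eigenvalue sqrt(2):
# A (1, sqrt(2), 1) = (sqrt(2), 2, sqrt(2)) = sqrt(2) * (1, sqrt(2), 1).
u = [1.0, math.sqrt(2.0), 1.0]
for i in range(3):
    assert abs(sum(adj[i][j] * u[j] for j in range(3))
               - math.sqrt(2.0) * u[i]) < 1e-12

# Normalize so the scores sum to 1, as in the expected test output.
centrality = [x / sum(u) for x in u]
print(centrality)  # ~ [0.2929, 0.4142, 0.2929]
```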
qiemem commented 10 years ago

I've confirmed that this has been broken since the snapshot days.

qiemem commented 10 years ago

Okay, this is a little complicated. Jung is just doing PageRank with alpha = 0 for its eigenvector centrality measure. PageRank normalizes the columns of the adjacency matrix so that they sum to 1; eigenvector centrality, I believe, doesn't. In Gephi, eigenvector centrality really is calculated by just taking the dominant eigenvector of the adjacency matrix, whereas its PageRank does the normalization first.
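To make the distinction concrete, here's a Python sketch of mine (not Jung's or Gephi's code) comparing the two quantities on the line 0--1--2--3. With column normalization, the stationary vector on an undirected graph is simply proportional to degree; without it, the dominant eigenvector of a path has the known closed form u_i = sin(i·π/(n+1)).

```python
import math

# Line graph 0 -- 1 -- 2 -- 3.
adj = [[0, 1, 0, 0],
       [1, 0, 1, 0],
       [0, 1, 0, 1],
       [0, 0, 1, 0]]
n = len(adj)

# Eigenvector centrality: dominant eigenvector of the raw adjacency matrix.
# For a path this is u_i = sin((i+1) * pi / (n+1)), eigenvalue 2 cos(pi / (n+1)).
u = [math.sin((i + 1) * math.pi / (n + 1)) for i in range(n)]
ev = [x / sum(u) for x in u]

# PageRank with alpha = 0: stationary distribution of the column-normalized
# matrix, which on an undirected graph is proportional to degree.
deg = [sum(row) for row in adj]
pr = [d / sum(deg) for d in deg]

print(ev)  # ~ [0.191, 0.309, 0.309, 0.191]
print(pr)  # [1/6, 1/3, 1/3, 1/6]: a genuinely different vector
```

Both vectors rank the nodes the same way here, but the numbers differ, which is why tests comparing against Gephi see the discrepancy.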

Jung is definitely wrong for the network 0--1--2. For 0--1--2--3, it gets the same answer as Gephi's PageRank with p = 1 (i.e. alpha = 0). So it's not wrong... it's just different from Gephi.

I can feed it edge weight values that force it to use a non-normalized adjacency matrix. It agrees with Gephi if I do this (up to scaling). The only problem is that the scaling gets huge: 0--1--2--3 gives [1.432869610034543E20 2.318431730482698E20 2.318431730482698E20 1.432869610034543E20], so I'm definitely worried about overflow for large networks.
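That blow-up is exactly what unnormalized power iteration does: the iterate grows like (largest eigenvalue)^k, about 1.618^k for this line, which reaches the 1e20 range within about a hundred multiplications. The standard fix is to renormalize after every step, which changes only the scale of the iterate, never its direction. A sketch (my own, not Jung's code):

```python
# Line graph 0 -- 1 -- 2 -- 3.
adj = [[0, 1, 0, 0],
       [1, 0, 1, 0],
       [0, 1, 0, 1],
       [0, 0, 1, 0]]
n = len(adj)

def multiply(v):
    return [sum(adj[i][j] * v[j] for j in range(n)) for i in range(n)]

# Without per-step normalization, the entries explode.
v = [1.0] * n
for _ in range(100):
    v = multiply(v)
unnormalized_peak = max(v)
print(unnormalized_peak)  # roughly 1e21 after 100 steps

# Renormalizing every step keeps the scores bounded (they always sum to 1)
# without changing the direction the iteration converges toward.
v = [1.0] * n
for _ in range(100):
    w = multiply(v)
    v = [x / sum(w) for x in w]
print(v)  # ~ [0.191, 0.309, 0.309, 0.191], i.e. the E20 vector above, rescaled
```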

Here's code that almost does it, and renormalizes the results:

  object EigenvectorCentrality extends jungalg.scoring.PageRank(this, 0.0) {
    // Force Jung to use the non-normalized adjacency matrix by supplying
    // a constant edge weight of 1.0 for every link
    private val weightFunction = (l: Link) => 1.0.asInstanceOf[Number]
    this.setEdgeWeights(weightFunction)
    evaluate()
    // Sum of the raw vertex scores, used to rescale results to sum to 1
    private val normalizer = gc.turtles.map((t) => getVertexScore(t).doubleValue()).sum
    def getScore(turtle: Turtle): java.lang.Double = {
      if (!graph.containsVertex(turtle))
        throw new ExtensionException(turtle + " is not a member of the current graph context.")
      getVertexScore(turtle).doubleValue / normalizer
    }
  }
qiemem commented 10 years ago

For now, I'm just going to add tests for 0--1--2--3 that use the pagerank values.

qiemem commented 10 years ago

Scratch that: Jung is also wrong for 0--1--2--3--4. I'm starting to wonder if it would be best to drop Jung in favor of whatever Gephi is using.

SethTisue commented 10 years ago

You're saying there's a bug in Jung? Is there a ticket for it in their tracker?

qiemem commented 10 years ago

Regarding Jung's behavior with 0--1--2: I just implemented my own power iteration algorithm and I also get uniform values on that graph (though mine agrees with Gephi otherwise).

qiemem commented 10 years ago

@SethTisue, no, there isn't. It's not clear to me what Jung is trying to do, so I'm not sure it's a bug (well, except for the 0--1--2 case; I'll definitely report that). It does, however, disagree with other reputable sources.

I couldn't find any relevant bug reports.

qiemem commented 10 years ago

I was able to implement my own eigenvector centrality calculation pretty quickly. It matches Gephi on every test I've run, and the implementation is almost identical. Still need to figure out what the most desirable answer is. Matching Gephi seems like a pretty good measure to me, since Gephi is so ubiquitous.

qiemem commented 10 years ago

Alright, so I believe that Jung is actually computing PageRank. It uses power iteration to do it (the normal way of computing both PageRank and eigenvector centrality), but plain power iteration fails for certain networks, namely those with no unique dominant eigenvalue. For example, any odd-length line fails; thus, 0--1--2 and 0--1--2--3--4 fail. Gephi gets around this by essentially assuming implicit self-loops (at least in its eigenvector centrality algorithm; I haven't looked at its PageRank yet). In my own experiments this seems to work well, though it does slow down convergence in certain cases.
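The failure and the self-loop workaround are easy to reproduce with a toy power iteration in Python (a sketch of the idea, not Gephi's actual code). On 0--1--2 the adjacency eigenvalues are +√2, 0, and -√2, so there is no unique dominant eigenvalue and the iteration oscillates with period 2; iterating on A + I instead shifts the eigenvalues by 1 without changing the eigenvectors, restoring convergence.

```python
# Line graph 0 -- 1 -- 2.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
n = len(adj)

def iterate(steps, self_loops):
    """Power iteration with per-step normalization; optionally on A + I."""
    v = [1.0 / n] * n
    for _ in range(steps):
        w = [sum(adj[i][j] * v[j] for j in range(n))
             + (v[i] if self_loops else 0.0)  # implicit self-loop adds v[i]
             for i in range(n)]
        s = sum(w)
        v = [x / s for x in w]
    return v

print(iterate(100, False))  # ~ [1/3, 1/3, 1/3]
print(iterate(101, False))  # ~ [1/4, 1/2, 1/4]: period-2 oscillation
print(iterate(100, True))   # ~ [0.2929, 0.4142, 0.2929], matching Gephi
```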

Again, I think that matching Gephi is probably the way to go, but I need to discuss it with others.

qiemem commented 10 years ago

Just looked at Gephi's PageRank. Their implementation of the random-restart probability is a little more nuanced: not only is there a probability of jumping to a random node in the network, but if the hypothetical random walker reaches a dead end, it immediately jumps. Thus, with jump probability 0, on a network like 0->1, node 0 gets a PageRank of 1/3 and node 1 gets 2/3 (remember that the walker can jump from 1 back to 1).
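A minimal sketch of that dead-end rule as I read it (hypothetical Python, not Gephi's code): with jump probability 0, all mass at a dead end is redistributed uniformly over every node, the dead end itself included. On 0->1 this converges to exactly the 1/3, 2/3 split.

```python
# Directed edge 0 -> 1; node 1 has no outgoing links (a dead end).
out_links = {0: [1], 1: []}
n = 2

v = [0.5, 0.5]  # start from a uniform distribution
for _ in range(200):
    w = [0.0] * n
    for i in range(n):
        if out_links[i]:
            # Normal step: split node i's mass among its out-neighbors.
            for j in out_links[i]:
                w[j] += v[i] / len(out_links[i])
        else:
            # Dead end: jump uniformly at random (possibly back to i itself).
            for j in range(n):
                w[j] += v[i] / n
    v = w

print(v)  # ~ [1/3, 2/3]
```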

Their PageRank does actually fail on 0--1--2; no self-loops are assumed, so convergence is never reached.

Jung gives [0.5 0.5] for 0->1. It also does the jumping-out-of-dead-ends thing... I don't understand the discrepancy yet.

qiemem commented 10 years ago

After discussing with Uri, he agreed that matching Gephi for eigenvector centrality was good, and that adding nw:page-rank would be nice.

qiemem commented 10 years ago

This is fixed by 433d13f1f8390ff6d695868c284398505190fcb6