byzhang / graphchi

Automatically exported from code.google.com/p/graphchi

pagerank.cpp: would be nice to set the personalization vector #28

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Looking at this code snippet from example_apps/pagerank.cpp

-- -- >8 -- -- >8 -- -- >8 -- -- >8 -- -- >8 -- -- >8 -- -- >8 
#define RANDOMRESETPROB 0.15

// ...

struct PagerankProgram :
  public GraphChiProgram<VertexDataType,
                         EdgeDataType> {

  // ...
  void update(graphchi_vertex<VertexDataType, EdgeDataType> &v,
              graphchi_context &ginfo) {
    float sum=0;
    // ...
    for(int i=0; i < v.num_inedges(); i++) {
      float val = v.inedge(i)->get_data();
      sum += val;                    
    }

    /* Compute my pagerank */
    float pagerank = RANDOMRESETPROB + (1 - RANDOMRESETPROB) * sum;
    // ...
  }
};
-- -- >8 -- -- >8 -- -- >8 -- -- >8 -- -- >8 -- -- >8 -- -- >8 

It looks like the current pagerank implementation doesn't let you set the
"personalisation vector" to anything other than a uniform probability vector.
That is, if the pagerank equation in matrix form is

p = (1-c) * A * p + c * V

where:

p is the pagerank vector, with N components (N is the size of the web)
c is the probability of jumping to a random page regardless of the outlinks
  from the current location
A is the N-by-N transition matrix, if you view a random walk on the web
  as a Markov chain
V is an N-vector, where V_i is the probability of random-jumping to page i

(side note: I am not normalizing by N, i.e. the entries of p and V sum up
to N, not to 1)

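Well, given all of this, in the current GraphChi pagerank implementation
V is the uniform vector [1, 1, 1, ..., 1] and c = RANDOMRESETPROB, which is why
the code computes RANDOMRESETPROB + (1 - RANDOMRESETPROB) * sum, where sum
(the total over the in-edges) plays the role of (A * p)_i.
A random jump is equally likely to land on any page.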
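To make the equation concrete, here is a minimal standalone C++ sketch of power iteration for p = (1-c) * A * p + c * V. It is not GraphChi code: the function name personalized_pagerank, the edge-list representation of A, and the assumption that no vertex is dangling are illustrative choices. With V = [1, 1, ..., 1] and c = 0.15, each step computes the same per-vertex value as the quoted update(), c + (1-c) * sum; any other V gives a personalized result.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not part of GraphChi): personalized PageRank by power iteration,
//   p = (1 - c) * A * p + c * V,
// using the issue's convention that p and V sum to N instead of 1.
// A is given implicitly by an edge list: A[dst][src] = 1 / outdeg(src).
// Assumes every vertex has at least one out-edge (no dangling vertices).
std::vector<double> personalized_pagerank(
    std::size_t n,
    const std::vector<std::pair<std::size_t, std::size_t>>& edges,  // (src, dst)
    const std::vector<double>& V, double c, int iterations) {
  assert(V.size() == n);
  std::vector<std::size_t> outdeg(n, 0);
  for (const auto& e : edges) ++outdeg[e.first];

  std::vector<double> p(n, 1.0);  // start from the uniform vector
  for (int it = 0; it < iterations; ++it) {
    // Ap[i] = sum over in-edges j -> i of p[j] / outdeg(j), i.e. (A * p)_i
    std::vector<double> Ap(n, 0.0);
    for (const auto& e : edges) Ap[e.second] += p[e.first] / outdeg[e.first];
    // Same shape as the GraphChi line: reset term plus damped in-edge sum
    for (std::size_t i = 0; i < n; ++i) p[i] = c * V[i] + (1.0 - c) * Ap[i];
  }
  return p;
}
```

For example, on a 3-vertex graph with edges 0->1, 0->2, 1->2, 2->0, calling it with V = {3, 0, 0} (every random jump lands on vertex 0, while the entries still sum to N = 3) gives vertex 0 a higher rank than the uniform V does, and in both cases the ranks sum to 3.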

After this wall of text, I come to my point:

could a non-trivial "personalisation vector" be implemented?
I'd like to be able to set V myself.

Is this among the priorities of the GraphChi team?

Cheers,

Original issue reported on code.google.com by g.gherdo...@gmail.com on 12 Feb 2013 at 10:20

GoogleCodeExporter commented 9 years ago
It's marked as a "defect"; I should have set "enhancement", but I don't see
how to change that.

Original comment by g.gherdo...@gmail.com on 12 Feb 2013 at 10:24

GoogleCodeExporter commented 9 years ago
If your personalization vector is sparse, it should be quite easy to keep it in
memory and then just modify the calculation accordingly: instead of
"RANDOMRESETPROB + ..." do "personalization_vector[vertex.id()] + ...",
or something like that.

Original comment by akyrola...@gmail.com on 12 Feb 2013 at 4:46
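A minimal standalone C++ sketch of that suggestion follows. It is not taken from GraphChi: the names PersonalizationVector, reset_term, personalized_rank and kResetProb, and the 32-bit vid_t alias, are hypothetical. One deliberate difference from the comment: the stored weight is multiplied by the reset probability, so the per-vertex step is c * V_i + (1-c) * sum, matching the matrix equation in the original report. With V_i = 1 for every vertex it reduces to the existing RANDOMRESETPROB + (1 - RANDOMRESETPROB) * sum.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical sketch, not GraphChi code: a sparse personalization vector V kept
// in memory and keyed by vertex id. Vertices missing from the map get V_i = 0,
// so a random jump never lands on them.
using vid_t = std::uint32_t;  // assumption: vertex ids are 32-bit unsigned
using PersonalizationVector = std::unordered_map<vid_t, float>;

constexpr float kResetProb = 0.15f;  // plays the role of RANDOMRESETPROB (c)

// Reset term c * V_i for vertex `id`; 0 when the vertex has no entry.
float reset_term(const PersonalizationVector& V, vid_t id) {
  auto it = V.find(id);
  if (it == V.end()) return 0.0f;
  assert(it->second >= 0.0f);  // jump weights must be non-negative
  return kResetProb * it->second;
}

// Personalized version of the line in update():
//   float pagerank = RANDOMRESETPROB + (1 - RANDOMRESETPROB) * sum;
// `sum` is the sum of the in-edge values, computed exactly as in pagerank.cpp.
float personalized_rank(const PersonalizationVector& V, vid_t id, float sum) {
  return reset_term(V, id) + (1.0f - kResetProb) * sum;
}
```

Inside update() the call would presumably look like personalized_rank(V, v.id(), sum), with V filled once before the engine starts (for example from a file of vertex id / weight pairs, a loading step not shown here). Whether the weights sum to N, as in the report's convention, or to 1 is up to the caller; the ranks scale linearly with V.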

GoogleCodeExporter commented 9 years ago
Thank you, Aapo, for the suggestion; I'll implement that.

--cheers,

Original comment by g.gherdo...@gmail.com on 12 Feb 2013 at 6:03