frankmcsherry / pagerank

Implementation of PageRank in timely dataflow
MIT License

How to save results in file? #4

Open LeMoussel opened 8 years ago

LeMoussel commented 8 years ago

Very interesting... But how can I save the results to a file, e.g. a CSV file of (node, PR) pairs?

de-code commented 7 years ago

That wasn't clear to me either. Where are the page rank values stored?

frankmcsherry commented 7 years ago

Oh wow; how did I miss this issue from 2016?

The pagerank values are stored in the src vector as the computation iterates. If you'd like them written out, the best way to do this with minimal modification is probably to use the same type of logic that prints out the average elapsed times (but with the index == 0 requirement removed, so that every worker reports):

if iter.inner == 20 && index == 0 { println!("average: {}", (time::precise_time_s() - going) / 10.0 ); }

At this moment, the system has reported that we have reached the 20th iteration (change to whatever you need), and are otherwise ready to stop (we stop because the feedback loop drops data with round > 20, not because the operator knows to shut down then).

But, this is a fine time to walk through src and print the results to .. the screen, or a file, or whatever you'd like to do with them.

Bear in mind that the element src[i] is (I believe) the pagerank of element index + i * peers, and each worker will report having an ith element. So, when recording them you'll want to use index and peers to disambiguate results.
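As a rough illustration of that bookkeeping, here is a minimal sketch of writing one worker's share of src as (node, PR) CSV lines. The function name write_ranks is hypothetical and not part of the repository, and the mapping from i to index + i * peers is taken on faith from the paragraph above rather than checked against the code:

```rust
use std::io::{self, Write};

/// Writes one `node,rank` CSV line per local entry of `src`.
/// Assumption (unverified): on worker `index` out of `peers` workers,
/// `src[i]` belongs to node `index + i * peers`.
fn write_ranks<W: Write>(
    out: &mut W,
    src: &[f32],
    index: usize,
    peers: usize,
) -> io::Result<()> {
    for (i, rank) in src.iter().enumerate() {
        // Recover the global node id from the local position.
        let node = index + i * peers;
        writeln!(out, "{},{}", node, rank)?;
    }
    Ok(())
}
```

Inside the iter.inner == 20 branch, each worker could create its own output file, for example with std::fs::File::create(format!("pagerank-{}.csv", index)) (an illustrative file name) wrapped in a std::io::BufWriter, and pass it to write_ranks along with src, index and peers. One file per worker avoids having several workers write to the same file at once; the files can be concatenated afterwards.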

There is also the detail, revealed by the next lines:

// prepare src for transmitting to destinations
for s in 0..src.len() { src[s] = (0.15 + 0.85 * src[s]) / deg[s] as f32; }

that the contents of src are (just before this code runs) accumulated updates that should next be blended with the reset distribution. We fuse the two steps (blend with reset, scale down by degree), but often the "pagerank" that people want to see is the result after the blending but before the scaling.
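To make the distinction concrete, here is a small sketch that splits the fused loop into its two steps. The helper names blended_ranks and scale_by_degree are hypothetical, and the u32 type for deg is an assumption (the original only shows deg[s] being cast to f32):

```rust
/// Step 1: blend the accumulated updates with the reset distribution.
/// This is the value most people would call the "pagerank".
fn blended_ranks(src: &[f32]) -> Vec<f32> {
    src.iter().map(|&s| 0.15 + 0.85 * s).collect()
}

/// Step 2: divide each blended rank by its out-degree, giving the
/// per-edge amount that is transmitted to destinations.
/// Assumption: degrees are stored as u32 and are non-zero.
fn scale_by_degree(ranks: &[f32], deg: &[u32]) -> Vec<f32> {
    ranks
        .iter()
        .zip(deg)
        .map(|(&r, &d)| r / d as f32)
        .collect()
}
```

Applying blended_ranks to src as it stands just before the fused loop gives the values to record; scale_by_degree applied to that result should match what the fused loop leaves in src.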

de-code commented 7 years ago

I am not sure what github response algorithm you switched to, but your response time has definitely increased dramatically!

I submitted a PR in case it's useful.

frankmcsherry commented 7 years ago

I saw it! The intent looks good; I'll try and take a read through later tonight or tomorrow, if that is ok.