RoheLab / aPPR

Approximate Personalized Page Rank

Notes to add to README when there is time #19

Open alexpghayes opened 2 years ago

alexpghayes commented 2 years ago

README beyond this point is really just scratch for myself

Sink nodes and unreachable nodes

library(igraph)
library(aPPR)

citation_graph <- sample_pa(100)

citation_tracker <- appr(citation_graph, seeds = "5")
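A minimal sketch of what "sink" and "unreachable" mean for this example graph, using only igraph. It assumes the default directed `sample_pa()` graph, where each new node points at older nodes; the seed value and the variable names below are illustrative, not part of aPPR.

```r
library(igraph)

set.seed(27)  # arbitrary seed, only for reproducibility
citation_graph <- sample_pa(100)  # directed by default: new nodes cite older ones

# Sink nodes have no outgoing edges, so a random walk cannot leave them
out_degree <- degree(citation_graph, mode = "out")
sink_nodes <- which(out_degree == 0)

# Unreachable nodes have infinite directed distance from the seed node
dist_from_seed <- distances(citation_graph, v = 5, mode = "out")
unreachable <- which(is.infinite(dist_from_seed[1, ]))

length(sink_nodes)
length(unreachable)
```

Because edges only point from newer to older nodes, every node added after the seed is unreachable from it, which is why most of this graph should end up with p = 0.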

Why should I use aPPR?

aPPR calculates an approximation to personalized PageRank

comment on p = 0 versus p != 0

Advice on choosing epsilon

Number of unique visits as a function of epsilon, wait times, runtime proportional to 1 / (alpha * epsilon), etc, etc

speaking strictly in terms of the p != 0 nodes

1e-4 and 1e-5: finishes quickly, neighbors with high degree get visited

1e-6: visits most of the 1-hop neighborhood. Finishes in several hours for accounts that follow thousands of people, with ~10 tokens.

1e-7: visits beyond the 1-hop neighborhood by ???. Takes a couple of days to run with ~10 tokens.

1e-8: visits a lot beyond the 1-hop neighborhood, presumably the important people in the 2-hop neighborhood, ???

the more disparate a user's interests, and the less connected their neighborhood, the longer it will take to run aPPR
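To make the 1 / (alpha * epsilon) scaling concrete, here is a rough sketch in base R. `push_bound()` is a hypothetical helper, not part of aPPR, and `alpha = 0.15` is an assumed teleportation constant. The numbers are an order-of-magnitude guide to how work grows as epsilon shrinks, not a prediction of wall-clock time.

```r
# Hypothetical helper: work is assumed to scale like 1 / (alpha * epsilon)
push_bound <- function(epsilon, alpha = 0.15) {
  1 / (alpha * epsilon)
}

epsilons <- c(1e-4, 1e-5, 1e-6, 1e-7, 1e-8)

scaling <- data.frame(
  epsilon = epsilons,
  bound = push_bound(epsilons),
  # work relative to the cheapest setting, epsilon = 1e-4
  relative = push_bound(epsilons) / push_bound(1e-4)
)

scaling
```

Each factor of 10 decrease in epsilon multiplies the bound by 10, which is at least consistent with going from "several hours" to "a couple of days" between 1e-6 and 1e-7.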


Speed ideas

compute is not an issue relative to actually getting data

Compute time ~ access from RAM time << access from disk time << access from network time.

Make requests to API in bulk, memoize everything, cache / write to disk in a separate process?

General pattern: cache on disk, and also in RAM
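One way the "cache on disk, and also in RAM" pattern could look in base R. This is a sketch under assumptions, not how aPPR does it: `make_cached()` and `slow_neighbors()` are hypothetical names, the fetch function stands in for a real API call, and keys are assumed to be safe to use as file names.

```r
# Hypothetical two-level cache: check RAM, then disk, then do the slow fetch
make_cached <- function(fetch, cache_dir = tempfile("appr-cache-")) {
  dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
  ram <- new.env(parent = emptyenv())

  function(key) {
    # 1. Fastest: value is already in RAM
    if (exists(key, envir = ram, inherits = FALSE)) {
      return(get(key, envir = ram, inherits = FALSE))
    }

    path <- file.path(cache_dir, paste0(key, ".rds"))
    if (file.exists(path)) {
      # 2. Slower: value was written to disk earlier
      value <- readRDS(path)
    } else {
      # 3. Slowest: go to the network (or whatever fetch does), then persist
      value <- fetch(key)
      saveRDS(value, path)
    }

    assign(key, value, envir = ram)
    value
  }
}

# Stand-in for an expensive API request; counts how often it is really called
n_calls <- 0
slow_neighbors <- function(node) {
  n_calls <<- n_calls + 1
  paste0(node, "_friend_", 1:3)
}

cached_neighbors <- make_cached(slow_neighbors)
first <- cached_neighbors("5")
second <- cached_neighbors("5")  # served from RAM, no second fetch
```

Pointing `cache_dir` at a persistent directory instead of a temporary one would let the disk layer survive across R sessions, which is where most of the savings would come from when the network is the bottleneck.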

Working with Tracker objects

See ?Tracker for details.