dataculturegroup / DataBasic

A suite of focused and simple tools and activities for journalists, data journalism classrooms and community advocacy groups
http://www.databasic.io/
MIT License
62 stars 16 forks

Large datasets #284

Open s2tephen opened 7 years ago

s2tephen commented 7 years ago

See the big-data branch for upper bounds on data set size and for betweenness centrality estimation. These bounds were determined by running a series of tests on a variety of random graphs to roughly locate the sweet spot in the accuracy/runtime tradeoff. Ultimately I was able to get the runtime of the betweenness_centrality function under 90 sec with error below two decimal places (< .01).
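For context, a minimal sketch of this kind of estimation using networkx's built-in source sampling (the `k` parameter of `betweenness_centrality`); the `sample_size` and cutoff values here are illustrative, not the tuned ones from the branch:

```python
import networkx as nx

def approx_betweenness(graph, sample_size=100, seed=1):
    """Estimate betweenness centrality by sampling source nodes.

    Falls back to the exact algorithm on graphs small enough that
    sampling buys nothing. Larger sample_size = slower but more
    accurate; sample_size == number of nodes reproduces the exact
    result.
    """
    n = graph.number_of_nodes()
    if n <= sample_size:
        return nx.betweenness_centrality(graph)
    # k sampled sources; seed fixed so results are reproducible
    return nx.betweenness_centrality(graph, k=sample_size, seed=seed)

# Example: a random graph roughly in the size range being tested
G = nx.gnm_random_graph(500, 2000, seed=42)
scores = approx_betweenness(G, sample_size=100)
```

Tuning `sample_size` against graph size is exactly the accuracy/runtime tradeoff described above.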

This estimation works in the backend logic/testing suite, but unfortunately it still times out in production. I was able to get it working by further lowering the upper bounds: MAX_NODES = 2000 and MAX_V_X_E (nodes * edges) = 10000000. At those bounds, however, the accuracy of the betweenness centrality estimate is much lower. I can't invest any more time in fine-tuning this, but it may be worth revisiting in the future. One possible solution would be to run the betweenness centrality algorithm in parallel, as in this example.
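A pre-flight check along these lines could gate the expensive computation; the constant names come from the comment above, but the function itself is a hypothetical sketch, not the branch's actual code:

```python
import networkx as nx

# Bounds from the tightened production configuration described above
MAX_NODES = 2000
MAX_V_X_E = 10000000  # cap on nodes * edges

def within_limits(graph):
    """Return True if the graph is small enough to attempt
    betweenness centrality estimation within the request timeout."""
    n = graph.number_of_nodes()
    e = graph.number_of_edges()
    return n <= MAX_NODES and n * e <= MAX_V_X_E
```

Rejecting oversized uploads early (with a clear error to the user) avoids the production timeout entirely, at the cost of refusing some graphs the estimator could in principle handle.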

rahulbot commented 7 years ago

Maybe this is the final straw in the argument to host prod on the civic servers. @kanarinka what do you think?