SEL-Columbia / networker

Network Planning Library

Testing Large (>100k) Datasets Fails #51

Closed carbz closed 9 years ago

carbz commented 9 years ago

While testing on modelrunner, networker is unable to process large (~100k node) datasets. Similar testing on medium-sized (~10k node) datasets succeeds, though.

Input test case here: http://23.253.225.19:8888/worker_data/41bda90e-6d66-4467-8888-94a82caab19e/input/

ModelRunner log/error message below:

```
Running networkplanner
reading input from /home/mr/modelrunner/worker_data/41bda90e-6d66-4467-8888-94a82caab19e/input
discarding /home/mr/miniconda/envs/modelrunner/bin from PATH
prepending /home/mr/miniconda/envs/networker/bin to PATH
2015-05-13 19:17:36,161 - networker - INFO - networker 0.1.0 (Python 2.7.9)
./scripts/networkplanner.sh: line 9: 18769 Killed run_networkplanner.py config.json -w $input_dir -o $output_dir
```

When running via the command line, what appears to be the same error occurs:

```
carbz@RodimusPrime$ time scripts/run_networkplanner.py networkplannner_config_test-big.json -o /big-test/
Duplicate nodes
((2.714732526, 6.380952638)) {'voter_records': '1', 'y': '6.380952638', 'population': '495', 'name': '22961', 'x': '2.714732526'}
((2.714732526, 6.380952638)) {'voter_records': '1', 'y': '6.380952638', 'population': '332', 'name': '22962', 'x': '2.714732526'}
run failed: Instance '<Node at 0x10a7aad10>' has been deleted, or its row is otherwise not present.
```

Command line testing leads me to believe duplicate nodes are the problem. I thought I had resolved that, though, and other ModelRunner tests seemed to handle duplicates okay, so I'm not sure this is the real root cause. But it could be duplicates causing the error. This may be better solved by some sort of input validation step.
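Such a validation step could scan the node table for repeated coordinates before a run starts, rather than failing mid-run. A minimal sketch (not networker's actual API; the `x`/`y`/`name` column names are assumptions based on the record fields shown in the error output above):

```python
import csv
from collections import defaultdict

def find_duplicate_nodes(csv_path, x_field="x", y_field="y"):
    """Group node records by (x, y) coordinate and return a dict mapping
    each duplicated coordinate to the list of node names sharing it."""
    by_coord = defaultdict(list)
    with open(csv_path) as f:
        for record in csv.DictReader(f):
            coord = (record[x_field], record[y_field])
            by_coord[coord].append(record.get("name"))
    # keep only coordinates that occur more than once
    return {coord: names for coord, names in by_coord.items() if len(names) > 1}
```

Running this on the input before kicking off a job would let the user deduplicate (or aggregate the `population` values) up front instead of discovering the collision deep inside the network build.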

chrisnatali commented 9 years ago

Although networker's mod_boruvka algorithm is much less memory intensive than mod_kruskal, it still uses about 6 GB to compute the network for a 100k node dataset.

This was really an issue with the modelrunner deployment: the worker was killed for exceeding its memory limit. I bumped up the worker memory to 16 GB to resolve it.