Closed: carbz closed this issue 9 years ago
Although networker's mod_boruvka algorithm is much less memory-intensive than mod_kruskal, it still uses about 6G to compute the network for a 100k-node dataset.
This was really an issue with the modelrunner deployment. I bumped up the worker memory to 16G to resolve it.
While testing on modelrunner, networker is unable to process large (~100k node) datasets. Similar testing on medium-sized (~10k node) datasets does not fail, though. Input test case here: http://23.253.225.19:8888/worker_data/41bda90e-6d66-4467-8888-94a82caab19e/input/
ModelRunner log/error message below
When running via the command line, the same error may occur.
Command-line testing leads me to believe duplicate nodes are the problem. I thought I had resolved that, though, and other ModelRunner tests seemed to handle duplicates okay, so I'm not sure this is the real root cause. Still, duplicates could be causing the error. This may be better solved by some sort of input validation step.
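For illustration, the input validation step mentioned above might look something like the sketch below. It is not taken from networker: the function names `find_duplicate_nodes` and `validate_nodes`, the `precision` parameter, and the assumption that nodes arrive as a sequence of `(x, y)` coordinate pairs are all hypothetical.

```python
from collections import defaultdict


def find_duplicate_nodes(coords, precision=6):
    """Return {rounded (x, y): [row indices]} for coordinates seen more than once.

    Hypothetical helper: assumes `coords` is an iterable of (x, y) number pairs.
    Coordinates are rounded to `precision` decimal places before comparison.
    """
    seen = defaultdict(list)
    for index, (x, y) in enumerate(coords):
        key = (round(x, precision), round(y, precision))
        seen[key].append(index)
    # Keep only the coordinates that occur in more than one row.
    return {key: rows for key, rows in seen.items() if len(rows) > 1}


def validate_nodes(coords, precision=6):
    """Raise ValueError up front if any duplicate coordinates are present."""
    duplicates = find_duplicate_nodes(coords, precision)
    if duplicates:
        raise ValueError(
            "%d duplicated coordinate(s) found in input" % len(duplicates)
        )
```

Running a check like this before building the network would report duplicates as a clear input error, which would help confirm or rule them out as the root cause here.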