SEL-Columbia / sequencer

Python library for sequencing Network Planner outputs (CSVs and shapefiles)

Running out of memory #50

chrisnatali closed this issue 8 years ago

chrisnatali commented 8 years ago

The sequencer runs out of memory on runs with a large data set.

Error:
reading input from /home/mr/modelrunner/worker_data/aa2d1586-0916-4a36-878d-e3098e2cb2ee/input
discarding /home/mr/miniconda/envs/modelrunner/bin from PATH
prepending /home/mr/miniconda/envs/sequencer/bin to PATH
2016-04-06 16:40:22,576 - sequencer - INFO - sequencer 0.0.5 (Python 2.7.11)
2016-04-06 16:40:22,578 - sequencer - INFO - Asserting Input Projections Match
2016-04-06 16:40:22,584 - sequencer - INFO - Aligning Network Nodes With Input Metrics
/home/mr/miniconda/envs/sequencer/lib/python2.7/site-packages/sequencer/Utils.py:52: FutureWarning: sort(....) is deprecated, use sort_index(.....)
  metrics = pd.merge(metrics, node_df, on='m_coords', left_index=True).sort()
/home/mr/miniconda/envs/sequencer/lib/python2.7/site-packages/sequencer/Utils.py:61: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  fake_nodes['m_coords'] = fake_nodes['m_coords'].apply(lambda x: ())
/home/mr/miniconda/envs/sequencer/lib/python2.7/site-packages/sequencer/Utils.py:64: FutureWarning: sort(....) is deprecated, use sort_index(.....)
  metrics = pd.concat([closest_match, fake_nodes]).sort()
2016-04-06 16:42:27,441 - sequencer - INFO - Computing Pairwise Distances
2016-04-06 16:42:27,442 - sequencer - INFO - Using haversine Distance
Traceback (most recent call last):
  File "/home/mr/modelrunner/models/mvmax_sequencer.py", line 33, in <module>
    nwp = NetworkPlan(shp_file, csv_file, prioritize='Population')
  File "/home/mr/miniconda/envs/sequencer/lib/python2.7/site-packages/sequencer/NetworkPlan.py", line 49, in __init__
    self.distance_matrix = self._distance_matrix()
  File "/home/mr/miniconda/envs/sequencer/lib/python2.7/site-packages/sequencer/NetworkPlan.py", line 88, in _distance_matrix
    return np.vstack(map(haversine, coords))
  File "/home/mr/miniconda/envs/sequencer/lib/python2.7/site-packages/numpy/core/shape_base.py", line 228, in vstack
    return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)
MemoryError
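The last sequencer frame in the traceback is np.vstack(map(haversine, coords)). Under Python 2, map returns a complete list of rows, and vstack then copies them into a new array, so the rows and the stacked result are both in memory at the same moment. One possible way to avoid that second copy is to preallocate the matrix and fill it a row at a time. The sketch below is only an illustration under assumptions: haversine_row and distance_matrix are hypothetical names (not the sequencer's own functions), coords are assumed to be (lon, lat) pairs in degrees, and float32 is an assumed trade of precision for half the memory.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an assumed constant


def haversine_row(point, coords_rad):
    """Distances in km from one (lon, lat) point to every row of coords_rad.

    Hypothetical helper; all inputs are in radians.
    """
    dlon = coords_rad[:, 0] - point[0]
    dlat = coords_rad[:, 1] - point[1]
    a = (np.sin(dlat / 2.0) ** 2
         + np.cos(point[1]) * np.cos(coords_rad[:, 1]) * np.sin(dlon / 2.0) ** 2)
    # Clip guards against rounding pushing a slightly outside [0, 1].
    return 2.0 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def distance_matrix(coords_deg, dtype=np.float32):
    """Fill a preallocated n x n matrix row by row instead of stacking rows."""
    coords_rad = np.radians(np.asarray(coords_deg, dtype=np.float64))
    n = len(coords_rad)
    out = np.empty((n, n), dtype=dtype)  # the only n x n allocation
    for i in range(n):
        out[i] = haversine_row(coords_rad[i], coords_rad)
    return out


if __name__ == "__main__":
    demo = distance_matrix([(0.0, 0.0), (0.0, 90.0), (90.0, 0.0)])
    print(demo.shape, demo.dtype)
```

This still allocates a full n x n matrix, so it lowers the peak rather than changing how memory grows with the number of nodes.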

chrisnatali commented 8 years ago

@vr2262 any interest?

vr2262 commented 8 years ago

Oh boy...

I can take a look but memory problems are painful.

chrisnatali commented 8 years ago

Environment setup may be painful, but if you've used numpy (or want to), it may just be a matter of finding an alternative to vstack. At the least, I want to understand how memory usage scales with input size (i.e. is it n**2?). I'll probably look at it tomorrow too.
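On the n**2 question: a dense pairwise distance matrix over n nodes has n * n entries, so if the traceback's vstack call is building such a matrix, memory should grow quadratically with the node count. A rough back-of-the-envelope sketch, assuming float64 entries (numpy's default) and a hypothetical helper name dense_matrix_bytes:

```python
import numpy as np


def dense_matrix_bytes(n, dtype=np.float64):
    """Bytes needed to hold one dense n x n matrix of the given dtype."""
    return n * n * np.dtype(dtype).itemsize


if __name__ == "__main__":
    # Illustrative node counts only; not taken from the failing run.
    for n in (1000, 10000, 50000):
        gib = dense_matrix_bytes(n) / 2.0 ** 30
        print("n=%d -> %.2f GiB for one copy" % (n, gib))
```

This counts a single copy. Stacking a list of rows keeps the rows alive while the result is built, so the peak can be roughly double that figure.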

chrisnatali commented 8 years ago

The simplest way to set up is to clone this repo and install via conda (see these instructions).

To reproduce/troubleshoot the issue:

Data:

Run: either follow the instructions here to test within a Python REPL, or copy this command-line script into your environment and run the model via something similar to:

python mvmax_sequencer.py -i input -o output

The input dir should contain metrics-local.csv (the settlement points) and networks-proposed.* (the proposed network). The output dir will contain the sequenced grid.