ActivitySim / activitysim

An Open Platform for Activity-Based Travel Modeling
https://activitysim.github.io
BSD 3-Clause "New" or "Revised" License
191 stars 99 forks

compute_accessibility chunk size #396

Closed esanchez01 closed 3 years ago

esanchez01 commented 3 years ago

I have been working on optimizing the performance of SANDAG's ActivitySim 3-zone setup (using the latest develop branch). In terms of run time, the main bottleneck has proven to be the compute_accessibility step, apparently because the step does not fit in RAM on our machine (320GB). I therefore tested smaller chunk sizes to alleviate this and have seen reductions in run time. However, I have not been able to fit the step fully in RAM, because the model run fails when the chunk size is set too low. In our case, setting a chunk size smaller than 6 billion results in failure (the run completed at 6 billion but failed at 5.5 billion).

Looking at the stack trace (included below), it appears that the transit_df for one of the processes is empty and therefore can't be vectorized (there is another error reported for block_offsets but this seems to be a bug in the code -- block_offsets is not guaranteed to be available in the except clause).
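
For reference, here is a minimal sketch of the two failures described above, outside of ActivitySim. The dictionary values are copied from the log output; the function names `offsets_unguarded` and `offsets_guarded` are hypothetical, and the guarded version is only one possible workaround, not necessarily the fix the maintainers would choose.

```python
import numpy as np

# Values copied from the log output; the function names are hypothetical.
skim_keys_to_indexes = {'AM': 603, 'MD': 604, 'PM': 605}


def offsets_unguarded(dim3):
    # Mirrors the reported call: without `otypes`, np.vectorize must call the
    # function once to infer the output dtype, so it rejects size-0 input.
    return np.vectorize(skim_keys_to_indexes.get)(dim3)


def offsets_guarded(dim3):
    # Bind the name before `try` so the handler can always read it.
    block_offsets = None
    try:
        # Declaring the output dtype lets np.vectorize accept size-0 input.
        block_offsets = np.vectorize(
            skim_keys_to_indexes.get, otypes=[np.int64]
        )(dim3)
    except Exception as err:
        print(f"lookup failed: {err!r}, block_offsets={block_offsets}")
        raise
    return block_offsets


# Stands in for the time-of-day column of an empty transit_df.
empty = np.array([], dtype=object)

try:
    offsets_unguarded(empty)
except ValueError as err:
    print(f"unguarded: {err}")

print(offsets_guarded(empty).shape)
print(offsets_guarded(np.array(['AM', 'PM'], dtype=object)))
```

The guarded version only avoids the crash on an empty input. It does not explain why a process ends up with an empty transit_df in the first place.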

It's my understanding that the chunk size is dynamic but still requires an estimate. I'm not certain whether this failure is due to the chunk size estimates being too small (below 6 billion) or to some other code-related reason.

Traceback

21/03/2021 03:47:52 - DEBUG - activitysim.core.chunk - log_df transit_df elements: 0 bytes: 0.0 shape: (0, 4) : accessibility.tvpb_best_time.AM.build_virtual_path.compute_tap_tap_time
21/03/2021 03:47:52 - INFO - activitysim.core.mem - trace_memory_info accessibility.tvpb_best_time.AM.build_virtual_path.compute_tap_tap_time.add.transit_df rss: 0.93GB used: 93.46 GB percent: 29.2%
21/03/2021 03:47:52 - DEBUG - activitysim.core.pathbuilder_cache - MEM #TVPB CACHE compute_tap_tap_utilities all_transit_paths net 0 B (0) total 956 MB in 2.25 s
21/03/2021 03:47:52 - INFO - activitysim.core.pathbuilder - #TVPB CACHE deduped transit_df from 0 to 0
21/03/2021 03:47:52 - ERROR - activitysim.core.skim_dictionary - SkimDict lookup_3d error: ValueError: cannot call `vectorize` on size 0 inputs unless `otypes` is set
21/03/2021 03:47:52 - ERROR - activitysim.core.skim_dictionary - key TRN_IVT_FAST
21/03/2021 03:47:52 - ERROR - activitysim.core.skim_dictionary - orig max nan min nan
21/03/2021 03:47:52 - ERROR - activitysim.core.skim_dictionary - dest max nan min nan
21/03/2021 03:47:52 - ERROR - activitysim.core.skim_dictionary - skim_keys_to_indexes: {'AM': 603, 'MD': 604, 'PM': 605}
21/03/2021 03:47:52 - ERROR - activitysim.core.skim_dictionary - dim3 []
21/03/2021 03:47:52 - ERROR - activitysim.core.assign - assign_variables - UnboundLocalError (local variable 'block_offsets' referenced before assignment) evaluating: los.get_tappairs3d(df.btap, df.atap, df.tod, 'TRN_IVT_FAST')
Traceback (most recent call last):
  File "c:\users\esanc\.conda\envs\asimtest_multi_develop\lib\site-packages\activitysim\core\skim_dictionary.py", line 329, in lookup_3d
    block_offsets = np.vectorize(skim_keys_to_indexes.get)(dim3) # this should be faster than map
  File "c:\users\esanc\.conda\envs\asimtest_multi_develop\lib\site-packages\numpy\lib\function_base.py", line 2108, in __call__
    return self._vectorize_call(func=func, args=vargs)
  File "c:\users\esanc\.conda\envs\asimtest_multi_develop\lib\site-packages\numpy\lib\function_base.py", line 2186, in _vectorize_call
    ufunc, otypes = self._get_ufunc_and_otypes(func=func, args=args)
  File "c:\users\esanc\.conda\envs\asimtest_multi_develop\lib\site-packages\numpy\lib\function_base.py", line 2142, in _get_ufunc_and_otypes
    raise ValueError('cannot call `vectorize` on size 0 inputs '
ValueError: cannot call `vectorize` on size 0 inputs unless `otypes` is set

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\users\esanc\.conda\envs\asimtest_multi_develop\lib\site-packages\activitysim\core\assign.py", line 287, in assign_variables
    expr_values = to_series(eval(expression, globals_dict, _locals_dict))
  File "", line 1, in
  File "c:\users\esanc\.conda\envs\asimtest_multi_develop\lib\site-packages\activitysim\core\los.py", line 568, in get_tappairs3d
    s = self.get_skim_dict('tap').lookup_3d(otap, dtap, dim3, key)
  File "c:\users\esanc\.conda\envs\asimtest_multi_develop\lib\site-packages\activitysim\core\skim_dictionary.py", line 338, in lookup_3d
    logger.error(f"dim3 block_offsets {np.unique(block_offsets)}")
UnboundLocalError: local variable 'block_offsets' referenced before assignment
21/03/2021 03:47:52 - DEBUG - activitysim.core.pathbuilder_cache - MEM #TVPB build_virtual_path compute_tap_tap net 624 KB (638976) total 956 MB in 2.62 s
21/03/2021 03:47:52 - ERROR - activitysim.core.assign - assign_variables - UnboundLocalError (local variable 'block_offsets' referenced before assignment) evaluating: tvpb.get_tvpb_best_transit_time(orig=df.orig, dest=df.dest, tod='AM')
Traceback (most recent call last):
  File "c:\users\esanc\.conda\envs\asimtest_multi_develop\lib\site-packages\activitysim\core\skim_dictionary.py", line 329, in lookup_3d
    block_offsets = np.vectorize(skim_keys_to_indexes.get)(dim3) # this should be faster than map
  File "c:\users\esanc\.conda\envs\asimtest_multi_develop\lib\site-packages\numpy\lib\function_base.py", line 2108, in __call__
    return self._vectorize_call(func=func, args=vargs)
  File "c:\users\esanc\.conda\envs\asimtest_multi_develop\lib\site-packages\numpy\lib\function_base.py", line 2186, in _vectorize_call
    ufunc, otypes = self._get_ufunc_and_otypes(func=func, args=args)
  File "c:\users\esanc\.conda\envs\asimtest_multi_develop\lib\site-packages\numpy\lib\function_base.py", line 2142, in _get_ufunc_and_otypes
    raise ValueError('cannot call `vectorize` on size 0 inputs '
ValueError: cannot call `vectorize` on size 0 inputs unless `otypes` is set

bstabler commented 3 years ago

#397

bstabler commented 3 years ago

This crash should not happen, so we'll need to investigate. In the meantime, you can set different chunk sizes for different submodels, as shown here:

image