ActivitySim / activitysim

An Open Platform for Activity-Based Travel Modeling
https://activitysim.github.io
BSD 3-Clause "New" or "Revised" License

Tracing Memory Improvements with Sharrow #754

Open jpn-- opened 1 year ago

jpn-- commented 1 year ago

Is your feature request related to a problem? Please describe.

When running production-scale ActivitySim simulations with Sharrow turned on, tracing consumes a lot of memory because Sharrow materializes very large intermediate arrays. For example, when computing utility values in a logit model, we compute $V = X \beta$. The array $X$ has a row for every observation and a column for every data element (i.e. every line in the SPEC file). When not tracing, the data in $X$ is assembled, consumed, and released dynamically by numba one row at a time, so the memory to store all of $X$ is never needed. For tracing, however, we need to write a (usually small) subset of the rows of $X$ to the trace file. Currently sharrow has no mechanism to save selected rows from the dynamically created values of $X$, so the only way to trace this data is to create all of the rows, which temporarily uses a massive amount of memory.
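To make the memory difference concrete, here is a minimal NumPy sketch of the two modes described above. It is not sharrow's or ActivitySim's actual code: `build_row`, `utilities_streaming`, and `utilities_materialized` are hypothetical names, and `build_row` is only a stand-in for evaluating every SPEC expression for one observation.

```python
import numpy as np


def build_row(i, n_cols):
    # Hypothetical stand-in for evaluating each SPEC line for observation i.
    return np.sin(np.arange(n_cols) + i)


def utilities_streaming(n_obs, beta):
    # Non-tracing mode: only one row of X exists at any moment.
    v = np.empty(n_obs)
    for i in range(n_obs):
        x_row = build_row(i, beta.size)  # assembled
        v[i] = x_row @ beta              # consumed
        # x_row is released when the next iteration rebinds the name
    return v


def utilities_materialized(n_obs, beta):
    # Tracing mode as it works today: all of X is held in memory at once
    # so that any row can later be written to the trace file.
    x = np.stack([build_row(i, beta.size) for i in range(n_obs)])
    return x @ beta, x


beta = np.array([0.5, -1.0, 2.0])
v_stream = utilities_streaming(1000, beta)
v_full, x_full = utilities_materialized(1000, beta)
print(x_full.nbytes)  # bytes needed only because X was materialized
```

Both functions return the same utilities; the only difference is that the second one has to allocate `n_obs * n_cols` values for $X$, which is the cost that grows with a production-scale run.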

@dhensle pointed out that tracing outside of a full-scale production run might not work when the effects of the full data are important (e.g. in shadow pricing).

Describe the solution you'd like

Sharrow needs additional capabilities to (a) receive instructions about what to trace, and (b) output an array of tracing values that can then be dumped into the tracing outputs.
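One possible shape for capabilities (a) and (b), sketched in plain NumPy rather than in sharrow itself. Everything here is an assumption for illustration: the function `utilities_with_trace`, its `trace_rows` argument, and the `build_row` stand-in are hypothetical and do not correspond to any existing sharrow API. The sketch also assumes every requested row index lies in `range(n_obs)`.

```python
import numpy as np


def build_row(i, n_cols):
    # Hypothetical stand-in for evaluating each SPEC line for observation i.
    return np.cos(np.arange(n_cols) * (i + 1))


def utilities_with_trace(n_obs, beta, trace_rows):
    # (a) instructions about what to trace: a set of row positions.
    trace_rows = np.asarray(sorted(set(trace_rows)), dtype=np.int64)
    slot = {int(r): k for k, r in enumerate(trace_rows)}

    v = np.empty(n_obs)
    # (b) output array sized by the number of traced rows, not by n_obs.
    x_trace = np.empty((trace_rows.size, beta.size))

    for i in range(n_obs):
        x_row = build_row(i, beta.size)
        v[i] = x_row @ beta
        k = slot.get(i)
        if k is not None:
            x_trace[k] = x_row  # keep a copy only for traced observations
    return v, trace_rows, x_trace


beta = np.array([0.5, -1.0, 2.0])
v, rows, x_trace = utilities_with_trace(100, beta, trace_rows=[42, 7])
print(rows, x_trace.shape)
```

The extra memory is proportional to the number of traced rows, so the full simulation (including effects such as shadow pricing) can still run at scale while tracing.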

Describe alternatives you've considered

An alternative would be to implement tracing in an all-or-none mode and selectively re-run only a subset of households through model components. This would probably be fine in most cases but, as noted above, may be undesirable if there are interactions that depend on simulating at scale.

i-am-sijia commented 10 months ago

Adding more context ...

In the Phase 8 data type optimization work, we closely traced the memory usage of the example ARC model and reported at the 9/26/2023 project meeting that turning household tracing on with Sharrow created additional memory spikes (for the reason Jeff described above) and also added run time. Below are memory profiling charts showing the difference in memory requirements from turning household tracing on vs. off.

[Image: memory profiling charts, household tracing on vs. off]