Tracing Memory Improvements with Sharrow

Is your feature request related to a problem? Please describe.

When running production-scale ActivitySim simulations with Sharrow turned on, tracing consumes a lot of memory, because Sharrow materializes very large intermediate arrays. For example, when a logit model computes utility values, it evaluates $V = X \beta$. The array $X$ has a row for every observation and a column for every data element (i.e., every line in the SPEC file). When not tracing, numba assembles, consumes, and releases the data in $X$ one row at a time, so the memory to store all of $X$ is never needed. For tracing, however, we need to write a (usually small) subset of the rows of $X$ to the trace file. Sharrow currently has no mechanism to save selected rows from the dynamically created values of $X$, so the only way to trace this data is to create all of the rows, which temporarily uses a massive amount of memory.
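To give a sense of scale, the sketch below compares the memory needed to hold all of $X$ with the memory needed for a single row, or for only the traced rows. The row and column counts are hypothetical illustrations, not measurements from any ActivitySim run, and 8-byte (float64) values are assumed.

```python
def bytes_needed(n_rows: int, n_cols: int, itemsize: int = 8) -> int:
    """Bytes needed for a dense array with n_rows x n_cols entries of itemsize bytes."""
    return n_rows * n_cols * itemsize

# Hypothetical sizes, for illustration only (not measured):
n_rows_x = 5_000_000   # rows of X (one per observation)
n_spec_lines = 300     # columns of X (one per SPEC line)
n_traced = 10          # rows actually written to the trace file

full_x = bytes_needed(n_rows_x, n_spec_lines)         # materialize all of X
one_row = bytes_needed(1, n_spec_lines)               # row-at-a-time working set
traced_only = bytes_needed(n_traced, n_spec_lines)    # keep only the traced rows

print(f"{full_x / 1e9:.1f} GB")  # → 12.0 GB
```

Under these assumptions, keeping only the traced rows costs about 24 kB instead of 12 GB.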

@dhensle pointed out that tracing outside of a full-scale production run might not reproduce production behavior when the effects of the full data are important (e.g., in shadow pricing).

Describe the solution you'd like

Sharrow needs additional capabilities to (a) receive instructions about what to trace, and (b) output an array of tracing values that can then be dumped into the tracing outputs.
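A minimal sketch of (a) and (b), written in plain Python/NumPy rather than numba-compiled Sharrow code: the caller passes the row indices to trace, utilities are still computed one row of $X$ at a time, and only the traced rows are copied into a small output array. The function name `utilities_with_trace`, its arguments, and the use of Python callables as stand-ins for SPEC lines are all hypothetical and do not reflect Sharrow's actual API.

```python
import numpy as np

def utilities_with_trace(data, terms, beta, trace_rows):
    """Compute V = X @ beta row by row, keeping only the traced rows of X.

    Hypothetical sketch, not Sharrow's API.
    data: dict of equal-length 1-D arrays (one entry per observation)
    terms: list of callables; terms[j](data, i) returns X[i, j] (stand-ins for SPEC lines)
    beta: 1-D coefficient array with len(terms) entries
    trace_rows: row indices whose X values should be returned
    """
    n_rows = len(next(iter(data.values())))
    n_terms = len(terms)
    utilities = np.empty(n_rows)

    # (a) tracing instructions: map each distinct traced row to a slot in the output
    traced = sorted({int(r) for r in trace_rows})
    trace_slot = {r: k for k, r in enumerate(traced)}
    trace_x = np.empty((len(traced), n_terms))  # (b) small trace output array

    row = np.empty(n_terms)  # reused buffer: only one row of X exists at a time
    for i in range(n_rows):
        for j, term in enumerate(terms):
            row[j] = term(data, i)
        utilities[i] = row @ beta
        k = trace_slot.get(i)
        if k is not None:
            trace_x[k] = row  # copy this row's values into the trace output
    return utilities, trace_x

# Usage with toy data: three hypothetical SPEC-like terms.
data = {"income": np.array([10.0, 20.0, 30.0, 40.0]),
        "dist": np.array([1.0, 2.0, 3.0, 4.0])}
terms = [lambda d, i: 1.0,
         lambda d, i: d["income"][i],
         lambda d, i: d["dist"][i]]
beta = np.array([0.5, 0.1, -0.2])
v, tx = utilities_with_trace(data, terms, beta, trace_rows=[2])
```

In this design the extra memory for tracing grows with the number of traced rows rather than with the number of observations.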

Describe alternatives you've considered

An alternative would be to implement tracing in an all-or-none mode and to selectively re-run only a subset of households through model components. This would probably be fine in most cases but, as noted above, may be undesirable if there are interactions that depend on simulating at scale.

ActivitySim / activitysim, issue #754