ActivitySim / activitysim

An Open Platform for Activity-Based Travel Modeling
https://activitysim.github.io
BSD 3-Clause "New" or "Revised" License

Inefficient Chunking for Memory Management #733

Open joecastiglione opened 9 months ago

joecastiglione commented 9 months ago

ActivitySim allows model owners to implement arbitrarily large choice model problems: it limits neither the number of agents that can be simulated nor the number of choices any single model component presents to these choosers. ActivitySim’s architecture solves these problems through a memory-intensive framework built on top of Python’s pandas library. When a very large problem is presented to a computing environment with insufficient memory, the problem must be broken down into smaller parts, which ActivitySim refers to as “chunking”.
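To make the idea concrete, here is a minimal sketch of chunking in plain Python. It is not ActivitySim's implementation (which operates on pandas tables); `iter_chunks`, `simulate_in_chunks`, and the `choose` callback are hypothetical names used only for illustration.

```python
from typing import Callable, Iterator, List, Sequence


def iter_chunks(chooser_ids: Sequence[int], rows_per_chunk: int) -> Iterator[List[int]]:
    """Yield consecutive slices holding at most rows_per_chunk choosers."""
    if rows_per_chunk < 1:
        raise ValueError("rows_per_chunk must be at least 1")
    for start in range(0, len(chooser_ids), rows_per_chunk):
        yield list(chooser_ids[start:start + rows_per_chunk])


def simulate_in_chunks(
    chooser_ids: Sequence[int],
    rows_per_chunk: int,
    choose: Callable[[List[int]], List[int]],
) -> List[int]:
    """Run the (hypothetical) choice function one chunk at a time.

    Only one chunk's working data is alive at any moment, so peak memory
    scales with rows_per_chunk rather than with the full chooser table.
    """
    results: List[int] = []
    for chunk in iter_chunks(chooser_ids, rows_per_chunk):
        results.extend(choose(chunk))
    return results
```

Smaller values of `rows_per_chunk` lower peak memory but add per-chunk overhead, which is the trade-off the rest of this thread discusses.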

Chunking can be done efficiently by running the model and trying to make each chunk as large as will fit in available memory. This process, while theoretically ideal, has proved frustrating to users. Version 1.4 introduces the ability to chunk manually, with the user determining the number of chunks for each model step. This lets users skip the chunk training step and size the problems to their hardware themselves. Users can choose somewhat smaller chunk sizes that run more reliably across a variety of hardware configurations, at some cost in runtime efficiency, or carefully tune the chunk sizes to a particular platform to get fast runtimes, at some risk of failure due to memory errors. See THIS EXAMPLE for how to chunk inefficiently.
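One way a user might pick a manual chunk count is a back-of-the-envelope calculation like the sketch below. The function `chunks_needed`, its parameters, and the `headroom` safety factor are assumptions made for illustration; they are not ActivitySim settings, and the per-chooser memory figure has to come from the user's own measurements.

```python
import math


def chunks_needed(
    n_choosers: int,
    bytes_per_chooser: int,
    memory_budget_bytes: int,
    headroom: float = 0.8,
) -> int:
    """Smallest number of equal-sized chunks that fits the memory budget.

    headroom is the fraction of the budget the model step may use; a lower
    value gives smaller, safer chunks at the cost of more chunks to process.
    """
    if not 0 < headroom <= 1:
        raise ValueError("headroom must be in (0, 1]")
    usable_bytes = memory_budget_bytes * headroom
    rows_per_chunk = int(usable_bytes // bytes_per_chooser)
    if rows_per_chunk < 1:
        raise MemoryError("a single chooser does not fit in the usable budget")
    return max(1, math.ceil(n_choosers / rows_per_chunk))
```

With purely hypothetical figures (one million choosers, 50,000 bytes per chooser, a 16 GiB budget), `headroom=0.8` gives 4 chunks and a more conservative `headroom=0.5` gives 6, which illustrates the reliability-versus-runtime trade-off described above.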

guyrousseau commented 9 months ago

Chunking is an issue. Running ActivitySim without chunking is ideal: if you have enough memory to load everything without chunking, this might be the fastest way to run ActivitySim. For the Atlanta ActivitySim implementation at ARC, once you get through work and school location, you typically have a pretty good sense of whether this is going to work. Chunk training is another option: you run the model once in training mode, then run it in production mode with adjustments in settings.yaml.
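The train-then-produce workflow can be pictured roughly as follows. This is a conceptual sketch only: `train`, `production_rows_per_chunk`, and the `measure_peak_bytes` callback are hypothetical, and the actual setting names in settings.yaml are not given in this thread, so none are reproduced here.

```python
from typing import Callable, Dict, Iterable


def train(
    steps: Iterable[str],
    sample_rows: int,
    measure_peak_bytes: Callable[[str, int], float],
) -> Dict[str, float]:
    """Training pass: record an estimated bytes-per-row figure for each step.

    measure_peak_bytes(step, rows) is a hypothetical hook that runs the step
    on `rows` choosers and reports the peak memory it observed.
    """
    return {step: measure_peak_bytes(step, sample_rows) / sample_rows for step in steps}


def production_rows_per_chunk(
    bytes_per_row: Dict[str, float],
    memory_budget_bytes: int,
) -> Dict[str, int]:
    """Production pass: reuse the trained figures to size each step's chunks."""
    return {
        step: max(1, int(memory_budget_bytes // per_row))
        for step, per_row in bytes_per_row.items()
    }
```

The training result is specific to the hardware and inputs it was measured on, which is one reason a trained configuration may not transfer cleanly to a different machine.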