ActivitySim / populationsim

An Open Platform for Population Synthesis
https://activitysim.github.io/populationsim

PopulationSim Dependency Update #186

Open dhensle opened 1 month ago

dhensle commented 1 month ago

The Problem

PopulationSim currently relies on ActivitySim’s pipeline framework for data I/O and multiprocessing. As a result, the two share many of the same core dependencies (e.g., pandas and numpy). However, as ActivitySim development has progressed, it has become increasingly incompatible with the current version of PopulationSim. As an interim solution, PopulationSim’s dependencies can be frozen or locked to a specific version of ActivitySim and its related dependencies. However, this limits PopulationSim to an older and increasingly out-of-date version of Python, and requires users to install multiple versions of Python if they wish to use both PopulationSim and ActivitySim.

Proposed Solution -- Remove the ActivitySim dependency from PopulationSim

ActivitySim has recently overhauled its underlying pipeline management system. Instead of trying to update the code to the newest version of ActivitySim's pipeline, it would likely be easier and more stable in the long term to decouple PopulationSim from ActivitySim's internal pipeline. It would be replaced with only the pipeline components PopulationSim requires (far fewer than ActivitySim needs). We could take the code we need from the ActivitySim repo if required. The expected functionality would center primarily on data I/O and multiprocessing.
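To make the scope of such a replacement concrete, here is a minimal sketch of a standalone pipeline that registers steps, runs them in order, and checkpoints its tables after each step so a run can be resumed. This is not ActivitySim's API: `MiniPipeline`, its methods, and the example step names are all hypothetical, tables are plain Python objects rather than DataFrames, and multiprocessing is left out entirely.

```python
import pickle
import tempfile
from pathlib import Path


class MiniPipeline:
    """Hypothetical stand-in for the small set of pipeline features discussed above."""

    def __init__(self, checkpoint_dir):
        self.checkpoint_dir = Path(checkpoint_dir)
        self.checkpoint_dir.mkdir(parents=True, exist_ok=True)
        self.steps = []   # ordered list of (name, function)
        self.tables = {}  # table name -> any picklable object

    def step(self, func):
        """Decorator that registers a function as the next pipeline step."""
        self.steps.append((func.__name__, func))
        return func

    def _checkpoint_path(self, step_name):
        return self.checkpoint_dir / f"{step_name}.pkl"

    def run(self, resume_after=None):
        """Run every step, or reload the checkpoint of `resume_after` and continue from there."""
        names = [name for name, _ in self.steps]
        start = 0
        if resume_after is not None:
            with open(self._checkpoint_path(resume_after), "rb") as f:
                self.tables = pickle.load(f)
            start = names.index(resume_after) + 1
        for name, func in self.steps[start:]:
            func(self.tables)
            # Persist all tables so a later run can restart after this step.
            with open(self._checkpoint_path(name), "wb") as f:
                pickle.dump(self.tables, f)
        return self.tables


# Example usage with made-up steps and data.
pipeline = MiniPipeline(tempfile.mkdtemp())


@pipeline.step
def load_seed(tables):
    tables["households"] = [{"hh_id": 1, "size": 2}, {"hh_id": 2, "size": 4}]


@pipeline.step
def count_persons(tables):
    tables["total_persons"] = sum(hh["size"] for hh in tables["households"])


result = pipeline.run()
resumed = pipeline.run(resume_after="load_seed")
```

A real replacement would presumably store tables in a format such as HDF5 or Parquet rather than pickle, but the surface area (register, run, checkpoint, resume) could stay about this small.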

Updating Other Dependencies

Once the ActivitySim dependency has been removed, it will be much easier to update PopulationSim's remaining important dependencies (e.g., Python, pandas, numpy), because we will no longer have to worry about whether each one conflicts with ActivitySim.

bettinardi commented 1 month ago

I fully support updating the dependencies. I just want to add that when Oregon originally scoped the development of PopSim, the intent was that PopSim would operate in the ActivitySim code environment: the functions that PopSim uses would work in the ActivitySim install environment, and if the ActivitySim partners accepted PopSim, the PopSim functions would be included within the ActivitySim package. We are currently in a state of two separate repos with different installs and different dependencies. My ideal solution would be to move the PopSim Python functions into ActivitySim and ensure that they work with ActivitySim's dependencies. The PopSim repo would then disappear, and a user who installs ActivitySim would have access to all ActivitySim and PopSim functionality.

bwentl commented 1 month ago

I'd like to second Alex's pain points and provide information on TransLink's workflow and our concerns with keeping PopulationSim as a separate package, especially considering that population synthesis is an important step prior to running ActivitySim.

Our current approach to preparing a synthetic population is to prepackage a set of land use data and a synthetic population for each base year scenario. This workflow is reasonable for base year scenarios, as we don't update base years very frequently. However, we foresee a potential issue when our activity-based model is used for planning studies, where modelers and planners in our region will frequently update land use projections based on their project requirements. We fear that the current workflow, with two separate packages (and potentially separate environment setups), is too complicated and can result in user errors. Modelers may forget to regenerate the synthetic population for their updated scenario because the process is complicated, or they may even assume the model can catch the changes in land_use.csv and update the synthetic population automatically.
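One lightweight way to catch the "forgot to regenerate" case, sketched here as an assumption rather than anything either package does today, is to record a hash of land_use.csv when the population is synthesized and compare it before each model run. The function names and the stamp file name are hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def record_land_use(land_use_path, stamp_path):
    """Call after synthesis: remember which land use file the population was built from."""
    stamp = {"land_use_sha256": file_digest(land_use_path)}
    Path(stamp_path).write_text(json.dumps(stamp))


def population_is_stale(land_use_path, stamp_path):
    """Return True if land use changed since synthesis, or if no stamp exists."""
    stamp = Path(stamp_path)
    if not stamp.exists():
        return True
    recorded = json.loads(stamp.read_text())["land_use_sha256"]
    return recorded != file_digest(land_use_path)


# Example usage in a scratch directory with made-up data.
workdir = Path(tempfile.mkdtemp())
land_use = workdir / "land_use.csv"
stamp = workdir / "synthetic_population.stamp.json"  # hypothetical file name

land_use.write_text("zone_id,households\n1,100\n")
before_synthesis = population_is_stale(land_use, stamp)
record_land_use(land_use, stamp)
after_synthesis = population_is_stale(land_use, stamp)
land_use.write_text("zone_id,households\n1,120\n")
after_edit = population_is_stale(land_use, stamp)
```

A byte-level hash flags any edit, including changes to columns that do not affect synthesis, so a real check might hash only the control-total columns.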

It would be great if we could bring in population synthesis as a step / model in the ActivitySim pipeline. We could let agency modelers adjust the settings for generating a new synthetic population, or for updating the synthetic population based on land use changes or checks. This way, the population totals expressed in land_use.csv would always be consistent with the synthetic population.
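The consistency check mentioned above could look roughly like the following sketch, which compares household totals per zone in the land use table against the number of synthesized households in each zone. The column names `zone_id` and `households` are assumptions for illustration; actual land_use.csv schemas differ between model implementations.

```python
import csv
import io
from collections import Counter


def household_mismatches(land_use_file, households_file,
                         zone_col="zone_id", total_col="households"):
    """Return {zone: (expected, synthesized)} for zones whose totals disagree.

    Column names are hypothetical defaults, not a documented schema.
    """
    expected = {
        row[zone_col]: int(row[total_col])
        for row in csv.DictReader(land_use_file)
    }
    # One row per synthesized household, so counting rows per zone gives the total.
    synthesized = Counter(row[zone_col] for row in csv.DictReader(households_file))
    zones = set(expected) | set(synthesized)
    return {
        zone: (expected.get(zone, 0), synthesized.get(zone, 0))
        for zone in sorted(zones)
        if expected.get(zone, 0) != synthesized.get(zone, 0)
    }


# Example usage with in-memory CSV data: zone 2 expects 1 household but has 2.
land_use_csv = io.StringIO("zone_id,households\n1,2\n2,1\n")
households_csv = io.StringIO("household_id,zone_id\n10,1\n11,1\n12,2\n13,2\n")
mismatches = household_mismatches(land_use_csv, households_csv)
```

If synthesis ran as a pipeline step, a check like this could run right after it and stop the model whenever `mismatches` is non-empty.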