ActivitySim / populationsim

An Open Platform for Population Synthesis
https://activitysim.github.io/populationsim

Randomness of PopulationSim outputs related to API calls #182

Open xiex0055 opened 8 months ago

xiex0055 commented 8 months ago

Recently, while running the MWCOG Population Synthesizer, which is built on PopulationSim v0.4.3, MWCOG staff discovered that the software consistently produced different outputs at the disaggregate level (specifically, in the synthesized household and person files) despite identical inputs. This was unexpected, since the entropy-maximizing and integerization methods used in PopulationSim should produce deterministic results. MWCOG staff sought assistance from their consultant, RSG. Initially, RSG staff suspected that the issue might be linked to multiprocessing (e.g., see https://github.com/ActivitySim/populationsim/issues/150). However, they subsequently realized that the multiprocessing feature was introduced in the latest version (v0.5.1) of PopulationSim and is not available in the version (v0.4.3) used in the MWCOG Population Synthesizer. Aditya Gore (RSG) made the following observation:

_"The PopulationSim uses linear programming tools from the ortools package for integerization. Time limits can be specified for these tools and in PopulationSim the time limit is currently set to 60 seconds. I have noticed that based on available computer resources the tool sometimes (randomness here) hit this time limit without returning a solution in which case PopulationSim turns to a different method of integerization. It is possible that you are running into this issue and each run is producing slightly different results."_

MWCOG staff pose the following questions:

  1. Does the randomness of PopulationSim outputs significantly impact the PopulationSim results at the aggregate level including the validation results?
  2. Is there any evidence suggesting that this variability in PopulationSim outputs will NOT considerably affect regional travel demand modeling results?
  3. Can this source of randomness be eliminated in a future version of PopulationSim?
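
One way to start on question 1 is to compare two runs at both the record level and an aggregate level. The following is only a minimal sketch using in-memory toy records: the column names (`hh_id`, `zone`, `hh_size`) and the helper functions are hypothetical and are not PopulationSim's actual output schema or API. In practice the rows would be read from the synthesized household files of each run.

```python
from collections import Counter


def aggregate_counts(households, keys):
    """Count synthesized households per combination of the given columns."""
    return Counter(tuple(hh[k] for k in keys) for hh in households)


def compare_runs(run_a, run_b, keys):
    """Return {group: (count_a, count_b)} for every group whose counts differ."""
    counts_a = aggregate_counts(run_a, keys)
    counts_b = aggregate_counts(run_b, keys)
    groups = set(counts_a) | set(counts_b)
    # A Counter returns 0 for a group that is absent from one of the runs.
    return {g: (counts_a[g], counts_b[g]) for g in groups if counts_a[g] != counts_b[g]}


# Two toy runs (hypothetical data): different seed households are drawn,
# but the counts by zone and household size are identical.
run_1 = [
    {"hh_id": 101, "zone": 1, "hh_size": 2},
    {"hh_id": 102, "zone": 1, "hh_size": 2},
    {"hh_id": 205, "zone": 2, "hh_size": 4},
]
run_2 = [
    {"hh_id": 101, "zone": 1, "hh_size": 2},
    {"hh_id": 117, "zone": 1, "hh_size": 2},
    {"hh_id": 230, "zone": 2, "hh_size": 4},
]

record_level = compare_runs(run_1, run_2, ["hh_id"])
aggregate_level = compare_runs(run_1, run_2, ["zone", "hh_size"])

print("record-level differences:", len(record_level))
print("aggregate-level differences:", len(aggregate_level))
```

In this toy case the runs differ in which households were selected but agree on every zone-by-size count. Whether real PopulationSim runs behave the same way for the controlled and uncontrolled attributes is exactly what would need to be checked.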

bettinardi commented 2 months ago

Quickly adding that Oregon's testing and work with PopulationSim is running into this issue too.

jeffreyhood commented 2 months ago

Although I'm not yet fully apprised of the issue, my initial reaction to this information is to propose that we change the limit from a time to a number of iterations, or that we change solvers if a limit based on a number of iterations is not possible in ortools. It will be very important for statewide collaboration that the population synthesizer generate reproducible results.
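
To illustrate why an iteration-based limit would be reproducible where a wall-clock limit is not, here is a minimal toy sketch. It does not use ortools and does not reflect PopulationSim's code; the `refine` function and its parameters are hypothetical, and whether ortools or its underlying solvers expose a comparable iteration limit is not confirmed here.

```python
import time


def refine(step, x0, max_iterations=None, time_limit_s=None):
    """Apply `step` repeatedly until an iteration or wall-clock budget runs out.

    Returns (last value, number of iterations performed).
    """
    if max_iterations is None and time_limit_s is None:
        raise ValueError("specify max_iterations and/or time_limit_s")
    deadline = None if time_limit_s is None else time.perf_counter() + time_limit_s
    x, iterations = x0, 0
    while True:
        if max_iterations is not None and iterations >= max_iterations:
            break
        if deadline is not None and time.perf_counter() >= deadline:
            break
        x = step(x)
        iterations += 1
    return x, iterations


def step(x):
    """One small gradient-descent step toward the minimum of (x - 3)^2."""
    return x - 0.01 * (x - 3.0)


# Iteration budget: the same work is done on every machine and every run.
first = refine(step, 0.0, max_iterations=500)
second = refine(step, 0.0, max_iterations=500)
print("iteration-limited runs identical:", first == second)

# Wall-clock budget: the number of iterations completed depends on how fast
# and how busy the machine is, so the returned value can vary between runs.
timed = refine(step, 0.0, time_limit_s=0.001)
print("iterations completed within the time limit:", timed[1])
```

With the iteration budget, both runs stop at the same point and return the same value. With the time budget, the stopping point depends on available computing resources, which is the kind of variability described in the observation quoted above.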