This PR adds choice simulation with capacity constraints, resolving issue #26. (The prior PR, #43, covered choice simulation without capacity constraints, which this one builds on.)
Description
This PR adds a function called choicemodels.tools.iterative_lottery_choices() that performs Monte Carlo simulation of choices for a set of choice scenarios where (a) the alternatives have limited capacity and (b) the choosers have varying probability distributions over the alternatives.
Effectively, it simulates the choices sequentially, each time removing the chosen alternative or reducing its availably capacity. (It's actually done in batches for optimal performance, but the outcome is equivalent.) This requires sampling alternatives and calculating choice probabilities multiple times, so callables for those actions are required inputs.
Chooser priority is randomized. Capacities can be specified as counts (number of choosers that can be accommodated) or as amounts (e.g. square footage or employment capacity) with corresponding chooser sizes. If total capacity is insufficient to accommodate all the choosers, as many choices will be simulated as possible.
As with the other choice simulation tools, this is agnostic to the model and to the procedure for sampling alternatives. It will work with any multinomial choice model.
The functionality and algorithms are informed by urbansim's dcm.py, zone_model's lottery_choices, and MAG's constrained_choice utility. The choicemodels implementation is succinct, general, and handles all prior use cases except for MAG's flexible prioritization of choosers.
Usage
from choicemodels.tools import MergedChoiceTable, iterative_lottery_choices
obs, alts # DataFrames of choosers and alternatives, with characteristics
fitted_model # choicemodels.MultinomialLogit or other multinomial model
def sampling(obs, alts):
return MergedChoiceTable(obs, alts, sample_size=10, <other params>)
def probs(mct)
return fitted_model.probabilities(mct)
# Each alternative accommodates one chooser
choices = iterative_lottery_choices(obs, alts, sampling, probs)
# Alternatives accommodate N choosers
choices = iterative_lottery_choices(obs, alts, sampling, probs, alt_capacity='cap_col')
# Alternatives have size M
choices = iterative_lottery_choices(obs, alts, sampling, probs, alt_capacity='cap_col',
chooser_size='size_col')
Performance
Performance is quite good. The overhead from summing choices and comparing them to capacities is minimal. Simulation time scales with the number of choice collisions.
I ran tests with 1,000,000 choosers and 400,000 alternatives. Mean capacity of 3 choosers per alternative (20% oversupply), with utilities of the alternatives randomized. 10 alternatives per choice scenario, drawn with uniform weights. Execution times from an i7 iMac.
4.0 seconds to build the first choice table (drawing alternatives for all the choosers)
0.6 seconds to calculate choice probabilities (scales with the complexity of the model expression)
0.4 seconds to simulate unconstrained choices
Simulating capacity-constrained choices took 5 iterations of this, with a beginning-to-end execution time of 8.0 seconds, or 60% longer than the unconstrained choices.
Some factors that increase the rate of choice collisions:
low ratios of capacity to choosers
correlated utility of alternatives
mismatches between chooser size and alternative capacities (see below)
Limitations
Simulation tools aren't optimized for "aggregate" choices.
The ChoiceModels simulation utilities currently focus on "individual" choices, where choosers have varying probability distributions over the alternatives, because these require special algorithms for optimal performance.
The same functions will work for "aggregate" simulation scenarios, where choosers share the same probability distribution over alternatives, but they aren't necessary: a simple np.random.choice() with size=K works fine and is more efficient.
(It might be helpful to include functions for the "aggregate" choices just to provide consistent inputs/outputs, though. This can come in a future PR.)
Performance with capacities + chooser sizes is not yet optimized.
Currently, the algorithm only checks whether a chooser's size exceeds the capacity of an alternative after a choice has been made. This provides correct results, but can require a lot of iterations to match all the choosers. In some cases it would be more efficient to apply an availability filter before sampling alternatives for each chooser, but this isn't supported yet in the MergedChoiceTable utility. Coming soon!
Chooser-level availability filters will also let users specify more complex simulation logic, like reserving large spaces for larger choosers.
Other changes
extensive choice simulation tests
in MergedChoiceTable, empty choosers or alternatives now produces an empty choice table (rather than an exception), to align with the anticipated behavior when availability filters yield no valid alternatives
Coverage increased (+8.3%) to 74.564% when pulling c9420c6244dfffeabdc7dab91207c08e743798af on choice-simulation into d110413876775b0708b15fba5dc02e0ffedb6430 on master.
This PR adds choice simulation with capacity constraints, resolving issue #26. (The prior PR, #43, covered choice simulation without capacity constraints, which this one builds on.)
Description
This PR adds a function called
choicemodels.tools.iterative_lottery_choices()
that performs Monte Carlo simulation of choices for a set of choice scenarios where (a) the alternatives have limited capacity and (b) the choosers have varying probability distributions over the alternatives.Effectively, it simulates the choices sequentially, each time removing the chosen alternative or reducing its availably capacity. (It's actually done in batches for optimal performance, but the outcome is equivalent.) This requires sampling alternatives and calculating choice probabilities multiple times, so callables for those actions are required inputs.
Chooser priority is randomized. Capacities can be specified as counts (number of choosers that can be accommodated) or as amounts (e.g. square footage or employment capacity) with corresponding chooser sizes. If total capacity is insufficient to accommodate all the choosers, as many choices will be simulated as possible.
As with the other choice simulation tools, this is agnostic to the model and to the procedure for sampling alternatives. It will work with any multinomial choice model.
The functionality and algorithms are informed by urbansim's dcm.py, zone_model's lottery_choices, and MAG's constrained_choice utility. The choicemodels implementation is succinct, general, and handles all prior use cases except for MAG's flexible prioritization of choosers.
Usage
Performance
Performance is quite good. The overhead from summing choices and comparing them to capacities is minimal. Simulation time scales with the number of choice collisions.
I ran tests with 1,000,000 choosers and 400,000 alternatives. Mean capacity of 3 choosers per alternative (20% oversupply), with utilities of the alternatives randomized. 10 alternatives per choice scenario, drawn with uniform weights. Execution times from an i7 iMac.
Simulating capacity-constrained choices took 5 iterations of this, with a beginning-to-end execution time of 8.0 seconds, or 60% longer than the unconstrained choices.
Some factors that increase the rate of choice collisions:
Limitations
Simulation tools aren't optimized for "aggregate" choices.
The ChoiceModels simulation utilities currently focus on "individual" choices, where choosers have varying probability distributions over the alternatives, because these require special algorithms for optimal performance.
The same functions will work for "aggregate" simulation scenarios, where choosers share the same probability distribution over alternatives, but they aren't necessary: a simple
np.random.choice()
withsize=K
works fine and is more efficient.(It might be helpful to include functions for the "aggregate" choices just to provide consistent inputs/outputs, though. This can come in a future PR.)
Performance with capacities + chooser sizes is not yet optimized.
Currently, the algorithm only checks whether a chooser's size exceeds the capacity of an alternative after a choice has been made. This provides correct results, but can require a lot of iterations to match all the choosers. In some cases it would be more efficient to apply an availability filter before sampling alternatives for each chooser, but this isn't supported yet in the MergedChoiceTable utility. Coming soon!
Chooser-level availability filters will also let users specify more complex simulation logic, like reserving large spaces for larger choosers.
Other changes
MergedChoiceTable
, empty choosers or alternatives now produces an empty choice table (rather than an exception), to align with the anticipated behavior when availability filters yield no valid alternativesVersioning