UDST / choicemodels

Python library for discrete choice modeling
https://udst.github.io/choicemodels
BSD 3-Clause "New" or "Revised" License
74 stars 33 forks source link

[0.2.dev4] Choice simulation with capacity constraints #44

Closed smmaurer closed 5 years ago

smmaurer commented 5 years ago

This PR adds choice simulation with capacity constraints, resolving issue #26. (The prior PR, #43, covered choice simulation without capacity constraints, which this one builds on.)

Description

This PR adds a function called choicemodels.tools.iterative_lottery_choices() that performs Monte Carlo simulation of choices for a set of choice scenarios where (a) the alternatives have limited capacity and (b) the choosers have varying probability distributions over the alternatives.

Effectively, it simulates the choices sequentially, each time removing the chosen alternative or reducing its availably capacity. (It's actually done in batches for optimal performance, but the outcome is equivalent.) This requires sampling alternatives and calculating choice probabilities multiple times, so callables for those actions are required inputs.

Chooser priority is randomized. Capacities can be specified as counts (number of choosers that can be accommodated) or as amounts (e.g. square footage or employment capacity) with corresponding chooser sizes. If total capacity is insufficient to accommodate all the choosers, as many choices will be simulated as possible.

As with the other choice simulation tools, this is agnostic to the model and to the procedure for sampling alternatives. It will work with any multinomial choice model.

The functionality and algorithms are informed by urbansim's dcm.py, zone_model's lottery_choices, and MAG's constrained_choice utility. The choicemodels implementation is succinct, general, and handles all prior use cases except for MAG's flexible prioritization of choosers.

Usage

from choicemodels.tools import MergedChoiceTable, iterative_lottery_choices

obs, alts  # DataFrames of choosers and alternatives, with characteristics
fitted_model  # choicemodels.MultinomialLogit or other multinomial model

def sampling(obs, alts):
    return MergedChoiceTable(obs, alts, sample_size=10, <other params>)

def probs(mct)
    return fitted_model.probabilities(mct)

# Each alternative accommodates one chooser
choices = iterative_lottery_choices(obs, alts, sampling, probs)

# Alternatives accommodate N choosers
choices = iterative_lottery_choices(obs, alts, sampling, probs, alt_capacity='cap_col')

# Alternatives have size M
choices = iterative_lottery_choices(obs, alts, sampling, probs, alt_capacity='cap_col', 
                                    chooser_size='size_col')

Performance

Performance is quite good. The overhead from summing choices and comparing them to capacities is minimal. Simulation time scales with the number of choice collisions.

I ran tests with 1,000,000 choosers and 400,000 alternatives. Mean capacity of 3 choosers per alternative (20% oversupply), with utilities of the alternatives randomized. 10 alternatives per choice scenario, drawn with uniform weights. Execution times from an i7 iMac.

Simulating capacity-constrained choices took 5 iterations of this, with a beginning-to-end execution time of 8.0 seconds, or 60% longer than the unconstrained choices.

Some factors that increase the rate of choice collisions:

Limitations

  1. Simulation tools aren't optimized for "aggregate" choices.

    The ChoiceModels simulation utilities currently focus on "individual" choices, where choosers have varying probability distributions over the alternatives, because these require special algorithms for optimal performance.

    The same functions will work for "aggregate" simulation scenarios, where choosers share the same probability distribution over alternatives, but they aren't necessary: a simple np.random.choice() with size=K works fine and is more efficient.

    (It might be helpful to include functions for the "aggregate" choices just to provide consistent inputs/outputs, though. This can come in a future PR.)

  2. Performance with capacities + chooser sizes is not yet optimized.

    Currently, the algorithm only checks whether a chooser's size exceeds the capacity of an alternative after a choice has been made. This provides correct results, but can require a lot of iterations to match all the choosers. In some cases it would be more efficient to apply an availability filter before sampling alternatives for each chooser, but this isn't supported yet in the MergedChoiceTable utility. Coming soon!

    Chooser-level availability filters will also let users specify more complex simulation logic, like reserving large spaces for larger choosers.

Other changes

Versioning

coveralls commented 5 years ago

Coverage Status

Coverage increased (+8.3%) to 74.564% when pulling c9420c6244dfffeabdc7dab91207c08e743798af on choice-simulation into d110413876775b0708b15fba5dc02e0ffedb6430 on master.