UDST / choicemodels

Python library for discrete choice modeling
https://udst.github.io/choicemodels
BSD 3-Clause "New" or "Revised" License

[0.2.dev1] Better sampling support for MergedChoiceTable utility #37

Closed by smmaurer 6 years ago

smmaurer commented 6 years ago

This PR adds substantial functionality to the MergedChoiceTable utility.

It's related to Issues #4, #5, and #11, and to UDST/urbansim_templates MNL support.

Features and usage

MergedChoiceTable now supports:

All this should work automatically in MNL models. Note that with non-random sampling of alternatives and small sample sizes, estimated coefficients can be biased unless a correction term is added (see issue #38).
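To make the idea of weighted sampling of alternatives concrete, here is a minimal sketch that uses only NumPy and pandas. It is not the MergedChoiceTable implementation or API: the function name `sample_alternatives`, its parameters, and the `obs_id` / `alt_id` column names are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd


def sample_alternatives(obs_ids, alt_ids, sample_size, weights=None,
                        replace=False, seed=None):
    """Hypothetical helper: draw a set of alternatives for each observation.

    Returns a long-format table with one row per (observation, sampled
    alternative) pair. This is an illustrative sketch, not library code.
    """
    rng = np.random.default_rng(seed)
    alt_ids = np.asarray(alt_ids)

    # Normalize the weights into sampling probabilities, if any were given.
    p = None
    if weights is not None:
        w = np.asarray(weights, dtype=float)
        p = w / w.sum()

    rows = []
    for obs in obs_ids:
        # One draw per observation; replace=False rules out duplicates.
        drawn = rng.choice(alt_ids, size=sample_size, replace=replace, p=p)
        rows.append(pd.DataFrame({"obs_id": obs, "alt_id": drawn}))
    return pd.concat(rows, ignore_index=True)


# Alternative "a" is five times as likely to be drawn as each of the others.
table = sample_alternatives(
    obs_ids=[101, 102, 103],
    alt_ids=["a", "b", "c", "d", "e"],
    sample_size=3,
    weights=[5, 1, 1, 1, 1],
    seed=0,
)
print(table)
```

Because "a" is oversampled relative to a uniform draw, this is exactly the kind of non-random sampling for which the note above about biased coefficients applies.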

The intention of this PR is to provide general-purpose functionality that can serve as a back end for more specialized tools that automate distance-based sampling, bands, buckets, etc.

I've also done groundwork for the following features that will come later:

Implementation

This required deep enough surgery that the easiest approach was to start fresh rather than drawing on existing code in urbansim.urbanchoice (which did not support weights, availability, non-replacement, or non-sampling use cases).

I've done some basic optimization for things like choosing the most efficient underlying sampling library for each use case (mostly NumPy but sometimes core Python) and drawing single rather than repeated samples whenever possible.
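As an illustration of the "single rather than repeated samples" idea, the sketch below contrasts the two cases using NumPy only. It is an assumption about how such an optimization can look, not the code in this PR, and both function names are hypothetical. With replacement and uniform probabilities, every observation's draws are independent, so one large draw can be reshaped into per-observation rows. Without replacement, each observation needs its own draw so that no alternative repeats within a row.

```python
import numpy as np


def draw_with_replacement(n_obs, alt_ids, sample_size, seed=None):
    """Hypothetical helper: one NumPy call covers every observation."""
    rng = np.random.default_rng(seed)
    alt_ids = np.asarray(alt_ids)
    # Single draw of n_obs * sample_size items, then one row per observation.
    flat = rng.choice(alt_ids, size=n_obs * sample_size, replace=True)
    return flat.reshape(n_obs, sample_size)


def draw_without_replacement(n_obs, alt_ids, sample_size, seed=None):
    """Hypothetical helper: repeated draws, one per observation."""
    rng = np.random.default_rng(seed)
    alt_ids = np.asarray(alt_ids)
    # Each row is drawn separately so alternatives are unique within a row.
    return np.stack([
        rng.choice(alt_ids, size=sample_size, replace=False)
        for _ in range(n_obs)
    ])


with_repl = draw_with_replacement(4, np.arange(10), 3, seed=1)
without_repl = draw_without_replacement(4, np.arange(10), 3, seed=1)
print(with_repl.shape, without_repl.shape)
```

The first function makes one library call regardless of the number of observations, while the second makes one call per observation, which is why the repeated case is the more expensive path.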

Issue #39 discusses the current performance of the code and the optimizations we might want to look into.

Other changes

Versioning

coveralls commented 6 years ago

Coverage increased (+6.6%) to 59.194% when pulling 0b8a2b96a1aca675972eba1502e37dca4b19cbdb on sampling-weights into b3cb2b9496a5c3d11b9a875ec6d4c85246b2b5a8 on master.