UDST / choicemodels

Python library for discrete choice modeling
https://udst.github.io/choicemodels
BSD 3-Clause "New" or "Revised" License

[0.2.dev2] MNL probabilities #41

Closed smmaurer closed 6 years ago

smmaurer commented 6 years ago

This PR adds a method to the MultinomialLogitResults() class for generating predicted probabilities. The code is fully refactored and no longer relies on urbansim.urbanchoice.

This is one component of issue #26. It's also a requirement for urbansim_templates issue #31.

Usage

from choicemodels import MultinomialLogit
from choicemodels.tools import MergedChoiceTable

data = MergedChoiceTable(observations, alternatives, 'choice_col', sample_size)

model = MultinomialLogit(data, 'model_expression')
results = model.fit()

probs = results.probabilities(data)  # returns pd.Series with index matching input data

Discussion

I made this a method of the results object because each model class will probably need special logic to generate predicted probabilities. After we have probabilities, the choice simulation can be model agnostic.
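To illustrate the model-agnostic step, here is a minimal sketch of choice simulation that needs only a Series of probabilities. It is not part of this PR: the function name `simulate_choices` is hypothetical, and it assumes the Series has a two-level MultiIndex of (observation id, alternative id) with probabilities that sum to 1 within each observation.

```python
import numpy as np
import pandas as pd


def simulate_choices(probs, seed=None):
    """Draw one alternative per observation from predicted probabilities.

    Hypothetical helper, not part of choicemodels. Assumes `probs` is a
    pd.Series with a two-level MultiIndex (observation id, alternative id)
    whose values sum to 1 within each observation.
    """
    rng = np.random.default_rng(seed)
    choices = {}
    # Group rows by the first index level, i.e. one group per observation
    for obs_id, group in probs.groupby(level=0):
        alt_ids = group.index.get_level_values(1)
        # Sample a single alternative, weighted by its probability
        choices[obs_id] = rng.choice(alt_ids, p=group.to_numpy())
    return pd.Series(choices, name="chosen_alt")
```

Nothing here depends on how the probabilities were produced, so the same function would work for any model class that returns probabilities in this layout.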

The MultinomialLogitResults() object is easy to regenerate, for template purposes or for users who want to do prediction with a model they estimated previously:

from choicemodels import MultinomialLogitResults

results = MultinomialLogitResults(model_expression, fitted_parameters)
probs = results.probabilities(data)

Refactoring

I refactored the underlying code because urbansim.urbanchoice.mnl didn't have a clear code path for this kind of use case. The logic is mostly drawn from mnl_probs(), with some pre- and post-processing from mnl_simulate().
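For readers unfamiliar with the calculation, the core of multinomial logit probabilities is a softmax over the utilities within each choice scenario. The sketch below shows that standard formula only; it is not the code from mnl_probs() or from this PR. The function name `mnl_probabilities` and the (observations x alternatives) array layout are assumptions, and subtracting the row maximum is a common numerical-stability device that may or may not match what either codebase does.

```python
import numpy as np


def mnl_probabilities(utilities):
    """Standard MNL probabilities: exp(u) / sum(exp(u)) within each row.

    Hypothetical illustration. `utilities` is assumed to be array-like with
    shape (n_observations, n_alternatives), one row per choice scenario.
    """
    utilities = np.asarray(utilities, dtype=float)
    # Subtract each row's maximum so np.exp cannot overflow; this leaves
    # the resulting probabilities unchanged
    shifted = utilities - utilities.max(axis=1, keepdims=True)
    exp_u = np.exp(shifted)
    # Normalize so the probabilities in each row sum to 1
    return exp_u / exp_u.sum(axis=1, keepdims=True)
```

In practice the utilities would come from multiplying the merged choice table's model columns by the fitted coefficients, then reshaping to one row per observation.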

I removed the GPU acceleration option because it makes the code harder to work with and uses a library that isn't in active development any more. We may want to add this back in the future, or look into some other kind of acceleration.

Performance is good: 1.4 seconds to generate 10 million probabilities on a slow i5 MacBook, using either the old or new codebase.

This PR includes a unit test confirming that the predicted probabilities are identical in both codebases.

Other changes

Versioning

coveralls commented 6 years ago

Coverage increased (+6.7%) to 65.877% when pulling 5664cb51b812f23d1c3c0403dd8a2831c40ab4da on sampling-weights into c673198fc9972ee3cdd7fd1dce2454b06c0b2cc6 on master.

Eh2406 commented 6 years ago

https://cupy.chainer.org/ may be a more maintained GPU library.

smmaurer commented 6 years ago

@Eh2406 Cool, thank you!