debrouwere / posteriori

Making probabilistic calculations as straightforward as `2 + 2 = 4`

Bring Your Own Distribution #7

Open debrouwere opened 8 years ago

debrouwere commented 8 years ago

The idea of Posteriori is basically advanced napkin math, so you don't want to get bogged down in false precision. That said,

  1. maybe you have data
  2. maybe you have fit some sort of model: a Bayesian model in PyMC, a frequentist model in statsmodels, or a machine learning algorithm from scikit-learn

If you have data, then of course we'll be able to better estimate the parameters of the gamma distribution than by just using user-provided guesses for extreme quantiles. Alternatively (or additionally), we can construct our Monte Carlo sample by sampling from the original data with replacement.
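
A rough sketch of both options, to make the comparison concrete. Everything here is an assumption for illustration: the data is simulated, the function names are hypothetical and not part of Posteriori, and the gamma parameters are estimated with the method of moments simply because it needs nothing beyond NumPy (maximum likelihood would be the more usual choice).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical observed data: strictly positive measurements.
data = rng.gamma(shape=3.0, scale=2.0, size=500)

def fit_gamma_moments(x):
    """Estimate gamma (shape, scale) by matching the sample mean and variance."""
    mean, var = np.mean(x), np.var(x, ddof=1)
    return mean**2 / var, var / mean

def parametric_sample(x, n, rng):
    """Draw n Monte Carlo values from a gamma distribution fitted to x."""
    shape, scale = fit_gamma_moments(x)
    return rng.gamma(shape=shape, scale=scale, size=n)

def bootstrap_sample(x, n, rng):
    """Draw n Monte Carlo values by resampling x with replacement."""
    return rng.choice(x, size=n, replace=True)

shape, scale = fit_gamma_moments(data)
parametric = parametric_sample(data, 10_000, rng)
resampled = bootstrap_sample(data, 10_000, rng)
```

The parametric version smooths over gaps in the data but commits to the gamma shape; the resampled version makes no distributional assumption but can never produce a value that was not observed.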

If you have a model, you can skip some of the napkin math because the Y = XB specification is already there. In essence, instead of using Posteriori to describe an arbitrary model, you let the fitting algorithm figure out how important each variable is and how it contributes to the outcome.
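
As a sketch of what that glue could look like, assume the fitted model is an ordinary least squares regression. The data below is simulated and `predictive_draws` is a hypothetical helper, not an existing Posteriori function. The fit is done by hand with NumPy to keep the example self-contained; with statsmodels the same two ingredients would come from the fitted results object (`params` and `cov_params()`).

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: an intercept column plus two predictors.
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
true_beta = np.array([1.0, 2.0, -0.5])
y = X @ true_beta + rng.normal(scale=0.3, size=n)

# Fit Y = XB by ordinary least squares.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta_hat
sigma2 = residuals @ residuals / (n - X.shape[1])  # residual variance
beta_cov = sigma2 * np.linalg.inv(X.T @ X)         # coefficient covariance

def predictive_draws(x_new, n_draws, rng):
    """Monte Carlo draws of the outcome at x_new.

    Combines coefficient uncertainty (normal approximation around the
    estimates) with residual noise.
    """
    betas = rng.multivariate_normal(beta_hat, beta_cov, size=n_draws)
    noise = rng.normal(scale=np.sqrt(sigma2), size=n_draws)
    return betas @ x_new + noise

x_new = np.array([1.0, 0.5, 1.0])  # intercept, x1, x2
draws = predictive_draws(x_new, 10_000, rng)
low, high = np.percentile(draws, [5, 95])
```

The resulting `draws` array is just another Monte Carlo sample, so it could feed into further calculations the same way a guessed distribution would.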

If you don't have a model but you do have data that could be modeled, we could have Posteriori do it for you. LASSO regression (linear or logistic as needed) is perfect for this, because it's fast and we can throw all variables in the data at it yet only keep the most meaningful ones. (It is of course up to the user to make sure that the data is representative, includes the most significant confounders and doesn't include any variables that would introduce selection bias. Still, Posteriori is not supposed to be better than careful statistical modeling; it's supposed to be better than not using any sort of probabilistic reasoning at all.)
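
A minimal sketch of the automatic fit with scikit-learn, assuming a continuous outcome. The data is simulated, and the choices of standardising the inputs and picking the penalty by 5-fold cross-validation are assumptions, not decisions made in this issue. An L1-penalised logistic regression would play the same role for a binary outcome.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)

# Hypothetical data: ten candidate predictors, only the first two matter.
n, p = 300, 10
X = rng.normal(size=(n, p))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

# Standardise so the L1 penalty treats every variable on the same scale,
# then let cross-validation choose the penalty strength.
model = make_pipeline(StandardScaler(), LassoCV(cv=5, random_state=0))
model.fit(X, y)

coefs = model.named_steps["lassocv"].coef_
kept = np.flatnonzero(coefs)  # indices of variables with non-zero coefficients
```

Variables whose coefficient is shrunk to exactly zero drop out of the model, which is the "only keep the most meaningful ones" part. Cross-validated LASSO can still retain a few weak predictors, so `kept` is a shortlist rather than a verdict.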

This could even be extended to multinomial logistic regression: the result would be not a single random variable but one random variable per outcome category, each representing the prediction interval for that category.
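
One possible reading of this, sketched under several assumptions: the data is simulated, `class_probability_draws` is a hypothetical helper, and the per-category uncertainty is obtained by bootstrapping the fit, which is one option among several and not something this issue specifies.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)

# Hypothetical data: three classes clustered around different centres.
n_per_class = 100
centers = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
X = np.vstack([c + rng.normal(size=(n_per_class, 2)) for c in centers])
y = np.repeat(np.arange(3), n_per_class)

def class_probability_draws(X, y, x_new, n_boot, rng):
    """Refit on bootstrap resamples; return (n_boot, n_classes) probabilities.

    Assumes every resample contains all classes, which is effectively
    certain here with 100 rows per class.
    """
    draws = np.empty((n_boot, len(np.unique(y))))
    for b in range(n_boot):
        idx = rng.integers(0, len(y), size=len(y))
        fit = LogisticRegression(max_iter=1000).fit(X[idx], y[idx])
        draws[b] = fit.predict_proba(x_new.reshape(1, -1))[0]
    return draws

x_new = np.array([1.8, 0.2])
draws = class_probability_draws(X, y, x_new, n_boot=200, rng=rng)
intervals = np.percentile(draws, [5, 95], axis=0)  # one column per class
```

Each column of `draws` is then a Monte Carlo sample for one outcome category, and `intervals` holds the matching 5th and 95th percentiles.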

So really all this would entail is a little bit of glue code. The bigger question is what a nice interface would look like.

debrouwere commented 8 years ago

Further ideas and requirements for the regression modeling: