better / convoys

Implementation of statistical models to analyze time lagged conversions
https://better.engineering/convoys/
MIT License
Fit a Beta distribution to the c parameter #5

Closed. erikbern closed this 6 years ago

erikbern commented 6 years ago

Instead of using bootstrapping to estimate the uncertainty of c, just fit a Beta distribution directly.

This is probably 10-100x faster, although it takes a few more lines of math (lots of gammaln).

Will do the same thing for Weibull and Gamma and then remove the bootstrapping (and the old non-Beta models). Will also remove a few other things like sharing parameters etc.
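
For context on where the gammaln calls come from: the normalizing constant of a Beta density is a ratio of gamma functions, so its log is a sum of log-gamma terms. The sketch below is illustrative and not taken from the convoys code. The function name `beta_logpdf` is hypothetical, and it uses the standard library's `math.lgamma`, the scalar counterpart of `scipy.special.gammaln`.

```python
import math


def beta_logpdf(c, a, b):
    """Log density of Beta(a, b) at a point c in (0, 1).

    log B(a, b) = lgamma(a) + lgamma(b) - lgamma(a + b), so the
    normalizer needs three log-gamma evaluations.
    """
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    # log1p(-c) is log(1 - c), computed accurately for small c.
    return log_norm + (a - 1) * math.log(c) + (b - 1) * math.log1p(-c)
```

Working in log space keeps the computation stable when a and b are large, which is the usual reason to reach for gammaln rather than the gamma function itself.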

coveralls commented 6 years ago

Coverage increased (+0.3%) to 71.865% when pulling 494ec60978112f6508124d39fd290e73b1697ff8 on exponential-beta into 7a57e36022735283482b615d61870b098a5bda92 on master.

erikbern commented 6 years ago

Lots of extra code so far, but will result in a net reduction once I'm done

erikbern commented 6 years ago

Something is wacky with the convergence of the Gamma-Beta model, not sure what's up

erikbern commented 6 years ago

Not very happy with how the Gamma-Beta model fitting works, will revisit in the future

erikbern commented 6 years ago

Ok, was able to switch from L-BFGS-B to Nelder-Mead for Gamma, by doing a hacky variable transform to get rid of any bounds. Works really well and was a lot less code!

erikbern commented 6 years ago

Something is sometimes wacky with the confidence intervals when you have a small number of observations, will investigate

erikbern commented 6 years ago

The bad news is I realized this approach is actually pretty dumb and won't work: you can't fit a prior distribution using MAP. I'm a Bayesian clown.

The good news is I think it's salvageable by factoring the Beta construction out of the optimization problem. I'm pretty sure the posterior as a function of c will match a Beta distribution. The other good news is that a bunch of the code changes lead to much faster and more robust optimization anyway.

erikbern commented 6 years ago

Got it working. This is about 100x faster and more robust than using bootstrapping.

The only downside is that the other parameters (k and lambda) are no longer fit to each bootstrap sample, so you don't know what their uncertainty is. I'm also not 100% sure the posterior with respect to c truly is a Beta distribution, but from plotting the marginal probability distribution, it definitely matches really well.
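
The match between the posterior in c and a Beta distribution can be checked numerically without plotting. The sketch below is not the convoys implementation. It assumes an exponential time-to-conversion model with the rate `lam` held fixed, a flat prior on c, and moment matching as the way to pick the Beta parameters. All function names are hypothetical.

```python
import math


def log_posterior_c(c, lam, n_converted, censored_times):
    """Unnormalized log posterior of c under a flat prior, lam held fixed.

    Assumed model: a converted observation contributes a factor c (times
    terms that do not depend on c, dropped here); an observation still
    unconverted at time t contributes 1 - c * (1 - exp(-lam * t)).
    """
    ll = n_converted * math.log(c)
    for t in censored_times:
        ll += math.log1p(-c * (1.0 - math.exp(-lam * t)))
    return ll


def match_beta(lam, n_converted, censored_times, n_grid=2000):
    """Moment-match a Beta(a, b) to the grid-evaluated posterior of c."""
    # Midpoint grid, so c never hits 0 or 1 where the logs blow up.
    cs = [(i + 0.5) / n_grid for i in range(n_grid)]
    log_p = [log_posterior_c(c, lam, n_converted, censored_times) for c in cs]
    top = max(log_p)
    w = [math.exp(lp - top) for lp in log_p]  # rescaled to avoid underflow
    z = sum(w)
    mean = sum(c * wi for c, wi in zip(cs, w)) / z
    var = sum((c - mean) ** 2 * wi for c, wi in zip(cs, w)) / z
    # Invert the Beta mean and variance formulas to recover a and b.
    common = mean * (1.0 - mean) / var - 1.0
    return mean * common, (1.0 - mean) * common
```

One sanity check: when every unconverted observation has been watched for a very long time, its factor reduces to 1 - c, so the posterior is exactly Beta(n_converted + 1, n_unconverted + 1). For example, `match_beta(1.0, 3, [1000.0] * 5)` returns values close to (4, 6). With shorter observation windows the posterior is no longer exactly Beta, and comparing the matched density against the grid values shows how close the approximation is.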