better / convoys

Implementation of statistical models to analyze time lagged conversions
https://better.engineering/convoys/
MIT License
258 stars 42 forks source link

Rename the cdf method to predict to be more in line with scikit-learn #125

Closed erikbern closed 4 years ago

erikbern commented 4 years ago

The cdf name was an attempt to make it more similar to the probability distributions in scipy.stats but in retrospect that was confusing since the convoys models do not represent probability distributions – they are more like regression models that output something between 0 and 1.

This deprecates the old method names but keeps them for the time being.

EDIT: also splits the prediction method into predict vs predict_ci along the lines of https://martinfowler.com/bliki/FlagArgument.html

nicku33 commented 4 years ago

I agree, the original was a misname. This makes it more in line with statsmodels Model.predict() https://github.com/statsmodels/statsmodels/blob/master/statsmodels/regression/linear_model.py#L354

stphnma commented 4 years ago

I'm actually fine with the name cdf, since it made sense statistically, but I get why it could be confusing, so not against this change.

My more opinionated thought -- the predict/cdf function should output a list of samples for the case when MCMC is used, instead of defaulting to the CI. I think more weight should be on the end-user to make their own CI is they need, plus having the samples is use for posterior comparison between cohorts. I think @mbratten implemented a update here, but IMO it makes sense to incorporate it into the overall predict function

mbratten commented 4 years ago

Not opposed to the change but one thing I liked about the name cdf was the fact that it implied that you could have the model return the functions entire range if you just supplied a vector t rather than a singular value. This actually helped me understand what was going on a lot more since initially when I read through some of the documentation my thought was if there was a predict function it would return the value C as it was sort of the logical thing the model was attempting to capture.

rickshapiroBetter commented 4 years ago

My more opinionated thought -- the predict/cdf function should output a list of samples for the case when MCMC is used, instead of defaulting to the CI.

This functionality would be useful. I'm currently trying to get confidence intervals for difference in predictions and I have to redo the cdf function to get them. Would be nice to have this option!

erikbern commented 4 years ago

This functionality would be useful. I'm currently trying to get confidence intervals for difference in predictions and I have to redo the cdf function to get them. Would be nice to have this option!

I think this is what @mbratten did in https://github.com/better/convoys/pull/123

When I think about it, I'm not sure if that should be called predict_posterior though. Maybe I'll rename it posterior_samples or similar

coveralls commented 4 years ago

Coverage Status

Coverage decreased (-2.2%) to 95.433% when pulling bef7c5a9ce6b173f9867cfef31029ad61397c263 on predict into 46c8656d92f38d8446b0c5b497487c4c0fff9682 on master.

erikbern commented 4 years ago

I'm tempted to remove the ci flag on the predict method and make it a separate (third) method besides predict and predict_posterior. What do you think?

Maybe I'll do that in a separate PR

erikbern commented 4 years ago

From docs after this:

image