ColtAllen / btyd

Buy Till You Die and Customer Lifetime Value statistical models in Python.
https://btyd.readthedocs.io/
Apache License 2.0

Unstable probability of being alive distribution in Pareto/NBD model #86

Open CinelliGucci opened 1 year ago

CinelliGucci commented 1 year ago

Hi there, I am working on estimating the probability of being alive for a pool of customers using the Pareto/NBD model, and I found something a bit odd. The plots show the distribution of my customers' probabilities of being alive. The first is a concentrated distribution, while the second is smoother. Both were computed with the Pareto/NBD on the same customers and with the same L2 penalty coefficient. However, because of the optimization process (Nelder-Mead), the fitted parameters differ (see the figures below). Yet even with very similar parameters (see the third figure), the shape of the distribution changes. Is there an explanation for this?

[Three screenshots (2023-03-03): distributions of customers' probability of being alive from separate Pareto/NBD fits]
ColtAllen commented 1 year ago

Hey @CinelliGucci,

Per Wikipedia, Nelder-Mead can converge to non-stationary points, which may explain the variation in the fitted parameters. These models are also quite sensitive to changes in parameter values, as your plots show.
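To make the sensitivity point concrete, here is a minimal stdlib-only sketch of the standard closed-form Pareto/NBD probability of being alive for a customer with x repeat purchases, last purchase at t_x, and observation length T. The helpers hyp2f1 and p_alive are hypothetical illustrations (a naive power series for the Gaussian hypergeometric function, not btyd's implementation), and the parameter values are made up for demonstration.

```python
import math


def hyp2f1(a, b, c, z, tol=1e-12, max_terms=200_000):
    """Hypothetical helper: Gauss hypergeometric 2F1(a, b; c; z) by power series.

    Only valid for |z| < 1; convergence slows sharply as |z| approaches 1.
    """
    if abs(z) >= 1.0:
        raise ValueError("power series only converges for |z| < 1")
    term, total = 1.0, 1.0
    for n in range(max_terms):
        # Ratio of consecutive series terms: (a+n)(b+n) / ((c+n)(n+1)) * z
        term *= (a + n) * (b + n) / ((c + n) * (n + 1)) * z
        total += term
        if abs(term) <= tol * abs(total):
            return total
    raise RuntimeError("2F1 series did not converge within max_terms")


def p_alive(x, t_x, T, r, alpha, s, beta):
    """Hypothetical helper: Pareto/NBD P(alive) for customer history (x, t_x, T)."""
    rsx = r + s + x
    # Expand around the larger of alpha and beta so the series argument lies in [0, 1).
    if alpha >= beta:
        big, b = alpha, s + 1.0
    else:
        big, b = beta, r + x
    gap = abs(alpha - beta)
    a0 = (hyp2f1(rsx, b, rsx + 1.0, gap / (big + t_x)) / (big + t_x) ** rsx
          - hyp2f1(rsx, b, rsx + 1.0, gap / (big + T)) / (big + T) ** rsx)
    return 1.0 / (1.0 + s / rsx * (alpha + T) ** (r + x) * (beta + T) ** s * a0)


# Illustrative (made-up) parameter sets that differ only slightly in s.
BASE = {"r": 0.55, "alpha": 10.58, "s": 0.61, "beta": 11.67}
NUDGED = {**BASE, "s": 0.70}

for t_x in (5.0, 15.0, 25.0):
    p0 = p_alive(3, t_x, 39.0, **BASE)
    p1 = p_alive(3, t_x, 39.0, **NUDGED)
    print(f"t_x={t_x:>4}: P(alive) {p0:.3f} -> {p1:.3f}")
```

Comparing the two printed columns shows how much P(alive) for the same customer history can move when a single parameter shifts, which in turn reshapes the whole distribution across customers.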

It's important to note that development on this library has transitioned to https://github.com/pymc-labs/pymc-marketing. In fact, I'm working on a PR for the ParetoNBD model right now. It will be a probabilistic implementation, which should better illustrate the variation you're seeing.

CinelliGucci commented 1 year ago

Thanks @ColtAllen. Do you expect, usually, a smoother alive distribution? Using the Pareto/NBD. With the bayesian implementation of the Pareto/NBD (you are working on HMC or MAP?) the problem of instability will be solved? Again, thank you for your work, and for the patience and time you are dedicating to this project and to replying to this issue. Kindest, Alfredo

ColtAllen commented 1 year ago

Do you expect, usually, a smoother alive distribution? Using the Pareto/NBD.

This depends on the data and the choice of penalizer_coef. Leaving penalizer_coef at its default value of 0.0 will lead to smoother results. When working with large datasets exceeding 1 million customers, I've had trouble getting Nelder-Mead to converge, but ParetoNBDFitter().fit(*args, fit_method='L-BFGS-B') usually works fine.
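As a generic illustration of the two knobs mentioned here (a penalty coefficient and the optimizer choice), the sketch below fits a toy gamma-Poisson (NBD) count model with scipy.optimize.minimize, once with Nelder-Mead and once with L-BFGS-B, then refits with a nonzero penalty. This is not btyd's internal code: the toy counts, the helper names penalized_nll and fit, and the penalty form penalizer_coef * sum(params**2) are all assumptions for illustration.

```python
import math

import numpy as np
from scipy.optimize import minimize

# Hypothetical purchase counts per customer over one unit of time (toy data).
COUNTS = np.array([0, 0, 1, 0, 2, 3, 0, 1, 5, 0, 1, 2, 0, 0, 4, 1])


def penalized_nll(log_params, counts, penalizer_coef):
    """Mean NBD negative log-likelihood plus an assumed L2 penalty on (r, alpha)."""
    # Optimize on the log scale so r and alpha stay strictly positive.
    r, alpha = np.exp(log_params)
    t = 1.0  # every customer is observed for one time unit in this toy example
    log_lik = [
        math.lgamma(r + x) - math.lgamma(r) - math.lgamma(x + 1)
        + r * math.log(alpha / (alpha + t))
        + x * math.log(t / (alpha + t))
        for x in counts
    ]
    penalty = penalizer_coef * float(np.sum(np.exp(log_params) ** 2))
    return -float(np.mean(log_lik)) + penalty


def fit(method, penalizer_coef=0.0):
    """Minimize the penalized objective with the given scipy method name."""
    return minimize(penalized_nll, x0=np.zeros(2),
                    args=(COUNTS, penalizer_coef), method=method)


results = {m: fit(m) for m in ("Nelder-Mead", "L-BFGS-B")}
for method, res in results.items():
    r_hat, alpha_hat = np.exp(res.x)
    print(f"{method:>11}: r={r_hat:.3f} alpha={alpha_hat:.3f} objective={float(res.fun):.5f}")

# A nonzero penalty shrinks the fitted parameters toward zero.
penalized = fit("L-BFGS-B", penalizer_coef=0.1)
print("penalized r, alpha:", np.exp(penalized.x).round(3))
```

On a small, smooth problem like this the two methods should reach essentially the same objective value; the convergence differences described above tend to show up on harder likelihoods and much larger datasets.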

With the bayesian implementation of the Pareto/NBD (you are working on HMC or MAP?) the problem of instability will be solved?

The Gaussian hypergeometric function in the Pareto/NBD likelihood expression is what usually causes instability in this model, but we can work around this through our choice of priors and fitting method. pymc-marketing supports a variety of both, and I already know which recommendations to make for this particular model.
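For reference, the commonly cited closed form of P(alive) under the Pareto/NBD shows where the Gaussian hypergeometric function enters. Here x is the number of repeat purchases, t_x the time of the last purchase, T the length of the observation period, (r, α) the gamma parameters of the purchase rate, and (s, β) those of the dropout rate. This is the textbook expression, not necessarily how btyd or pymc-marketing evaluates it.

```math
\begin{aligned}
P(\text{alive} \mid x, t_x, T) &= \left[ 1 + \frac{s}{r+s+x}\,(\alpha+T)^{r+x}(\beta+T)^{s} A_0 \right]^{-1} \\
A_0 &= \frac{{}_2F_1\!\left(r+s+x,\, s+1;\, r+s+x+1;\, \frac{\alpha-\beta}{\alpha+t_x}\right)}{(\alpha+t_x)^{r+s+x}} - \frac{{}_2F_1\!\left(r+s+x,\, s+1;\, r+s+x+1;\, \frac{\alpha-\beta}{\alpha+T}\right)}{(\alpha+T)^{r+s+x}} \quad (\alpha \ge \beta) \\
A_0 &= \frac{{}_2F_1\!\left(r+s+x,\, r+x;\, r+s+x+1;\, \frac{\beta-\alpha}{\beta+t_x}\right)}{(\beta+t_x)^{r+s+x}} - \frac{{}_2F_1\!\left(r+s+x,\, r+x;\, r+s+x+1;\, \frac{\beta-\alpha}{\beta+T}\right)}{(\beta+T)^{r+s+x}} \quad (\alpha < \beta)
\end{aligned}
```

In the α ≥ β branch the third parameter minus the first two is (r+s+x+1) - (r+s+x) - (s+1) = -s < 0, so the function grows without bound as its argument approaches 1, which happens when β and t_x are small relative to α. Its power series also converges slowly there, which is one plausible source of the numerical trouble described above.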

CinelliGucci commented 1 year ago

Thanks @ColtAllen, I really appreciate the time you are dedicating to replying to me. Do you know if in BTYD or in pymc-marketing there will be the implementation of the Pareto/NBD model with time-invariant covariates? At the moment, in btyd I can only see an implementation for the BG/NBD.

Thanks in advance, Alfredo

ColtAllen commented 1 year ago

Do you know if in BTYD or in pymc-marketing there will be the implementation of the Pareto/NBD model with time-invariant covariates?

I will be working on this soon in pymc-marketing. All dev work in btyd has transitioned to that library.