ColtAllen / btyd

Buy Till You Die and Customer Lifetime Value statistical models in Python.
https://btyd.readthedocs.io/
Apache License 2.0

Adding time-invariant covariates #5

Closed juanitorduz closed 2 years ago

juanitorduz commented 2 years ago

Hey! Continuing this project is a great idea. I think a nice addition would be to merge the open PR for time-invariant covariates, https://github.com/CamDavidsonPilon/lifetimes/pull/342 by @meremeev. It seems the core and tests are already there.

ColtAllen commented 2 years ago

Hey @juanitorduz,

I do hope you can join the Zoom kickoff call for this project next Sunday. Information to join is in a Discussion post in this repo as well as here:

https://github.com/CamDavidsonPilon/lifetimes/issues/414#issuecomment-1073247582

I've sent a collaboration invite to @meremeev. Before merging the PR, I would prefer to bring the syntax up to date and add type hints per PEP 484. Fortunately, this can be mostly automated with a single CLI command in MonkeyType and validated with mypy in a pre-commit.sh script.
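
As a rough illustration of what PEP 484 type hints look like, here is a minimal sketch. The function name `mean_interpurchase_time` and its behavior are hypothetical, made up for this example; it is not taken from the PR or from the library.

```python
from typing import Sequence


def mean_interpurchase_time(purchase_days: Sequence[float]) -> float:
    """Average gap, in days, between consecutive purchases of one customer.

    Hypothetical helper, used only to show annotated parameters and
    return types; it is not part of btyd or lifetimes.
    """
    if len(purchase_days) < 2:
        raise ValueError("need at least two purchases to measure a gap")
    ordered = sorted(purchase_days)
    # Pair each purchase day with the next one and take the difference.
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)


print(mean_interpurchase_time([0, 10, 30]))
```

With annotations like these in place, a static checker such as mypy can flag a call like `mean_interpurchase_time("2022-01-01")` before the code ever runs.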

I do agree that merging this PR ASAP for the next package release is a good idea, but thinking more long-term, I'm inclined to deprecate all MLE implementations for model fitting and make this library exclusive to MCMC or ADVI estimation. Why? MLE calculations are currently done with the autograd library, which has not been actively developed for several years. Most of its development team moved over to the JAX project, an automatic differentiation library with Just-In-Time (JIT) compilation that should be considerably more performant. However, JAX is still in beta, and until a 1.x.x version is released, it may experience glitches and other unstable behavior.

Also, MLE works by estimating the mode (i.e., the peak) of the likelihood function. Considering only a single point disregards the rest of the distribution, and if the customer data also violates the statistical assumptions of the model, the two factors combined make for convergence issues and poor performance. In fact, MLE is completely intractable for the data I'm working with in a company project; I can't proceed any further without a Bayesian approach.
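
To make the point-estimate versus full-distribution contrast concrete, here is a toy sketch. It assumes exponentially distributed gaps between purchases with a Gamma prior on the rate, chosen only because the posterior has a closed form; it is not one of the models in this library, and all function names and prior values are hypothetical.

```python
import math
from typing import Sequence, Tuple


def rate_mle(gaps: Sequence[float]) -> float:
    """MLE of an exponential rate: a single number, with no uncertainty attached."""
    return len(gaps) / sum(gaps)


def rate_posterior(
    gaps: Sequence[float], prior_shape: float = 1.0, prior_rate: float = 1.0
) -> Tuple[float, float]:
    """Gamma(shape, rate) posterior for the rate under a Gamma prior (conjugate update)."""
    return prior_shape + len(gaps), prior_rate + sum(gaps)


def gamma_mean_sd(shape: float, rate: float) -> Tuple[float, float]:
    """Mean and standard deviation of a Gamma(shape, rate) distribution."""
    return shape / rate, math.sqrt(shape) / rate


gaps = [2.0, 4.0, 6.0]  # made-up days between purchases for one customer
shape, rate = rate_posterior(gaps)
mean, sd = gamma_mean_sd(shape, rate)
print("MLE point estimate:", rate_mle(gaps))
print("Posterior mean and sd:", mean, sd)
```

With only three observations the posterior standard deviation is large relative to its mean, which is exactly the information a lone point estimate hides. For models without a conjugate form, that posterior would have to be approximated, for example with MCMC or ADVI.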

ColtAllen commented 2 years ago

@juanitorduz I've recreated and merged this PR from the base lifetimes repo and am now closing this issue. I look forward to any future notebooks you decide to publish that reference this new covariates model!

juanitorduz commented 2 years ago

I will definitely check it out! Awesome job!