Relying on btyd for business decisions

SSMK-wq commented 1 year ago

Thanks for this package. It gives us a lot of interesting information about customer churn, expected purchase orders, revenue, and so on.

For our business, it is especially useful to know whether a customer will churn in the next few months (3, 6, or n months), and your newly incorporated method helps us do that.

However, I would like to know whether we can rely on this package to make real-time decisions in a business setting. In other words, can we trust its output for our business operations?

I ask mainly because it is still in beta and you are working on some modifications to the CLTV calculations.

But yes, we really appreciate your help with this new package, which improves on the old lifetimes library.

ColtAllen commented 1 year ago

Hey @SSMK-wq,

The short answer is yes, with the caveat of the CLV calculations, which I intend to fix before this library is out of beta.

Many companies have used the legacy lifetimes library over the years, and apart from https://github.com/ColtAllen/btyd/pull/65 and https://github.com/ColtAllen/btyd/pull/67, that functionality is unchanged in btyd. However, lifetimes is not perfect, and the problems I encountered while working on a business project earlier this year prompted me to fork it and create this library. I've posted issues for the known, remaining bugs in lifetimes, which will all be fixed before this library is out of beta.

Most of the work in btyd so far has been in the Models modules, which will eventually replace the legacy Fitters models. The new models take longer to train because they use Hamiltonian Monte Carlo inference instead of Maximum Likelihood Estimation, but they are more stable, more robust against overfitting, and provide more informative predictions. I still need to make some tweaks to the API and add plotting support, but even in their current state these models have been thoroughly tested: code coverage is around 98%, and they reproduce the results from the research papers.
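To illustrate what "more informative predictions" can mean, here is a minimal sketch that does not use btyd's API. It evaluates the BG/NBD probability-alive formula (Fader, Hardie and Lee, 2005) once per posterior draw of the parameters (r, alpha, a, b), so the prediction comes out as a distribution with an interval instead of the single point estimate an MLE fit gives. The function names, the customer, and the draws are hypothetical placeholders; in practice the draws would come from a fitted HMC model.

```python
import statistics


def prob_alive(x, t_x, T, r, alpha, a, b):
    """BG/NBD probability that a customer is still active at time T.

    x   -- number of repeat purchases
    t_x -- time of the last purchase (recency)
    T   -- customer age at the end of the observation window
    """
    if x == 0:
        # BG/NBD only lets a customer drop out right after a repeat purchase,
        # so with no repeat purchases the model treats them as alive.
        return 1.0
    # Odds of being inactive versus active, given the purchase history.
    odds_inactive = (a / (b + x - 1)) * ((alpha + T) / (alpha + t_x)) ** (r + x)
    return 1.0 / (1.0 + odds_inactive)


def prob_alive_summary(customer, draws):
    """Mean and 5%/95% quantiles of P(alive) across posterior draws."""
    probs = [prob_alive(*customer, *params) for params in draws]
    cuts = statistics.quantiles(probs, n=20, method="inclusive")
    return statistics.fmean(probs), cuts[0], cuts[-1]


# Hypothetical posterior draws of (r, alpha, a, b); made-up values for illustration.
draws = [
    (0.24, 4.4, 0.79, 2.4),
    (0.25, 4.6, 0.75, 2.5),
    (0.23, 4.2, 0.85, 2.3),
    (0.26, 4.8, 0.70, 2.6),
    (0.24, 4.5, 0.80, 2.2),
]
customer = (2, 20.0, 38.0)  # hypothetical (x, t_x, T), e.g. in weeks

mean, lo, hi = prob_alive_summary(customer, draws)
print(f"P(alive) ~ {mean:.2f} (90% interval {lo:.2f} to {hi:.2f})")
```

A churn flag could then be driven by the lower end of the interval rather than by the mean alone, which is one way the extra uncertainty information can feed a business rule.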

My views on the efficacy of Buy Till You Die modeling can be summed up by the statement, "All forecasts are wrong, but some forecasts are useful." I've heard of some companies getting better results with gradient boosted tree models like XGBoost or LightGBM, but this could be hype, as those claims tend to contain few details. The details I have seen suggest extensive feature engineering requirements, which make model deployment and maintenance more difficult.

I believe about 87% of data science projects never make it to production, and overly complex modeling pipelines are a big reason why. A major advantage of these BTYD models is that they have minimal data processing needs, and are easy to diagnose. For the vast majority of business applications, an 80% effective model today is more valuable than a 99% model six months from now.
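As a rough illustration of how little preprocessing these models need, the sketch below turns a raw transaction log of (customer_id, date) pairs into the frequency/recency/T summary that BG/NBD-style models take as input. It follows the conventions of the legacy lifetimes library (frequency = number of repeat purchase days, recency = days from first to last purchase, T = days from first purchase to the end of the observation window). It is a stdlib-only sketch with a hypothetical function name and made-up transactions, not btyd's own utility.

```python
from collections import defaultdict
from datetime import date


def rfm_summary(transactions, observation_end):
    """Return {customer_id: (frequency, recency, T)}, all measured in days."""
    purchase_days = defaultdict(set)
    for customer_id, day in transactions:
        if day <= observation_end:
            purchase_days[customer_id].add(day)  # same-day purchases count once

    summary = {}
    for customer_id, days in purchase_days.items():
        first, last = min(days), max(days)
        frequency = len(days) - 1            # repeat purchases only
        recency = (last - first).days        # customer age at last purchase
        T = (observation_end - first).days   # customer age at end of window
        summary[customer_id] = (frequency, recency, T)
    return summary


transactions = [
    ("alice", date(2022, 1, 3)),
    ("alice", date(2022, 1, 3)),
    ("alice", date(2022, 2, 10)),
    ("alice", date(2022, 4, 1)),
    ("bob", date(2022, 3, 15)),
]
print(rfm_summary(transactions, observation_end=date(2022, 6, 30)))
# {'alice': (2, 88, 178), 'bob': (0, 0, 107)}
```

Three numbers per customer is the whole feature set, which is the contrast being drawn with feature-engineering-heavy pipelines.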

SSMK-wq commented 1 year ago

Agreed, agreed, agreed, especially with your last two paragraphs about deployment and monitoring. I also believe btyd models are simple and easy to implement, which is good (based on what I have seen in my data). I know people keep pushing AI and trending algorithms, but a simple RFM-based approach works for me.

(Sent by email in reply to Colt Allen's comment of Tue, 22 Nov 2022, 20:58.)
