CamDavidsonPilon / lifetimes

Lifetime value in Python
MIT License

Using lifetimes to repeatedly score entire customer base #345

Closed (alexHeu closed this issue 3 months ago)

alexHeu commented 4 years ago

Hi,

First of all, I want to say thanks for providing this great library. After reading the papers and a few examples, I am not yet sure how to use this kind of model in practice, as the papers and examples mostly focus on a single cohort of customers.

In my use case, I need to score the entire e-commerce customer base at regular intervals.

So my first question is: do I have to define cohorts, or can I just treat my complete customer base as one large cohort?

If cohorts need to be defined, a few other questions arise. I have data from the beginning of 2017 until today. I assume that a cohort is defined as all customers that joined within 6 months and I need at least 1.5 years of data to train a good model.

This leads to the following cohorts:

Cohort 1: joined in Q1/Q2 2017; training data: 2017.1 - today

Cohort 2: joined in Q3/Q4 2017; training data: 2017.6 - today

Cohort 3: joined in Q1/Q2 2018; training data: 2018.1 - today

Cohort 4: joined in Q3/Q4 2018; training data: 2018.6 - today

Cohort 5: joined in Q1/Q2 2019; not yet enough training data

Cohort 6: joined in Q3/Q4 2019; not yet enough training data
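For illustration, the half-year cohort scheme above could be sketched roughly as follows. This is only a minimal sketch using the Python standard library: the function names, the `"2017-H1"` label format, the choice to measure history from the start of the cohort window, and the example date are all assumptions made for this example, not part of lifetimes.

```python
from datetime import date


def half_year_cohort(first_purchase: date) -> str:
    """Label a customer by the half-year of their first purchase (hypothetical label format)."""
    half = "H1" if first_purchase.month <= 6 else "H2"
    return f"{first_purchase.year}-{half}"


def cohort_start(label: str) -> date:
    """First day of the half-year that a label such as '2018-H2' refers to."""
    year, half = label.split("-")
    return date(int(year), 1 if half == "H1" else 7, 1)


def has_enough_history(label: str, today: date, min_days: int = 548) -> bool:
    """True if roughly 1.5 years (548 days) have passed since the cohort window opened.

    Measuring from the start of the window is an assumption; measuring from
    the end of the window would be stricter.
    """
    return (today - cohort_start(label)).days >= min_days


# Hypothetical scoring date, chosen only to illustrate the check.
scoring_date = date(2020, 2, 1)
for label in ["2017-H1", "2017-H2", "2018-H1", "2018-H2", "2019-H1", "2019-H2"]:
    print(label, has_enough_history(label, scoring_date))
```

With a scoring date like the one assumed here, the first four cohorts pass the check and the last two do not, which matches the split described above.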

  1. What can be done about cohort 5 and 6? Is it ok to apply the cohort 4 model on them or do I need another modeling strategy to score new customers?
  2. Do I have to retrain each cohort with the most recent data if I want to update the CLV value each month?

How did you guys handle this in practice? It seems quite tedious to automate, because the number of cohorts keeps growing.

Thanks for any help that you can provide :)

ObiWanFoley commented 4 years ago

The best answer is that it depends on which earlier cohorts Cohort 5 and Cohort 6 most closely resemble. You have to take things like seasonality, offerings, and promotions into consideration if you want to use a previous model to glean insight into more recent customer cohorts. The beauty of these types of models is that you always have "like data": the first X days of the new cohort can be compared with the first X days of an older cohort, and what the older cohort did over the following n days (weeks, months, etc.) can then be used to forecast what you think will happen with your new cohort.
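One way to make this "first X days" comparison concrete is to compute the same early-behavior metric for an older and a newer cohort. The sketch below uses only the standard library; the metric (share of customers with a repeat purchase inside the window), the function name, and the sample transactions are assumptions chosen for illustration, and any early signal could be substituted.

```python
from collections import defaultdict
from datetime import date


def early_repeat_rate(transactions, window_days):
    """Share of customers with at least one repeat purchase within
    `window_days` of their first purchase.

    `transactions` is an iterable of (customer_id, purchase_date) pairs.
    Every customer is assumed to have been observed for at least
    `window_days`; otherwise the comparison is not like-for-like.
    """
    dates_by_customer = defaultdict(list)
    for customer_id, purchase_date in transactions:
        dates_by_customer[customer_id].append(purchase_date)
    if not dates_by_customer:
        return 0.0

    repeaters = 0
    for dates in dates_by_customer.values():
        first = min(dates)
        if any(0 < (d - first).days <= window_days for d in dates):
            repeaters += 1
    return repeaters / len(dates_by_customer)


# Made-up transactions for an older and a newer cohort.
old_cohort = [
    ("a", date(2018, 1, 5)), ("a", date(2018, 1, 20)),
    ("b", date(2018, 2, 1)), ("b", date(2018, 6, 1)),
]
new_cohort = [
    ("c", date(2019, 1, 3)), ("c", date(2019, 1, 10)),
    ("d", date(2019, 1, 15)), ("d", date(2019, 2, 1)),
]

print(early_repeat_rate(old_cohort, window_days=30))
print(early_repeat_rate(new_cohort, window_days=30))
```

If the early metrics of the two cohorts are close, borrowing the older cohort's model looks more defensible; if they diverge, that is a hint that seasonality, offerings, or promotions have changed the customer mix.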

psygo commented 4 years ago

Where did you see that we (need to) create different models for different cohorts in this library?

From what I understand, there is no need for them. Once the model has been fitted on enough data, its parameters carry over to new customers, on the assumption that they behave similarly, although I know that treating them as separate cohorts would be more precise. New customers are represented by frequency = 0, since they have made only one purchase and therefore have zero repeat purchases.
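To show what "frequency = 0" means for a new customer, here is a minimal standard-library sketch of the per-customer summary (frequency, recency, T) built over the whole customer base in one pass. It follows the usual conventions for these models (frequency counts repeat purchase days, recency is the time from first to last purchase, T is the time from first purchase to the end of the observation period), but it is a hand-written illustration, not the library's code. lifetimes provides `summary_data_from_transaction_data` in `lifetimes.utils` for this purpose. The function name `rfm_summary` and the sample data are assumptions.

```python
from collections import defaultdict
from datetime import date


def rfm_summary(transactions, observation_end):
    """Build a frequency / recency / T summary (in days) per customer.

    `transactions` is an iterable of (customer_id, purchase_date) pairs.
    Several purchases on the same day count as one purchase day.
    """
    purchase_days = defaultdict(set)
    for customer_id, purchase_date in transactions:
        if purchase_date <= observation_end:
            purchase_days[customer_id].add(purchase_date)

    summary = {}
    for customer_id, days in purchase_days.items():
        first, last = min(days), max(days)
        summary[customer_id] = {
            "frequency": len(days) - 1,            # repeat purchase days only
            "recency": (last - first).days,        # 0 if there is no repeat purchase
            "T": (observation_end - first).days,   # customer "age" at scoring time
        }
    return summary


# Made-up data: one long-standing customer and one who just joined.
transactions = [
    ("old", date(2017, 1, 10)), ("old", date(2017, 3, 10)), ("old", date(2019, 6, 1)),
    ("new", date(2019, 12, 20)),
]
summary = rfm_summary(transactions, observation_end=date(2019, 12, 31))
print(summary["old"])
print(summary["new"])
```

Rerunning such a summary with a later `observation_end` at each scoring date keeps old and new customers in the same table, and the new ones simply appear with frequency 0 and a small T.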

ObiWanFoley commented 4 years ago

If you were referencing my comment, perhaps I wasn't clear enough: we don't (need to) create different models for different cohorts in this library. The question was posed as a matter of best practice within industry. The true best practice is to create cohorts outside of the library and fit a model for each broad customer type. You will still want to look at your customer base as a whole, and you can get some very good information from that, but in practice you will get better predictions of customer behavior if customers are split into cohorts before fitting separate models.

This is evident in cases where you run an A/B test and then pick a winner. You will want to see how the new cohort (on, say, a new landing page) matches up against customers who came in through the previous landing page. A master model will show what the new page does to your overall customer LTV, but you will be in a much better position to make predictions about customers entering on the new page if you use a model trained specifically on customers who used that page.

Again, this is not a question of whether cohorts are a requirement or whether they are handled within the library itself; the answer to both of those questions is NO. If the question is whether it is common, and more likely a better business practice, to do these things, then the answer is a resounding YES, so long as you understand WHY you are doing it and when it should be done. That is not for quarters or individual customer periods, but generally ONLY for things that affect the customer base as a whole and could change what defines a 'good customer'.