CamDavidsonPilon / lifetimes

Lifetime value in Python
MIT License

Definition of Recency #264

Closed psygo closed 5 years ago

psygo commented 5 years ago

Hello, I'm new to this CLV stuff and something has been bothering me quite a lot: the definition of recency. I've seen so many different ones that I'm becoming paranoid about the other definitions of the RFM model and its variations.

In this library, judging from your code, you use the definition Prof. Peter Fader gave in his lecture (the difference between the last and the first transaction). However, in his BG/NBD paper he defines recency as "when his last transaction occurred", which I assume means the difference between the observation period end and the last transaction.

This line of your code shows that you chose the former definition (max comes from each customer's transaction history and is not the observation period end):

customers["recency"] = (customers["max"] - customers["min"]) / np.timedelta64(1, freq) / freq_multiplier

Two other papers (and Wikipedia) I've found so far seem to confirm the latter definition of the variable, albeit not the 100% reliable sources you would want them to be:

  1. Peker, S., LRFMP model for customer segmentation in the grocery retail industry: a case study.
  2. Zoeram, A., A New Approach for Customer Clustering by Integrating the LRFM Model and Fuzzy Inference System.

If your definition also works, by accident or not, it would be worth documenting how it differs from the others somewhere, preferably on your readthedocs page.
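To make the two readings in the question concrete, here is a minimal sketch for a single customer. The purchase dates and the observation end are made-up values, and the variable names are illustrative, not taken from the library.

```python
from datetime import date

# Hypothetical purchase dates for one customer and a hypothetical observation end.
purchases = [date(2019, 1, 10), date(2019, 2, 1), date(2019, 3, 15)]
observation_end = date(2019, 4, 30)

first, last = min(purchases), max(purchases)

# Reading 1 (the quoted line): last purchase minus first purchase.
recency_since_first = (last - first).days

# Reading 2: time from the last purchase to the end of the observation period.
time_since_last = (observation_end - last).days

# Customer age: first purchase to the end of the observation period.
T = (observation_end - first).days

print(recency_since_first, time_since_last, T)
```

Note that the two readings always add up to the customer's age, so once T is known either one can be derived from the other.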

CamDavidsonPilon commented 5 years ago

Hi @psygo,

Fair question, and this is one of the more confusing parts of RFM. Let's use the symbols from Fader's BG/NBD paper.

The definition of t_x actually depends on the definition of T. If we define T to be the age of the customer (or, more accurately, the time between when we first see them and the observation period end), then t_x has to be relative to that.

[Image: Screen Shot 2019-04-17 at 3 57 33 PM]

If T is instead the entire observation period, then we are saying that all customers "started" at the beginning, which isn't realistic (because then there could be an enormous gap between "starting" and their first purchase).

So when Fader says "when his last transaction occurred", it's relative to the customer's observation time (not our own). This means t_x is exactly the customer's last purchase minus the customer's starting observation time (often their first purchase or signup).
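As a rough sketch of the definitions above, the following computes frequency, t_x, and T per customer from a transaction log, using each customer's first purchase as their time origin. The `summarize` function, its arguments, and the sample data are hypothetical and written for illustration only; frequency is taken as the number of distinct purchase days after the first, which is an assumption about how repeat transactions are counted.

```python
from collections import defaultdict
from datetime import date


def summarize(transactions, observation_end):
    """Return {customer_id: (frequency, t_x, T)} with durations in days.

    transactions: iterable of (customer_id, purchase_date) pairs.
    Each customer's clock starts at their own first purchase.
    """
    days_by_customer = defaultdict(set)
    for customer_id, day in transactions:
        # Ignore anything after the end of the observation period.
        if day <= observation_end:
            days_by_customer[customer_id].add(day)

    summary = {}
    for customer_id, days in days_by_customer.items():
        first, last = min(days), max(days)
        frequency = len(days) - 1             # repeat purchase days only
        t_x = (last - first).days             # last purchase, on the customer's clock
        T = (observation_end - first).days    # customer age at observation end
        summary[customer_id] = (frequency, t_x, T)
    return summary


# Made-up transaction log.
transactions = [
    ("a", date(2019, 1, 1)),
    ("a", date(2019, 1, 20)),
    ("a", date(2019, 3, 1)),
    ("b", date(2019, 3, 10)),
]
result = summarize(transactions, observation_end=date(2019, 3, 31))
print(result)
```

A customer with a single purchase, like "b" here, gets frequency 0 and t_x 0, while T still reflects how long they have been observed.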

psygo commented 5 years ago

Thank you very much for the quick answer and sorry for the delay.

T and recency now make a lot more sense, and I would still recommend adding this note to the docs or somewhere a user can easily see it before using the respective functions.

So Fader is actually using LRFM rather than RFM. L (length) is essentially T: an anchor that captures how old the customer is, serving as the reference for the origin in image 4.1 and indicating how recently his interactions began.

eromoe commented 2 years ago

@psygo @CamDavidsonPilon What if we only have truncated transaction data? In that case we can't get the first purchase or signup date for all customers, so we treat the first purchase within T_truncated as their first purchase. This still seems to work. Does RFM perform better than LRFM in this case?
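The sketch below does not answer which model performs better; it only shows, for one made-up customer, how the summary values shift when the log is cut off at a window start and the first purchase inside the window is treated as the first purchase. The `rft` helper, the dates, and the window boundaries are all hypothetical.

```python
from datetime import date


def rft(purchase_days, observation_end):
    """Return (frequency, recency, T) in days, measured from the first purchase given."""
    first, last = min(purchase_days), max(purchase_days)
    frequency = len(set(purchase_days)) - 1
    return frequency, (last - first).days, (observation_end - first).days


# Hypothetical full history and a hypothetical truncated window.
history = [date(2018, 1, 5), date(2018, 6, 1), date(2019, 2, 1), date(2019, 3, 1)]
window_start = date(2019, 1, 1)
observation_end = date(2019, 3, 31)

full = rft(history, observation_end)

# Only purchases inside the window are visible in the truncated log.
visible = [d for d in history if d >= window_start]
truncated = rft(visible, observation_end)

print(full, truncated)
```

In this example all three values shrink under truncation, because the customer's apparent starting point moves forward to their first visible purchase.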