Understand the relationship between DERT and CET

bachmannpatrick / CLVTools

R-Package for estimating CLV

54 stars 14 forks source link

Understand the relationship between DERT and CET #166

Closed chenx2018 closed 3 years ago

chenx2018 commented 3 years ago

Hi everyone,

I have some questions for understanding DERT and CET in the model result predict.clv.fitted.transactions. Using my own dataset, I find the CET is always much larger than DERT. More specifically, CET ≈ 5 * DERT in my case. The DERT is the following:

It is a little bit confusing DERT larger than CET since DERT is the integral from T to +Inf and intuitively should be larger than the integral from T to T + some finite time period.

May I know the explicit relationship between DERT and CET? Is it right to say,

As @pschil explained in the #150:

The predicted number of transactions in the prediction period is CET which can be compared against actual.x, the number of observed transactions during the prediction period.

I guess the reason that we can compare CET with actual.x directly is we assume the customer is alive in the prediction period and do not include the discount factor.

I think providing the explicit formula for both DERT and CET will be much more helpful.

Thanks a lot!

mmeierer commented 3 years ago

The discount factor is one of the driving factors for the difference between CETand DERT. For example, assuming a yearly discount rate of 5% (10%),the present value of 1 USD in year 5 is 0.78 USD (0.62 USD). Thus, besides the discount factor value, the length of the prediction period matters when looking at the difference between CET and DERT.

Thus, for some industries, where the interpuchase times of customers are longer and consequently, (a) the prediction period tends to be set to multiple years or (b) you are even estimating the "real" CLV (and not customer value in the next 1 / 3 / 5 years), the difference between CET and DERT might be quite large. What industry is your data from? What prediction period have you chosen? Do you still use the data of the jewelry retailer, mentioned in https://github.com/bachmannpatrick/CLVTools/issues/146, and an "infinite" prediction horizont?

The DERTand CETexpressions for every model implemented in CLVTools are derived in the original papers or additional papers/technical notes (published later on). For the exact mathematical derivation we refer you to these publications, for example,

for the Extended Pareto-NBD model you find them here: https://pubsonline.informs.org/doi/abs/10.1287/mksc.2020.1254
for the Standard Pareto-NBD model you find them here: https://journals.sagepub.com/doi/10.1509/jmkr.2005.42.4.415

For other models, the DERTexpression has not yet been derived. See here: https://github.com/bachmannpatrick/CLVTools/issues/5

The reason for comparing CETwith actual.x is indeed that we this metric does not include the discount factor. This is the only reason. DERT instead has more relevance as a managerial metric (following the net present value approach).

pschil commented 3 years ago

See here for a more-intuitive, customer-level derivation of CET: LINK, especially section 5 and 6

I find the CET is always much larger than DERT. More specifically, CET ≈ 5 * DERT in my case.

Im hypothesising that this is because you choose a too large discount rate. You might have picked the calculations up here as ln(1+d/100): https://github.com/bachmannpatrick/CLVTools/issues/5#issuecomment-685140822 but I was a little sloppy there. If your annual discount rate is 10 percent, the continuous discount rate is ln(1+10/100). This was however implicitly assuming that one is also using annual period definitions. If you define periods differently, you also have to account for that. Hence, if are using weekly periods, you further have to divide it by 52: ln(1.1)/52. I have update my comment there for clarification and also opened an issue to add this to the documentation.

I guess the reason that we can compare CET with actual.x directly is we assume the customer is alive in the prediction period and do not include the discount factor

"can compare CET with actual.x directly is we assume the customer is alive in the prediction period" CET does not assume that the customer is alive in the prediction period, rather it explicitly accounts for dropout. To summarize from the above link: The expected number of transactions E[Y(t)] is the outcome from the Poisson transaction and exponential dropout process. The larger t, the smaller the exponential, and the fewer transactions therefore. When predicting standing at T, we further have to account for the eventuality that the customer dropped out before T (=conditional on being alive until T), therefore CET = PaliveE[Y(t)]. Btw, this is also what P(alive) is used for. P(alive) is not* the probability that the customer will ever again make a transaction with you. At best it is a bad but unfortunately often used approximation thereof.

"do not include the discount factor" The discount factor is included to account for the time value of money, not for the uncertainty surrounding customer churn.

"can compare CET with actual.x" Another reason: CET and actual.x are both from the prediction period T+t. DERT spans to infinity and we do not have actuals until infinity ;)

May I know the explicit relationship between DERT and CET? Is it right to say DERT=CETS()d()

Not really, because CET already accounts for dropout S().

Note, that DERT for the residual transactions from T onwards given as Integral from T to Infinity (as above by you) is conditional for the customer to be alive until T. Therefore, the Integral from T to Infinity actually is DERT(alive at T). To use it as in the package one also has to account for earlier dropout, just like in CET: DERT = P(alive) * DERT(alive at T)

chenx2018 commented 3 years ago

Thanks for replying! @pschil 's hypothesis is AGAIN correct :) I misunderstood the the continuous discount rate and used the wrong one in the predict function. After fix it, the DERT and predicted CLV looks much more reasonable! Also the papers you guys provided are super helpful to understand these conditional expectations!

Back to @mmeierer 's reply:

What industry is your data from? What prediction period have you chosen? Do you still use the data of the jewelry retailer, mentioned in #146, and an "infinite" prediction horizont?

Yes, I still use the jewelry retail data. The mean inter-purchase time is approx. 6-10 months with respect to the different cohort groups.
I set prediction period to be half-year / 1-year / 2-year.
My first goal is to use half-year prediction to target the more valuable customer and do customer segment. Is it OK to use the quantity, say pseudo_CLV := CET*(Predicted.mean.spending), to determine the importance of the customers in half-year horizon?

Besides that, could you please recommend me some literature/textbook on how to use CET, DERT and CLV in the business setting? Thanks again!

mmeierer commented 3 years ago

Good to hear!

My first goal is to use half-year prediction to target the more valuable customer and do customer segment. Is it OK to use the quantity, say pseudo_CLV := CET*(Predicted.mean.spending), to determine the importance of the customers in half-year horizon?

From a pragmatic perspective, this sounds reasonable. Predictions from models other than probablistic models provide you with a similar estimate (as for those approaches managerial expressions like DERT are not available). Although I would recommend to call it "customer value" instead of "CLV". "Lifetime value" usually implies some kind of discounting. And to be even more picky in terms of terminology, with these kinds of analyses we usually focus not on customer value, but the residual customer value (i.e. the value that an existing customer who has already spent money with a business will contribute in the future).

Dan McCarthy provides some more info on discounted versus undiscounted measures of customer value here: CLV framework.pptx. He linked to this file in the following Twitter thread which you might find interesting: https://twitter.com/d_mccar/status/1291450437804199936

Bruce Hardie's webpage is a quite good starting point. You are likely aware of the "Notes" section. However, please also have a look at the "Talks & Tutorials" section. Providing additional materials is on our todo list, but this list is quite long.

If - in the near future - you should have spent some more time on this topic, it would be worthwhile to consider sharing your knowledge in a blog post. I am sure the marketing/data science community would appreciate this.

chenx2018 commented 3 years ago

Thank you so much for your help! I would like to share my experience on using your awesome package in my blog and will let you know :)

page1 commented 3 years ago

@pschil

The expected number of transactions E[Y(t)] is the outcome from the Poisson transaction and exponential dropout process. The larger t, the smaller the exponential, and the fewer transactions therefore. When predicting standing at T, we further have to account for the eventuality that the customer dropped out before T (=conditional on being alive until T), therefore CET = Palive*E[Y(t)].

I'm finding that my actual.total.spending lines up better with predicted.mean.spending * CET * PAlive than without the PAlive. When I plot a histogram of my residuals, if I don't adjust by multiplying by PAlive I find that there is a large number of errors shifted by a constant value, these largely go away when I multiply by PAlive

@pschil are you sure that multiplication by PAlive is not needed?

pschil commented 3 years ago

PAlive is already used in the calculation of CET, namely here: https://github.com/bachmannpatrick/CLVTools/blob/6582677e2c9294f06440fefee8ba3a0c6088d1c7/src/pnbd.cpp#L22-L38 which is called from here: https://github.com/bachmannpatrick/CLVTools/blob/6582677e2c9294f06440fefee8ba3a0c6088d1c7/src/pnbd.cpp#L61-L74

So, CET already includes *P(alive). To judge the merit of CET (and of the PNBD as such) it should be compared against actual.x. If you bring in spending, you are also judging the merit of the GammaGamma spending model at the same time.

Im not quite sure why customer value predictions (CET*predicted.mean.spending) in your case can be improved by scaling-down CET (all Palive <= 1). It might be that the predicted spending is weak (ie the GammaGamma model) or because the estimation period is non-representative of the prediction period, such as when you have fitted during high-season and are now predicting for low-season. The extended PNBD (pnbd with dynamic covariates) might be useful in this case.

bachmannpatrick commented 3 years ago

I'm finding that my actual.total.spending lines up better with predicted.mean.spending * CET * PAlive than without the PAlive. When I plot a histogram of my residuals, if I don't adjust by multiplying by PAlive I find that there is a large number of errors shifted by a constant value, these largely go away when I multiply by PAlive

This sounds like a systematic error introduced by a poor model fit. I would recommend the following three points:

Check your model fit by plotting (plot(estimated.obj)) and checking the model parameters (summary(estimated.obj)). Are the parameters in a feasible range and are the KKTs fulfilled?
Check for extreme values of CET and DERT in your prediction. Unfortunately, in some (extreme) cases, we observe numerical instabilities. See issue #171
Check the values for predicted.mean.spending. Are they within a feasible range? Spending ist predicted using a separate, independent model (Gamma/Gamma model). If your Prices contain outliers, the model fit for the spending part can be challenging.