bachmannpatrick / CLVTools

R-Package for estimating CLV

DERT for BG/NBD model - Mathematical derivation and implementation #5

Open mmeierer opened 4 years ago

mmeierer commented 4 years ago

Do and document the mathematical derivation of DERT for the BG/NBD model and implement it.

mmeierer commented 4 years ago

For further information, see this thread: http://kpei.me/blog/?p=921

pschil commented 4 years ago

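Residual CLV is generally defined as

$$E(RLV) = E(\text{Spending}) \times \int_T^{\infty} E[t(t)] \, S(t \mid t > T) \, d(t - T) \, dt$$

where t=transaction rate, S=survivor function, d=discount rate (Fader, Hardie, Shang, 2010).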

Solving this integral analytically for BG/NBD might be a mess, but could we not, given the relevant expressions, still compute it numerically? The GSL library supports calculating semi-infinite integrals.
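To illustrate the numerical route, here is a minimal sketch in plain Python (not GSL, and not CLVTools code). It maps the semi-infinite range onto [0, 1) with the substitution s = u / (1 - u) and applies a midpoint rule, similar in spirit to what semi-infinite quadrature routines do. The transaction rate and survivor function are toy assumptions (constant rate `lam`, exponential survival with dropout rate `mu`), chosen only because the result can be checked against the closed form lam / (mu + delta). They are not the BG/NBD expressions, and all function names are hypothetical.

```python
import math


def integrate_semi_infinite(f, n=20000):
    """Approximate the integral of f(s) over s in [0, inf).

    Uses the substitution s = u / (1 - u), ds = du / (1 - u)**2 and a
    midpoint rule on u in (0, 1), so f is never evaluated at infinity.
    """
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h          # midpoint of the i-th sub-interval
        s = u / (1.0 - u)          # mapped point on [0, inf)
        total += f(s) / (1.0 - u) ** 2
    return total * h


def toy_dert(lam, mu, delta):
    """Discounted expected residual transactions for a toy customer.

    Assumptions (hypothetical, for illustration only): constant
    transaction rate lam while alive, survival exp(-mu * s), and
    continuous discounting exp(-delta * s), with s = t - T.
    """
    def integrand(s):
        return lam * math.exp(-mu * s) * math.exp(-delta * s)

    return integrate_semi_infinite(integrand)


lam, mu, delta = 0.5, 0.05, 0.1
numeric = toy_dert(lam, mu, delta)
closed_form = lam / (mu + delta)   # exact value for this toy case
print(numeric, closed_form)
```

For the real model, `integrand` would have to be replaced by the BG/NBD transaction rate and survivor function conditional on the customer's history; the quadrature part stays the same.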

hermandr commented 3 years ago

May I ask if continuous_discount_factor = 0.1 (10%), is that 10% per year?
When I use weekly or daily data, do I need to convert the discount factor, or do I still use it as a yearly discount factor?

I found the predicted sales can be greater than predicted CLV, which seems strange. Also the occurrence of it differs between pnbd and bgnbd models. Peter Fader's reply above alluded to BG/BB model, will that be implemented later?

Herman

pschil commented 3 years ago

> I found the predicted sales can be greater than predicted CLV, which seems strange.

I assume "predicted sales" is the output of the Gamma-Gamma spending model, that is, the predicted mean spending per single transaction and not the total spending that is still expected. We have clarified that in the prediction output in version 0.7.

CLV is the E(RLV) expression defined further up this thread, where E(Spending) is the predicted mean spending per transaction (from the Gamma-Gamma model) and the integral from T to infinity is what is called "DERT" (Discounted Expected Residual Transactions). DERT can be described as "all expected future transactions, discounted to the end of the fitting period". Therefore, CLV = "mean spending per transaction" * DERT.

It is absolutely possible that the CLV for some customers is smaller than the expected spending per transaction, namely whenever DERT < 1. Example: A customer who bought only once, 500 periods ago, is unlikely to transact with your company again, so all the transactions you still expect from that customer in the future, discounted (=DERT), are likely to be very small, if not 0. CLV here is not just the future spending per transaction; it also accounts for the probability that the transactions actually happen (the customer is still alive) and for the time value of money.
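A small numeric illustration of CLV = "mean spending per transaction" * DERT. The two customers and their values are made up for this example, and `clv` is a hypothetical helper, not a CLVTools function.

```python
def clv(mean_spending_per_transaction, dert):
    """CLV as described above: mean spending per transaction times DERT."""
    return mean_spending_per_transaction * dert


# Hypothetical customers with the same predicted mean spending of 50.
active_customer = clv(50.0, 4.0)    # several discounted future transactions expected
lapsed_customer = clv(50.0, 0.25)   # DERT < 1: probably no longer alive

print(active_customer, lapsed_customer)
```

For the second customer the CLV is below the predicted spending per transaction, simply because less than one discounted future transaction is expected.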

> May I ask if continuous_discount_factor = 0.1 (10%), is that 10% per year?

It is the discount rate under the assumption that compounding happens continuously rather than in discrete periods such as years or weeks. You might know this concept of continuous compounding from the finance literature (net present value, option and asset pricing theory, etc.). The discount rate enters the DERT expression inside the integral as d(t-T), and because the time steps in the integral are infinitesimally small, the continuous discount rate has to be used. The conversion is d_annual = exp(d_cont)-1 and, in reverse, d_cont = ln(1+d_annual). So, if you have an annual discount rate of 10%, the continuous discount rate is ln(1+0.1)=0.095.

EDIT: If your periods are not defined as years, you additionally need to account for this by dividing by the number of periods per year. If your periods are monthly, d_cont = ln(1+d_annual)/12, or if your periods are weekly, d_cont = ln(1+d_annual)/52.
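The conversion can be written as a small helper. This is only a sketch of the formulas in this comment; the function names and the `periods_per_year` parameter are made up for illustration and are not part of CLVTools.

```python
import math


def annual_to_continuous(d_annual, periods_per_year=1):
    """Continuously compounded rate per model period.

    d_cont = ln(1 + d_annual) / periods_per_year
    """
    return math.log1p(d_annual) / periods_per_year


def continuous_to_annual(d_cont, periods_per_year=1):
    """Inverse conversion: d_annual = exp(d_cont * periods_per_year) - 1."""
    return math.expm1(d_cont * periods_per_year)


# 10% annual discount rate, expressed per year, per month and per week.
yearly = annual_to_continuous(0.10)
monthly = annual_to_continuous(0.10, periods_per_year=12)
weekly = annual_to_continuous(0.10, periods_per_year=52)
print(yearly, monthly, weekly)
```

The value passed as the continuous discount factor would then be the one matching the period length of the data, for example `weekly` for weekly data.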

The choice of discount rate is a delicate matter because it has a large influence on CLV through DERT. Common choices are, for example, the cost of capital for the whole company (WACC), a required internal rate of return (IRR), or a discount rate that accounts for project-specific risks.

> Differs between pnbd and bgnbd models.

We have no derivation of the DERT expression for the BGNBD model and therefore cannot calculate CLV either. In the latest version 0.7, the prediction for bgnbd() no longer contains the columns CLV and DERT. Note, though, that the BGNBD model is not really recommended anymore. In the BGNBD, the opportunity to die comes with/right after every transaction, which means that more frequent buyers have more opportunities to die and therefore a smaller probability of being alive (PAlive) at the end of the fitting period. That more frequent buyers are less and less likely to be alive, and are therefore also expected to make fewer future transactions, is obviously quite wrong...
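To make the "opportunity to die after every transaction" mechanism concrete, here is a deliberately simplified sketch. It assumes a single customer with a fixed dropout probability `p` applied independently after each transaction. This is not the model's PAlive, which also conditions on recency and on heterogeneity across customers; the function name and the value of `p` are hypothetical.

```python
def prob_active_after(x, p):
    """Probability of not having dropped out after x transactions.

    Assumes a fixed dropout probability p that applies independently
    right after each transaction (geometric dropout).
    """
    return (1.0 - p) ** x


p = 0.1  # hypothetical per-transaction dropout probability
for x in (1, 5, 20):
    print(x, prob_active_after(x, p))
```

Under this simplification, the probability of still being active shrinks with every additional transaction, which is the effect described above.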

> Peter Fader's reply above alluded to BG/BB model, will that be implemented later?

The BGBB model is on our list. It is meant for the discrete-time setting or for very, very regular buyers, in which case the Poisson distribution in pnbd may collapse. Unfortunately, it additionally requires discrete "transaction opportunities" that need to be defined when creating the clvdata() object. We currently simply do not have the resources to implement these changes to the creation of the clvdata object.

hermandr commented 3 years ago

Thank you for the explanation.