bachmannpatrick / CLVTools

R-Package for estimating CLV

Interpreting coefficients #179

Closed page1 closed 2 years ago

page1 commented 2 years ago

Hi, would you be able to help confirm my interpretation?

Background:

The customers are essentially purchasing services for different lengths of time (sometimes multiple services with different lengths), so knowing that a customer has_90 means the customer is less likely to need to make an additional purchase for another ~90 days. From a business standpoint, getting customers on the 90-day cycle is better than the 30-day cycle (lower ops cost, fewer churn opportunities).

The benefit of the 90-day customer cycle over the 30-day cycle isn't jumping out at me from these coefficients, so I want to make sure I'm understanding the model.

             Estimate Std. Error   z-val Pr(>|z|)    
r             2.01180    0.13523  14.877  < 2e-16 ***
alpha        39.26659    3.61013  10.877  < 2e-16 ***
s             0.33141    0.04619   7.175 7.22e-13 ***
beta          0.16346    0.09183   1.780   0.0751 .  
life.has_30  -1.98285    0.26778  -7.405 1.31e-13 ***
life.has_90  -4.04804    0.27966 -14.475  < 2e-16 ***
trans.has_30  1.11916    0.06226  17.976  < 2e-16 ***
trans.has_90  0.56915    0.05765   9.873  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Optimization info:                  
LL     -25873.6864
AIC    51763.3727 
BIC    51808.1839 
KKT 1  TRUE       
KKT 2  TRUE       
fevals 1599.0000  
Method Nelder-Mead

Used Options:                     
Correlation     FALSE
Regularization  FALSE
Constraint covs FALSE
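
For reference, the model was set up roughly like this (a sketch, not the exact script; `trans` and `cust.flags` are illustrative names for my transaction log and per-customer dummy table):

```r
library(CLVTools)

# trans: transaction log with columns Id, Date, Price (illustrative)
clv.trans <- clvdata(trans, date.format = "ymd", time.unit = "weeks",
                     name.id = "Id", name.date = "Date", name.price = "Price")

# cust.flags: one row per customer with the has_30/has_90 dummies (illustrative)
clv.cov <- SetStaticCovariates(clv.trans,
                               data.cov.life   = cust.flags,
                               names.cov.life  = c("has_30", "has_90"),
                               data.cov.trans  = cust.flags,
                               names.cov.trans = c("has_30", "has_90"),
                               name.id = "Id")

fit <- pnbd(clv.cov, optimx.args = list(method = "Nelder-Mead"))
summary(fit)
```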

Based on the walkthrough, r/alpha can be interpreted as the mean purchase rate and s/beta as the mean attrition rate. My fitted coefficients would suggest that customers make infrequent purchases (this makes sense, since customers tend to make purchases on 30- and/or 90-day cycles), and does the fact that s > beta suggest that customers become stickier as time passes since the last sale? (This would make sense, since customers who are going to repurchase will tend to do so after they use up their last product/subscription.)
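
Doing that arithmetic with the estimates above (these are the baseline values, before any covariate scaling):

```r
# Estimates copied from the summary above
r     <- 2.01180
alpha <- 39.26659
s     <- 0.33141
beta  <- 0.16346

r / alpha  # ~0.051: mean purchase rate per time unit for the baseline customer
s / beta   # ~2.03:  mean attrition (dropout) rate for the baseline customer
```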

The static covariates (which are 1 when observed in the estimation window, else 0) seem to suggest that customers who present with has_30 or has_90 are less likely to make additional purchases in future weeks? Is it fair to say that has_30 customers purchase nearly 2x as often as those with the has_90 flag?

I've investigated using the time-varying covariate model to capture the seasonality that exists in this data, but its fit times are painfully slow, and the benefit seemed minimal over long periods of time since customers drift (some repurchase early in the cycle, others late).

Let me know if you see anything wrong with my interpretation, or if you have suggestions to explore. Thanks, Scott

pschil commented 2 years ago

The Pareto/NBD is designed for non-contractual, low-involvement transactions where customers can buy at any time (such as retail). Being on a plan sounds a lot like the contractual rather than the non-contractual setting, but if the plans are not subscriptions, do not automatically renew, and your customers are indeed buying these plans repeatedly, it probably still could work, as it would be similar to a non-contractual product. From here on I assume that the Pareto/NBD is indeed appropriate for your use case.

"From a business stand point getting customers on the 90 day cycle is better than the 30 day cycle (lower ops cost, fewer churn opportunities)."

The Pareto/NBD only predicts purchases or "store visits". All monetary aspects such as margin are relevant for predicting spending (or margin) per transaction with the Gamma/Gamma model (i.e. they go into the "Price" variable in the data). Correlations and other dependencies between purchase events and monetary spending are an intricate issue and currently not modelled: using the Pareto/NBD together with the Gamma/Gamma assumes that transacting and spending are independent. Nevertheless, customers who often buy the 90d plan will likely also have higher spending/margin per purchase (not because it is modelled, but simply because they often buy the 90d plan).
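
A minimal sketch of how the two models combine in CLVTools (reusing the illustrative `clv.trans`/`clv.cov` objects from the setup above; the exact predict() arguments may differ by version):

```r
# Transaction process: Pareto/NBD (purchases / "store visits")
fit.pnbd <- pnbd(clv.cov)

# Spending per transaction: Gamma/Gamma, fitted independently of transacting
fit.gg <- gg(clv.trans)

# predict() combines both into CET, PAlive, predicted spending and CLV
predict(fit.pnbd, prediction.end = 12, predict.spending = fit.gg)
```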

"S > Beta suggests that customers become more sticky as time passes since last sale?"

Not really. These are the parameters of a shape (s) and rate (beta) parametrized gamma distribution for the lifetime process. See here (link) for an illustration of the resulting distribution for different parameters (change scale to rate in the dropdown). s can be interpreted as the heterogeneity in the customer base with regard to the lifetime process: the lower s, the more heterogeneous the customer base. s has no direct interpretation relative to beta, but a high s needs a high beta to keep s/beta, the average dropout rate, in a sensible range (somewhere in the interval (0, 1] or so). The same statements can be made for r and alpha, but with regard to the transaction process.
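
You can also plot the implied distribution of individual dropout rates directly with base R to see the heterogeneity:

```r
# Individual dropout rates are distributed Gamma(shape = s, rate = beta)
s    <- 0.33141
beta <- 0.16346

curve(dgamma(x, shape = s, rate = beta), from = 0.01, to = 5,
      xlab = "individual dropout rate", ylab = "density")
# With s well below 1, the mass piles up near zero with a long right tail,
# i.e. a very heterogeneous customer base; the mean is s/beta (~2.03 here)
```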

"The static covariants (which are 1 when observed in the estimation window, else 0) seem to suggest that customers that present with has_30 or has_90 are less likely to make additional purchases in future weeks?"

The interpretation of the covariate parameters is not straightforward. Unlike standard regression models, where covariates usually enter additively (i.e. of the form y ~ x1 + x2 + x3), covariates here enter as alpha_i = alpha_0 * exp(-gamma' * z_i), where z_i is customer i's covariate data, gamma is the vector of covariate parameters, and alpha_0 is the reported alpha (39.26). Analogously for the lifetime process.

Hence, each customer's rate parameter (alpha and beta) is "stretched" by a factor in (0, Inf), while the shape parameters (r and s) are shared across customers. The covariate parameters can then be interpreted as a rate elasticity: a 1% change in the covariate data leads to a (param * covdata)% change in the transaction (or dropout) rate.
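
A numeric illustration of the elasticity, using a hypothetical continuous covariate (the dummies in your model are a different case, see below):

```r
r       <- 2.01180
alpha_0 <- 39.26659
gamma   <- 0.5   # hypothetical covariate parameter
z       <- 2.0   # hypothetical covariate value

# Mean transaction rate for a customer with covariate value z
rate <- function(z) r / (alpha_0 * exp(-gamma * z))

# A 1% increase in z changes the rate by approximately (gamma * z)% = 1%
(rate(z * 1.01) / rate(z) - 1) * 100   # ~1.005
```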

I am not sure I understand the data correctly. Are customers only able to have either the 30d or the 90d plan? If so, I would design it as a single dummy "has_90": all 90d customers receive a 1, and all other customers receive a 0. The 30d customers are then the baseline, with only 0s in the variable. For dummies there is no elasticity interpretation, but simply two (or more) groups, as in the example below:

An example with your parameters, but assuming there are three groups of customers. The dummy then has levels "has_90" (90d customers), "has_30" (30d customers) and "baseline" (a third group of all other customers):

alpha_has_90 = 39.26 * exp(-0.56 * 1) = 22.42
alpha_has_30 = 39.26 * exp(-1.11 * 1) = 12.93
alpha_baseline = 39.26

Average transaction rates of each group: r/alpha_has_90 = 0.089, r/alpha_has_30 = 0.155, r/alpha_baseline = 0.051. And of course analogously for the lifetime process with s and beta.
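
The same computation in R, using the full-precision estimates (hence slightly different numbers than the rounded ones above):

```r
r       <- 2.01180
alpha_0 <- 39.26659
gamma   <- c(has_90 = 0.56915, has_30 = 1.11916)  # trans covariate params

alphas <- c(alpha_0 * exp(-gamma), baseline = alpha_0)
r / alphas  # has_90 ~0.090, has_30 ~0.157, baseline ~0.051 per time unit
```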

Using dummy encoding for groups nearly amounts to fitting separate models, the difference being that the shape parameters r and s are shared across all groups.

"Is it fair to say that has_30 customers purchase nearly 2x as often as those with the has_90 flag?"

In terms of purchasing rate (the alphas above), yes, but not in terms of effective sales, because you would only be looking at purchasing speed while neglecting any difference in dropout speed. Fast-transacting customers are likely not better if they also leave quicker. The relevant measures that take both of these into account are CET and DERT. You could predict CET (the expected number of transactions in a given future period), map the has_90 and has_30 indicators to the customers, and then do some form of analysis (ratios, clustering, etc).
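
A sketch of that analysis, assuming the fitted model `fit` and the illustrative `cust.flags` table from above (data.table syntax):

```r
library(data.table)

# Expected number of transactions (CET) per customer over the next 12 periods
pred <- predict(fit, prediction.end = 12, predict.spending = FALSE)

# Attach the plan indicators and compare the groups
pred <- merge(pred, cust.flags, by = "Id")
pred[, .(mean.CET = mean(CET), mean.PAlive = mean(PAlive)),
     by = .(has_30, has_90)]
```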

page1 commented 2 years ago

Thank you for the detailed reply @pschil. I think the 30/90-day issue was probably a bit of a curveball for this model. I also had many additional explanatory variables available and had difficulty getting them to fit consistently with this model. Eventually that led to the use of XGBoost for modeling this problem.

Using XGBoost + SHAP scores, I was able to better explain to management which aspects of the customer journey were most important, and to provide predictions with higher near-term precision plus customer-specific insight into why a given customer was or wasn't valuable.

I'll definitely keep this package in my back pocket for scenarios where only limited information is available to understand the customer beyond RFM.