CamDavidsonPilon / lifetimes

Lifetime value in Python
MIT License

CLV lower than monetary_value #355

Closed (refaelos closed this issue 3 months ago)

refaelos commented 4 years ago

Hi,

I'm using customer_lifetime_value to calculate CLV and I get an RMSE of ~0.4 (which I think is good). The problem is that I found users whose CLV is lower than their monetary_value. I have 2 questions:

  1. Is CLV total revenue or average revenue? (My guess is total, as it fits real-life results better.)
  2. If CLV is total revenue, how is it possible that I get CLV < monetary_value?

*** I set monetary_value to the average revenue (total_revenue / days_in_data).

Thanks!

psygo commented 4 years ago

In this library, (prior) monetary_value is the customer's total previous revenue divided by the number of purchases they have made, i.e. the average value per purchase.

And CLV is the total amount the customer is going to spend after a set date. CLV is not the sum of the previous monetary value and the predicted future one. Nothing in reality requires a customer to spend more money in a store than they already have.
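
To make the distinction concrete, here is a minimal sketch with made-up numbers. It assumes CLV behaves like a discounted sum of (expected purchases in each future period × average purchase value). The function name `sketch_clv` and all inputs are hypothetical and are not the library's API.

```python
def sketch_clv(expected_purchases_per_period, avg_purchase_value, discount_rate=0.0):
    """Discounted sum of expected spend over future periods (illustrative only)."""
    clv = 0.0
    for period, n_purchases in enumerate(expected_purchases_per_period, start=1):
        expected_spend = n_purchases * avg_purchase_value
        # Each period's expected spend is discounted back to today.
        clv += expected_spend / (1.0 + discount_rate) ** period
    return clv


# Hypothetical customer: averages 100 per purchase, but is expected to buy
# only 0.15 times in each of the next two months.
monetary_value = 100.0
clv = sketch_clv([0.15, 0.15], monetary_value)
print(clv < monetary_value)  # True
```

A customer who rarely returns can therefore have a predicted CLV well below their average purchase value, even with no discounting.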

refaelos commented 4 years ago

Thanks @psygo

So I got everything wrong ;)

A couple of follow-up questions:

  1. If I use data for 2 months and predict 2 months with discount=0, should I expect total revenue to be the same?
  2. What are the recommended ways to test the results?
  3. What parameters can I change to improve results (if any)?

Thanks!

psygo commented 4 years ago

@refaelos, with respect to your questions:

  1. Nope. I'm not sure how you're thinking about the discount variable, but it simply compensates for something like inflation or interest, nothing more. Training your model on the first 2 months lets it learn monetary_value. Predicting (testing) on the rest of the data makes it calculate CLV for the same clients for the next, say, 2 months. monetary_value and CLV can yield totally different numbers; their sum lies within [monetary_value, +infinity), depending on what the model learns.
  2. There are plenty of plots that help with visual validation, and they are demonstrated in the library's main tutorial. As you mentioned, RMSE is also a good metric, but unfortunately the library does not implement it.
  3. The simplest parameters that come to mind are the model's penalizer_coef (don't go too high; keep it under 1, for example) and the date at which the split between training and testing occurs.
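
Since RMSE has to be computed by hand, a minimal sketch follows. It assumes you already have, for each customer, the actual spend in the holdout period and the CLV predicted for that same period. The lists below are invented numbers and `rmse` is a hypothetical helper, not a library function.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("need at least one observation")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical per-customer values for the holdout period.
actual_holdout_spend = [0.0, 120.0, 35.0, 0.0]
predicted_clv = [10.0, 100.0, 40.0, 5.0]

error = rmse(actual_holdout_spend, predicted_clv)
print(round(error, 2))
```

RMSE is in the same units as revenue, so it is easier to judge next to the typical holdout spend per customer than on its own.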

h-kouame commented 4 years ago

In this library, (prior) monetary_value is the customer's total previous revenue divided by the number of purchases they have made, i.e. the average value per purchase.

And CLV is the total amount the customer is going to spend after a set date. CLV is not the sum of the previous monetary value and the predicted future one. Nothing in reality requires a customer to spend more money in a store than they already have.

When calculating monetary_value using summary_data_from_transaction_data, it seems that the first transaction is left out (see https://github.com/CamDavidsonPilon/lifetimes/blob/master/lifetimes/utils.py#L296). If that's the case, why?

h-kouame commented 4 years ago

Never mind. I just realised that the model assumes that the value of the first transaction is 0.
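
A small sketch of what excluding the first transaction means for monetary_value, assuming (as the linked code appears to do by default) that only repeat purchases enter the average and that customers with no repeat purchase get 0. The helper `sketch_monetary_value` is hypothetical, and for simplicity it treats every transaction as its own period, which the library does not necessarily do.

```python
from collections import defaultdict


def sketch_monetary_value(transactions):
    """Average value of repeat purchases per customer (illustrative only).

    transactions: iterable of (customer_id, iso_date_string, amount).
    """
    by_customer = defaultdict(list)
    for customer_id, date, amount in transactions:
        by_customer[customer_id].append((date, amount))

    result = {}
    for customer_id, rows in by_customer.items():
        rows.sort()  # ISO date strings sort chronologically
        repeat_amounts = [amount for _, amount in rows[1:]]  # drop first purchase
        if repeat_amounts:
            result[customer_id] = sum(repeat_amounts) / len(repeat_amounts)
        else:
            result[customer_id] = 0.0  # one-time buyer: no repeat purchases
    return result


# Invented transactions: customer "a" buys three times, "b" only once.
transactions = [
    ("a", "2020-01-01", 50.0),
    ("a", "2020-01-15", 20.0),
    ("a", "2020-02-01", 40.0),
    ("b", "2020-01-10", 80.0),
]
values = sketch_monetary_value(transactions)
print(values)  # {'a': 30.0, 'b': 0.0}
```

Under this assumption, customer "a" gets (20 + 40) / 2 rather than the average over all three purchases, and the one-time buyer "b" gets 0.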