CamDavidsonPilon / lifetimes

Lifetime value in Python
MIT License

Cross Validation Setup #440

Closed. Khalizo closed this issue 3 months ago.

Khalizo commented 1 year ago

Hi All,

I would like to evaluate the model using k-fold cross-validation. The main metric I will use is RMSE, comparing each customer's actual frequency against the predicted frequency.

How can I do this, please?

Thanks,
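For reference, the more common way to get an RMSE for these models is a calibration/holdout split rather than k-fold cross-validation. Below is a minimal sketch: the RMSE metric itself is plain numpy, and the comments show how the split and predictions would look with lifetimes' `calibration_and_holdout_data` and `BetaGeoFitter` (column names like `customer_id` and `date`, and the cutoff dates, are hypothetical placeholders for your own data).

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between actual and predicted frequencies."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# With lifetimes, the split and predictions would look roughly like:
#
#   from lifetimes import BetaGeoFitter
#   from lifetimes.utils import calibration_and_holdout_data
#
#   summary = calibration_and_holdout_data(
#       transactions, "customer_id", "date",
#       calibration_period_end="2023-06-30",
#       observation_period_end="2023-12-31",
#   )
#   bgf = BetaGeoFitter(penalizer_coef=0.01)
#   bgf.fit(summary["frequency_cal"], summary["recency_cal"], summary["T_cal"])
#   predicted = bgf.conditional_expected_number_of_purchases_up_to_time(
#       summary["duration_holdout"], summary["frequency_cal"],
#       summary["recency_cal"], summary["T_cal"],
#   )
#   score = rmse(summary["frequency_holdout"], predicted)

# Small worked example so the metric itself is concrete:
score = rmse([0, 1, 2, 3], [0.5, 1.0, 1.5, 2.5])
print(score)  # ≈ 0.433
```

The holdout split avoids the problem discussed below: the calibration and holdout windows are fixed up front, so the customer population and each customer's T are not redefined between folds.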

ColtAllen commented 1 year ago

I thought long and hard about adding cross-validation to my fork of this library, but decided it's not an appropriate evaluation method for these types of models.

Cross-validation involves fitting models to different subsets of the dataset, but these subsets nonetheless overlap and the data itself does not change. However, every time we change the time horizon for these CLV models, we change the customer population and how the data is aggregated. New customers will be added as the time horizon is extended, and existing customers will have different values for T even if their frequency and recency values do not change.

So applying a rolling time window for cross-validation would result in completely different datasets and hence completely different models. Cross-validation is also intended to evaluate how well models perform on unseen, out-of-sample data, which I'm not sure these models are designed for.
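The point about the data changing with the time horizon can be made concrete with a toy example. The `rfm_summary` function below is a minimal pandas stand-in for what lifetimes' `summary_data_from_transaction_data` computes; the customer IDs and dates are invented for illustration.

```python
import pandas as pd

transactions = pd.DataFrame({
    "customer_id": ["A", "A", "B", "C"],
    "date": pd.to_datetime(
        ["2023-01-05", "2023-02-10", "2023-03-01", "2023-07-15"]
    ),
})

def rfm_summary(tx, period_end):
    """Minimal RFM aggregation over one observation window."""
    period_end = pd.Timestamp(period_end)
    tx = tx[tx["date"] <= period_end]
    g = tx.groupby("customer_id")["date"]
    return pd.DataFrame({
        "frequency": g.count() - 1,           # repeat purchases only
        "recency": (g.max() - g.min()).dt.days,
        "T": (period_end - g.min()).dt.days,  # customer age at window end
    })

early = rfm_summary(transactions, "2023-06-30")
late = rfm_summary(transactions, "2023-12-31")

# Customer C only exists in the longer window, and A's T grows from
# 176 to 360 days even though A's frequency and recency are unchanged.
# Two windows therefore mean two genuinely different datasets.
```

Every candidate "fold" built from a different window would differ in both its rows (customers) and its T column, so fold-to-fold comparisons stop measuring what cross-validation is supposed to measure.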

A Bayesian posterior predictive check is probably the best way to evaluate these models, and will be supported in the successor library for CLV modeling: pymc-marketing.
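To illustrate the idea of a posterior predictive check without depending on any particular library's API, here is a toy numpy sketch: repeat-purchase counts are simulated from a Poisson model (a stand-in for a fitted CLV model), replicated datasets are drawn for each posterior sample, and a test statistic from the replications is compared with the observed one. All data and the "posterior" here are synthetic and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Observed" repeat-purchase counts (toy data standing in for frequency).
observed = rng.poisson(lam=1.2, size=500)

# Pretend posterior: draws of the purchase rate from a fitted model
# (here just jittered around the sample mean, for illustration only).
posterior_lambda = rng.normal(observed.mean(), 0.05, size=1000).clip(min=0.01)

# Posterior predictive check: for each posterior draw, simulate a
# replicated dataset and record a test statistic (mean frequency).
replicated_means = np.array([
    rng.poisson(lam, size=observed.size).mean() for lam in posterior_lambda
])

# Bayesian p-value: fraction of replications at or above the observed
# statistic. Values near 0.5 indicate the model reproduces the statistic
# well; values near 0 or 1 flag a mismatch.
p_value = float((replicated_means >= observed.mean()).mean())
print(round(p_value, 2))
```

The same check can be run on other statistics (e.g. the share of one-time buyers), which is where these models most often misfit.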