CamDavidsonPilon / lifetimes

Lifetime value in Python
MIT License

Cross Validation Setup #440

Closed. Khalizo closed this issue 3 months ago.

Khalizo commented 1 year ago

Hi All,

I would like to evaluate the model using k-fold cross-validation. The main metric I will use is RMSE, comparing each customer's actual frequency against the predicted frequency.

How can I do this, please?

Thanks,
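For reference, the more common way to get an RMSE for these models is a calibration/holdout split rather than k-fold cross-validation. Below is a minimal sketch: the RMSE metric itself is plain numpy, and the comments show how the split and predictions would look with lifetimes' `calibration_and_holdout_data` and `BetaGeoFitter` (column names like `customer_id` and `date`, and the cutoff dates, are hypothetical placeholders for your own data).

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between actual and predicted frequencies."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# With lifetimes, the split and predictions would look roughly like:
#
#   from lifetimes import BetaGeoFitter
#   from lifetimes.utils import calibration_and_holdout_data
#
#   summary = calibration_and_holdout_data(
#       transactions, "customer_id", "date",
#       calibration_period_end="2023-06-30",
#       observation_period_end="2023-12-31",
#   )
#   bgf = BetaGeoFitter(penalizer_coef=0.01)
#   bgf.fit(summary["frequency_cal"], summary["recency_cal"], summary["T_cal"])
#   predicted = bgf.conditional_expected_number_of_purchases_up_to_time(
#       summary["duration_holdout"], summary["frequency_cal"],
#       summary["recency_cal"], summary["T_cal"],
#   )
#   score = rmse(summary["frequency_holdout"], predicted)

# Small worked example so the metric itself is concrete:
score = rmse([0, 1, 2, 3], [0.5, 1.0, 1.5, 2.5])
print(score)  # ≈ 0.433
```

The holdout split avoids the problem discussed below: the calibration and holdout windows are fixed up front, so the customer population and each customer's T are not redefined between folds.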

ColtAllen commented 1 year ago

I thought long and hard about adding cross-validation to my fork of this library, but decided it's not an appropriate evaluation method for these types of models.

Cross-validation involves fitting models to different subsets of the dataset, but these subsets nonetheless overlap and the data itself does not change. However, every time we change the time horizon for these CLV models, we change the customer population and how the data is aggregated. New customers will be added as the time horizon is extended, and existing customers will have different values for T even if their frequency and recency values do not change.

So applying a rolling time window for cross-validation would result in completely different datasets and hence completely different models. Cross-validation is also intended to evaluate how well models perform on unseen, out-of-sample data, which I'm not sure these models are designed for.
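The point about the data changing with the time horizon can be made concrete with a toy example. The `rfm_summary` function below is a minimal pandas stand-in for what lifetimes' `summary_data_from_transaction_data` computes; the customer IDs and dates are invented for illustration.

```python
import pandas as pd

transactions = pd.DataFrame({
    "customer_id": ["A", "A", "B", "C"],
    "date": pd.to_datetime(
        ["2023-01-05", "2023-02-10", "2023-03-01", "2023-07-15"]
    ),
})

def rfm_summary(tx, period_end):
    """Minimal RFM aggregation over one observation window."""
    period_end = pd.Timestamp(period_end)
    tx = tx[tx["date"] <= period_end]
    g = tx.groupby("customer_id")["date"]
    return pd.DataFrame({
        "frequency": g.count() - 1,           # repeat purchases only
        "recency": (g.max() - g.min()).dt.days,
        "T": (period_end - g.min()).dt.days,  # customer age at window end
    })

early = rfm_summary(transactions, "2023-06-30")
late = rfm_summary(transactions, "2023-12-31")

# Customer C only exists in the longer window, and A's T grows from
# 176 to 360 days even though A's frequency and recency are unchanged.
# Two windows therefore mean two genuinely different datasets.
```

Every candidate "fold" built from a different window would differ in both its rows (customers) and its T column, so fold-to-fold comparisons stop measuring what cross-validation is supposed to measure.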

A Bayesian posterior predictive check is probably the best way to evaluate these models, and will be supported in the successor library for CLV modeling: pymc-marketing.
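To illustrate the idea of a posterior predictive check without depending on any particular library's API, here is a toy numpy sketch: repeat-purchase counts are simulated from a Poisson model (a stand-in for a fitted CLV model), replicated datasets are drawn for each posterior sample, and a test statistic from the replications is compared with the observed one. All data and the "posterior" here are synthetic and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Observed" repeat-purchase counts (toy data standing in for frequency).
observed = rng.poisson(lam=1.2, size=500)

# Pretend posterior: draws of the purchase rate from a fitted model
# (here just jittered around the sample mean, for illustration only).
posterior_lambda = rng.normal(observed.mean(), 0.05, size=1000).clip(min=0.01)

# Posterior predictive check: for each posterior draw, simulate a
# replicated dataset and record a test statistic (mean frequency).
replicated_means = np.array([
    rng.poisson(lam, size=observed.size).mean() for lam in posterior_lambda
])

# Bayesian p-value: fraction of replications at or above the observed
# statistic. Values near 0.5 indicate the model reproduces the statistic
# well; values near 0 or 1 flag a mismatch.
p_value = float((replicated_means >= observed.mean()).mean())
print(round(p_value, 2))
```

The same check can be run on other statistics (e.g. the share of one-time buyers), which is where these models most often misfit.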