model validation and performance measurement with unseen data

CamDavidsonPilon / lifetimes


model validation and performance measurement with unseen data #449

Closed: HCM00 closed this issue 3 months ago

HCM00 commented 9 months ago

Hi All

I am using Lifetimes to calculate Customer Lifetime Value (CLV), and it has proven to be very useful. However, I have some questions regarding model validation and how performance is measured with unseen data.

What are the best practices for model validation in Lifetimes? Is there any particular approach recommended?

Is there a specific metric or method for evaluating model performance when provided with unseen data?

Additionally, I am aware of the new library for CLV calculation called pymc-marketing. I am curious to know whether it offers similar validation capabilities. If you have any insights on this, I would greatly appreciate your input.

Thank you

ColtAllen commented 9 months ago

Hey @HCM00,

I'm one of the contributors to pymc-marketing. We're still working on adding validation capabilities at this time, but for modeling concept drift on out-of-sample data in practice, I like to use the underlying data from plot_cumulative_transactions:

Use the Wasserstein distance as a metric for the spread between cumulative predictions and actuals. You can also weight this metric by the inverse of days since last visit (1/(T - recency)). Trigger a retraining once the metric exceeds a user-defined threshold. Hope this helps.
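A minimal sketch of that idea, under stated assumptions rather than anything shipped in lifetimes or pymc-marketing: it treats the actual and predicted series as two empirical distributions, computes the first Wasserstein distance between them in pure Python, and flags retraining when the distance exceeds a threshold. The function names (`wasserstein_1d`, `recency_weights`, `needs_retraining`), the `eps` guard, the threshold value, and all the numbers are hypothetical. In lifetimes, I believe the actual and predicted series behind the plot can be obtained from `lifetimes.utils.expected_cumulative_transactions`, and `scipy.stats.wasserstein_distance` computes the same quantity if SciPy is available.

```python
def wasserstein_1d(u, v, u_weights=None, v_weights=None):
    """First Wasserstein distance between two weighted 1-D empirical distributions."""
    u_weights = [1.0] * len(u) if u_weights is None else list(u_weights)
    v_weights = [1.0] * len(v) if v_weights is None else list(v_weights)
    u_total, v_total = sum(u_weights), sum(v_weights)

    def cdf(values, weights, total, x):
        # Weighted share of the sample that is <= x.
        return sum(w for val, w in zip(values, weights) if val <= x) / total

    # Integrate |CDF_u - CDF_v| over the gaps between neighbouring support points.
    points = sorted(set(u) | set(v))
    distance = 0.0
    for left, right in zip(points, points[1:]):
        gap = abs(cdf(u, u_weights, u_total, left) - cdf(v, v_weights, v_total, left))
        distance += gap * (right - left)
    return distance


def recency_weights(T, recency, eps=1.0):
    """Inverse of days since last visit, 1 / (T - recency + eps).

    eps is an assumption added here to avoid dividing by zero when T == recency.
    """
    return [1.0 / (t - r + eps) for t, r in zip(T, recency)]


def needs_retraining(actual, predicted, threshold, weights=None):
    """Return (distance, flag); flag is True once the metric exceeds the threshold."""
    distance = wasserstein_1d(actual, predicted, weights, weights)
    return distance, distance > threshold


# Illustrative numbers only: cumulative transactions per period on unseen data.
actual_cumulative = [3, 7, 12, 18, 25]
predicted_cumulative = [4, 8, 12, 17, 22]
distance, retrain = needs_retraining(
    actual_cumulative, predicted_cumulative, threshold=2.0
)
print(f"drift metric: {distance:.2f}, retrain: {retrain}")

# One possible reading of the optional weighting: compare per-customer purchase
# counts in the holdout period and give recently active customers more weight.
T = [30, 30, 30, 30]
recency = [28, 10, 0, 25]
actual_counts = [2, 0, 1, 3]
predicted_counts = [1.5, 0.2, 0.4, 2.5]
weighted_distance, _ = needs_retraining(
    actual_counts, predicted_counts, threshold=2.0,
    weights=recency_weights(T, recency),
)
print(f"recency-weighted metric: {weighted_distance:.2f}")
```

The threshold is entirely user-defined; a reasonable starting point might be the range of the metric observed on periods where the model was known to fit well.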

HM9N commented 9 months ago

> Hey @HCM00,
>
> I'm one of the contributors to pymc-marketing. We're still working on adding validation capabilities at this time, but for modeling concept drift on out-of-sample data in practice, I like to use the underlying data from plot_cumulative_transactions:
>
> Use the wasserstein distance as a metric for the spread between cumulative predictions and actuals. You can also weigh this metric by days since last visit (T - recency). Trigger a retraining once the metric exceeds a user-defined threshold. Hope this helps.

I'm curious about the reasoning behind weighting this metric by days since the last visit (T - recency). Could you please provide more context or details on how this weighting enhances the model evaluation and why considering the recency of data is relevant in this case? Thank you.

ColtAllen commented 9 months ago

> I'm curious about the reasoning behind weighting this metric by days since the last visit (T - recency). Could you please provide more context or details on how this weighting enhances the model evaluation and why considering the recency of data is relevant in this case? Thank you.

I've edited my original post: I meant to say the inverse of the time since the last visit, 1/(T - recency). The weighting is optional, of course, but it gives more weight to customers with recent purchases, which you may want because those customers are likely still active.