facebook / prophet

Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.
https://facebook.github.io/prophet
MIT License
18.26k stars 4.51k forks

How to handle data distribution shifts #1997

Open luistelmocosta opened 3 years ago

luistelmocosta commented 3 years ago

Hello, I am using fbprophet to forecast the sales of a store. The structure of the data is hierarchical, and I am predicting at the highest level.

Example: The store can sell Books, Movies, Toys. Books can be Action, Adventure, same with movies and toys. And then you have the actual product.

I am forecasting the revenue at the Books, Movies, Toys level. I would like to know how to handle a scenario where a new book is added to the store and suddenly starts to generate a high volume of sales, thus increasing the revenue for that unit.

Any research on this? Is there any common approach?

Thank you!

tcuongd commented 3 years ago

Hey there! That's a very interesting question - could I just confirm do you:

There's no standardized way to account for 1) (at least within Prophet), so that might require manual adjustment to the forecast.

However, if your question is more about 2), I think it depends on how much historical data you have in which such spikes occurred, e.g. whether a popular Toy was introduced in the past and caused a spike in sales. If so, it might be good to break down the sales volumes by category and then forecast the individual categories. Prophet places trend changepoints (see here) in the historical data, and uses the trend changes it finds there to estimate the trend uncertainty. This is reflected in the yhat_lower and yhat_upper values: you can think of yhat_upper as an optimistic scenario, where most of the new items we introduce cause spikes in sales. Remember that the more historical data Prophet is given around past "spikes", the better it will be able to learn how often they occur and how big they might be.

If you're in a situation where there's not much historical data on spikes, one trick you could employ is to create synthetic (fake) data based on what kind of popular items could be introduced and how much they might improve sales. Keep in mind that this requires domain knowledge and your results will be heavily influenced by the assumptions made.
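One possible way to build such synthetic data is sketched below in plain Python. It assumes a popular item lifts revenue by a fixed fraction in its launch month and that the lift decays geometrically afterwards; the function name `add_synthetic_launch`, the `uplift` and `decay` parameters, and all numbers are illustrative assumptions, not something Prophet provides.

```python
from datetime import date


def add_synthetic_launch(history, launch_month, uplift, decay=0.5):
    """Return a copy of `history` with a synthetic sales spike added.

    history: list of (month_start_date, revenue) tuples.
    launch_month: first month affected by the hypothetical new item.
    uplift: assumed fractional revenue increase in the launch month (0.4 = +40%).
    decay: assumed factor by which the uplift shrinks each following month.
    """
    result = []
    for ds, y in history:
        if ds < launch_month:
            result.append((ds, y))  # months before the launch are unchanged
            continue
        # whole months elapsed since the launch
        k = (ds.year - launch_month.year) * 12 + (ds.month - launch_month.month)
        result.append((ds, y * (1 + uplift * decay ** k)))
    return result


# Made-up flat history, with a hypothetical launch in April 2020
history = [(date(2020, m, 1), 100.0) for m in range(1, 7)]
synthetic = add_synthetic_launch(history, date(2020, 4, 1), uplift=0.4)
for ds, y in synthetic:
    print(ds, round(y, 2))
```

The choice of `uplift` and `decay` is exactly where the domain knowledge comes in: different assumptions will produce noticeably different training data and forecasts.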

luistelmocosta commented 3 years ago

It is more aligned with 1). I only know that many new books have been added since April 2021, but I cannot quantify how many, due to logistical limitations.

Some details on the issue: the time series has monthly data points starting from January 2018. The model works very well up to March 2021, but new books have been added since April 2021. As a result, the revenue has shifted since April 2021 and the forecast error is higher.
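To put a number on "higher error", one simple check is to compare the mean absolute percentage error (MAPE) before and after April 2021. The sketch below uses plain Python; the function names and the actual/predicted values are made up for illustration and are not my real data.

```python
from datetime import date


def mape(rows):
    """Mean absolute percentage error over (ds, actual, predicted) rows.

    Assumes at least one row and non-zero actual values.
    """
    return sum(abs(actual - pred) / abs(actual) for _, actual, pred in rows) / len(rows)


def error_before_after(rows, cutoff):
    """Split rows at `cutoff` (inclusive on the 'after' side) and return both MAPEs."""
    before = [r for r in rows if r[0] < cutoff]
    after = [r for r in rows if r[0] >= cutoff]
    return mape(before), mape(after)


# Made-up (month, actual revenue, forecast) rows around the shift
rows = [
    (date(2021, 1, 1), 100.0, 98.0),
    (date(2021, 2, 1), 110.0, 112.0),
    (date(2021, 3, 1), 105.0, 104.0),
    (date(2021, 4, 1), 150.0, 108.0),
    (date(2021, 5, 1), 160.0, 110.0),
]
before, after = error_before_after(rows, date(2021, 4, 1))
print(f"MAPE before: {before:.1%}, after: {after:.1%}")
```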