facebook / prophet

Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.
https://facebook.github.io/prophet
MIT License

How to handle covid-19 shock in the fbprophet #1416

Open rajnish-garg opened 4 years ago

rajnish-garg commented 4 years ago

Hi, I am using fbprophet on a daily time series with 5 years of data. Because of the COVID-19 pandemic, most of our metrics are impacted. Since we use the last 20% of the data for testing, the model is not able to capture the signals. Are there any recommendations for handling this?

rbagd commented 4 years ago

I think the general answer is that you actually can't. Your forecasting model assumes a level of stationarity in your series, and for heavily impacted series the COVID-19 shock just breaks it. A model with a regime change might help, but we haven't got any past pandemics to learn from. The time series I am working on all have their seasonalities broken with the European population in lockdown; there's not much you can do other than maybe switch to a short-term ARIMA.

Although I believe it might be a lost cause to try to model anything in the short term, I am wondering what strategies people might use to mitigate the shock after the pandemic. Just set the pandemic period to missing data and hope for the best? Can we do better?

bletham commented 4 years ago

+1 for @rbagd's comment. Prophet assumes stationary seasonalities, which is probably untrue for most human-related time series right now. In some areas there may be a long enough period of lockdown to check how stationary the seasonalities are within the lockdown period, and potentially fit a separate model just to that period. But there are so many frequent external shocks to the series (policy changes, etc.) that even if things have stabilized, they probably won't stay stable for long.

As for after - I think we'll have to see what will end up working. Just throwing out the pandemic period as unuseful data could work but will really depend on what things look like after the pandemic clears up, and how similar they are to how things were before.

benmwhite commented 4 years ago

Just throwing out the pandemic period as unuseful data could work but will really depend on what things look like after the pandemic clears up

@bletham Or input the quarantine/lockdown periods as "holidays" in the model for full dark irony
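
Irony aside, the "holidays" idea can be written down concretely. The following is a minimal sketch, not a recommended recipe: the lockdown dates are hypothetical placeholders, and the package is assumed to be importable as `prophet` (it was `fbprophet` before version 1.0). Each lockdown becomes one holiday row whose `upper_window` stretches the effect over the whole period.

```python
import pandas as pd

# Hypothetical lockdown periods; replace with the dates that apply to your region.
lockdowns = pd.DataFrame([
    {"holiday": "lockdown_1", "ds": "2020-03-16", "ds_upper": "2020-05-10"},
    {"holiday": "lockdown_2", "ds": "2020-11-02", "ds_upper": "2020-11-30"},
])
for col in ["ds", "ds_upper"]:
    lockdowns[col] = pd.to_datetime(lockdowns[col])

# A holiday row covers its start date plus `upper_window` following days.
lockdowns["lower_window"] = 0
lockdowns["upper_window"] = (lockdowns["ds_upper"] - lockdowns["ds"]).dt.days


def fit_with_lockdowns(df: pd.DataFrame):
    """Fit Prophet on df (columns ds, y), treating each lockdown as a one-off holiday."""
    from prophet import Prophet  # imported lazily; `fbprophet` in releases before 1.0

    holiday_cols = ["holiday", "ds", "lower_window", "upper_window"]
    m = Prophet(holidays=lockdowns[holiday_cols])
    return m.fit(df)
```

Because each lockdown has its own name, its effect is estimated separately and only applies to the dates listed in the holidays frame.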

mik1893 commented 4 years ago

I guess since the COVID-19 pandemic has numerous impacts that vary with the nature of what you are trying to forecast and the business behind it, it's pretty much impossible to define a standard set of rules to build into a statistical modelling library.

lazaronixon commented 4 years ago

Great results here: https://medium.com/@andrejusb/covid-19-growth-modeling-and-forecasting-with-prophet-2ff5ebd00c01. I used this model in my application: https://profetadocorona.herokuapp.com

bletham commented 4 years ago

I had a discussion with some other forecasters on this question last week, and one thing that came up that I wanted to mention for other people trying to salvage their forecasts is that you can specify different seasonalities for different periods of time. In particular, it is possible to have e.g. weekly seasonalities that are different "pre-corona" and "during-corona", like here: https://facebook.github.io/prophet/docs/seasonality,_holiday_effects,_and_regressors.html#seasonalities-that-depend-on-other-factors . That could be helpful for some time series.
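
A minimal sketch of what that could look like for weekly seasonality, following the pattern in the linked docs. The cutoff date, the column names, and the Fourier order are assumptions chosen for illustration, and the import assumes the `prophet` package name (`fbprophet` before version 1.0).

```python
import pandas as pd

COVID_START = pd.Timestamp("2020-03-15")  # hypothetical cutoff, adjust to your data


def add_covid_flags(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with boolean pre_covid / during_covid columns."""
    out = df.copy()
    out["ds"] = pd.to_datetime(out["ds"])
    out["during_covid"] = out["ds"] >= COVID_START
    out["pre_covid"] = ~out["during_covid"]
    return out


def fit_conditional_weekly(df: pd.DataFrame):
    """Fit Prophet with one weekly seasonality per regime."""
    from prophet import Prophet  # imported lazily; `fbprophet` in releases before 1.0

    # Turn off the built-in weekly term; the two conditional terms replace it.
    m = Prophet(weekly_seasonality=False)
    m.add_seasonality(name="weekly_pre_covid", period=7, fourier_order=3,
                      condition_name="pre_covid")
    m.add_seasonality(name="weekly_during_covid", period=7, fourier_order=3,
                      condition_name="during_covid")
    return m.fit(add_covid_flags(df))
```

The dataframe passed to `predict` needs the same two columns, so `add_covid_flags` would be applied to the future frame as well.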

yuzuhikorunrun commented 4 years ago

@bletham Hello Ben, thanks for the answer. One question though: in retail the corona shock always affects sales negatively. However, after adding the pre-corona and during-corona seasonalities, Prophet seems to mistakenly think that sales peaks fly high during corona.

Not sure if I am doing this right, but can you please weigh in here? Thank you!

Here are my parameters:

import pandas as pd
from fbprophet import Prophet  # the package is named "prophet" from version 1.0 on

# temp (training frame with a 'ds' column) and holidays are defined elsewhere.

def is_not_corona_period(ds):
    # True for every date outside 2020
    date = pd.to_datetime(ds)
    return date.year != 2020

temp['post_corona'] = ~temp['ds'].apply(is_not_corona_period)
temp['pre_corona'] = temp['ds'].apply(is_not_corona_period)

m3 = Prophet(holidays=holidays,
             interval_width=0.99,  # default is 0.80
             holidays_prior_scale=0.25,
             changepoint_prior_scale=0.5,
             seasonality_mode='multiplicative',
             yearly_seasonality=10,
             weekly_seasonality=False,
             daily_seasonality=False)

# Separate yearly seasonalities for the pre-2020 and 2020 rows
m3.add_seasonality(name='annually-pre-corona',
                   period=365,
                   fourier_order=10,
                   condition_name='pre_corona')

m3.add_seasonality(name='annually-post-corona',
                   period=365,
                   fourier_order=10,
                   condition_name='post_corona')

and the resulting flyhigh:

shoaibkhanz commented 4 years ago

@bletham I want to throw an idea into the mix, which I have implemented in linear regression models I maintain at work. This isn't using Prophet, just simple linear regression, but I believe it can be extended.

I am using a log-log linear regression model to estimate customers' price elasticities (I didn't use ARIMAX, as that might reduce the price elasticity effect) and then predict into the future. Since the COVID period, my models have deteriorated considerably. The challenge I had was that training another set of models for COVID would mean going back to model risk to agree on the coefficients (price elasticities) and the impact they will have, not to mention feature selection.

Thus, since I couldn't change the coefficients, I introduced a moving average into my linear regression models (an idea borrowed from time series). What this means is that I am adjusting my intercept after fitting my model.

here are the steps I went through:

My models are in R, so here is the code for what I did.

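  library(zoo)    # rollmean
  library(dplyr)  # lag() with a default argument

  # Residuals of the already fitted model
  train_error = train_actual - train_preds_
  test_error = test_actual - test_preds_

  # Shift each prediction by the lagged 2-period rolling mean of past errors
  train_preds_ = train_preds_ + lag(as.vector(rollmean(train_error,k = 2,fill = FALSE,align = 'right')),default = 0)
  test_preds_ = test_preds_ + lag(as.vector(rollmean(test_error,k = 2,fill = FALSE,align = 'right')),default = 0)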

What do you think about this approach? It has improved my models' performance considerably, as it adapts to and learns from past errors.

bletham commented 4 years ago

@yuzuhikorunrun the problem there is that you're fitting a yearly seasonality (annually-post-corona) with quite a bit less than a year of data. The 2020 seasonality for June on is thus totally unconstrained, and in this case is blowing up in a really bad way. That's because by default there is very little regularization on the fitted seasonalities. You could specify the prior_scale in add_seasonality to something small like 0.1 which would clamp down on that resonance, but really it probably doesn't make sense to try to predict yearly seasonality for 2020 distinctly from previous years. In my earlier post I was thinking more like weekly seasonality, where we do have multiple weeks of during-corona and can reasonably fit the seasonality.

bletham commented 4 years ago

@shoaibkhanz I guess what you're doing is along the lines of fitting a model to the residuals (in this case the model is a rolling mean). That makes a lot of sense, thanks for sharing!
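
For readers who work in Python, the same residual correction can be sketched with pandas. This is an illustrative translation of the R snippet above, not code from the original poster; the function name and the toy numbers are made up. The predictions could come from any fitted model, for example the `yhat` column of a Prophet forecast.

```python
import pandas as pd


def residual_corrected(actual: pd.Series, preds: pd.Series, k: int = 2) -> pd.Series:
    """Add the lagged k-period rolling mean of past errors to each prediction."""
    errors = actual - preds
    # Rolling mean of the last k errors, shifted one step so that the
    # correction at time t only uses errors observed up to t-1.
    correction = errors.rolling(window=k).mean().shift(1).fillna(0.0)
    return preds + correction


# Toy numbers for illustration: the model under-predicts by a growing margin.
actual = pd.Series([10.0, 12.0, 14.0, 16.0])
preds = pd.Series([8.0, 9.0, 10.0, 11.0])
corrected = residual_corrected(actual, preds)
```

The first `k` predictions are left unchanged because no full window of past errors exists for them yet, which mirrors the `fill` and `default = 0` arguments in the R version.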

shoaibkhanz commented 4 years ago

Thanks @bletham , I am glad that you found that useful.

yuzuhikorunrun commented 4 years ago

@bletham Hello, thank you for the prompt reply, I appreciate it. I am wondering if I can set it as a monthly seasonality, since my data are monthly and I only have Jan-March so far. Thank you!

bletham commented 4 years ago

@yuzuhikorunrun Monthly seasonality would mean a cycle within a month, and so wouldn't be appropriate here. Unfortunately, with monthly data I don't think there is a whole lot that can be learned from just a few months of data. You'll have to let the trend component capture the change due to COVID; because it is a change right at the end of the time series, you might need to increase the changepoint_range to something like 0.95 (see https://facebook.github.io/prophet/docs/trend_changepoints.html for more details).
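
To get a feel for what `changepoint_range` means on a short monthly series, here is a small sketch. It assumes that potential changepoints are placed within the first `floor(n_rows * changepoint_range)` rows of the history; that is my reading of the behavior, not a documented guarantee. The row count of 63 (monthly data from January 2015 to March 2020) is a hypothetical example.

```python
import math


def changepoint_window(n_rows: int, changepoint_range: float = 0.8) -> int:
    """Number of leading rows in which changepoints may be placed (assumed floor rule)."""
    return math.floor(n_rows * changepoint_range)


def build_model(changepoint_range: float = 0.95):
    """Create a Prophet model that allows changepoints later in the history."""
    from prophet import Prophet  # imported lazily; `fbprophet` in releases before 1.0

    return Prophet(changepoint_range=changepoint_range)


n_rows = 63  # hypothetical: monthly data, 2015-01 through 2020-03
excluded_default = n_rows - changepoint_window(n_rows)       # trailing rows without changepoints
excluded_wide = n_rows - changepoint_window(n_rows, 0.95)
```

Under that assumption, the default leaves the last 13 months without changepoints and 0.95 still leaves the last 4, so a shock confined to the final two or three rows may need a value even closer to 1.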

yuzuhikorunrun commented 4 years ago

@bletham Unfortunately, setting changepoint_range to 0.95 or 0.9 does not improve the model performance, and it failed to capture the sudden decrease in sales due to the COVID shock in Feb and March (there was not much impact in Jan). I do have some interesting (or just lucky) findings and I'd love to hear your thoughts on them.

I have monthly data up to March-2020, and I mistakenly added monthly seasonality to my parameters (my data has multiplicative yearly trend)

def is_corona_period(ds):
    date = pd.to_datetime(ds)
    return (date.year == 2020)

temp['post_corona'] = temp['ds'].apply(is_corona_period)
temp['pre_corona'] = ~temp['ds'].apply(is_corona_period)

m5.add_seasonality(name='monthly-pre-corona', 
                   period=30, 
                   prior_scale = 0.1,
                   fourier_order=10, 
                   condition_name='pre_corona')

m5.add_seasonality(name='monthly-post-corona',
                   period=30, 
                   prior_scale = 0.1,
                   fourier_order=10,
                   condition_name='post_corona')

then, boom, it actually performs better in predicting the Feb and March data (after adding Jan-2020 data to train):

Before adding the wrong seasonality:

[images: Capture, Capture1]

Also notice here that my extra regressor's impact is positive, which is expected.

[image: Capture2]

beta-parameters:

[image: Capture]

After adding the wrong seasonality:

[images: COV1, COV2, COV3]

Notice that my extra regressor's effect here is negative instead, which is not expected...

[image: COV4]

A quick comparison of true values and predicted values:

[image: COV5]

beta-parameters:

[image: Capture1]

Do you think this is purely luck? And any idea of what might be going on here to make this lucky improvement happen?

Thank you again for making this amazing tool.

-Best.

samrudh commented 4 years ago

@shoaibkhanz: It is an interesting idea to use a rolling mean to learn the residuals on top of the current model. It works well as a retrofit without many changes to the existing set-up. But I wonder whether rolling means are sufficient. By nature, means are sluggish and do not adapt to changes quickly. When things start improving, might the model be slow to respond?

Would it be a good idea to add one more term, like the difference of rolling mean errors, say between k=2 and k=3? train_preds_ = train_preds_ + lag(as.vector(rollmean(train_error,k = 2,fill = FALSE,align = 'right')),default = 0) + {difference between (rollmean k=2, rollmean k=3)}

EveTyro commented 4 years ago

@shoaibkhanz - how do you fit the rolling means into prophet model?

mikocml commented 4 years ago

I am using fbprophet for a monthly sales prediction problem (5 years of historical data with a 12-month-ahead prediction) and have been researching options for dealing with the COVID-19 shock.

This article on Medium about how to forecast demand despite COVID summarises three options, and here I share their equivalent fixes for fbprophet:

  1. Flag Outliers (Simple Solution): simply flag the outliers and remove/replace them, which is equivalent to throwing out the data
  2. Event Forecasting (Better Solution): leverage the holiday or event effect; this is tricky as the policies and lockdown periods differ from country to country and from month to month
  3. Use External Drivers (Even Better Solution): use additional external data or an economic indicator as a regressor

I tried the third option and used PMI (purchasing managers' index), which is a leading economic indicator, as an additional regressor. It works well in absorbing the COVID-19 shock in Q2 this year and it is able to minimise the impact of the pandemic on seasonality.

Update: However, the limitation of such indicators is that they are usually only available for the short term, so the third option may not work for the long term, as we need reliable future values. An alternative is to transform this indicator (one that reflects the COVID shock in your data) into a binary regressor, based on an outlier detection approach, and use it to specify different seasonalities, e.g. is_covid and is_not_covid. In this case, the future values can be FALSE for is_covid, assuming is_not_covid.
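
The update above leaves the outlier detection step open. One possible sketch, under stated assumptions: a period is flagged when the indicator sits more than `z` standard deviations away from its pre-pandemic mean. The threshold rule, the function name, and the PMI values below are all made up for illustration.

```python
import pandas as pd


def covid_flags(indicator: pd.Series, baseline_end: str, z: float = 3.0) -> pd.DataFrame:
    """Flag periods where the indicator deviates strongly from its baseline level."""
    baseline = indicator.loc[:baseline_end]  # pre-pandemic observations
    deviation = (indicator - baseline.mean()).abs()
    is_covid = deviation > z * baseline.std()
    return pd.DataFrame({"is_covid": is_covid, "is_not_covid": ~is_covid})


# Synthetic monthly PMI-like values, for illustration only.
months = pd.date_range("2019-01-01", "2020-06-01", freq="MS")
pmi = pd.Series([49.0, 51.0] * 6 + [50.0, 50.0, 49.0, 30.0, 32.0, 48.0], index=months)
flags = covid_flags(pmi, baseline_end="2019-12-31")
```

The two boolean columns could then be joined onto the training frame and used as `condition_name` values in `add_seasonality`. For the forecast horizon, `is_covid` would be set to `False` and `is_not_covid` to `True`, as suggested in the update.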

soliverc commented 3 years ago

  3. Use External Drivers (Even Better Solution): use additional external data or economic indicator as regressor

Regarding @mikocml's answer above, I tried to add Google's mobility reports as a regressor to my data (https://www.google.com/covid19/mobility/).

It unfortunately did not make much difference for my retail data, but might be useful for others. My data is from a large supermarket chain where sales increased massively during lockdown.

It has been over a month since the last reply. Has anybody come up with other solutions?

Another source of external data could be Business Confidence Index for your country: https://data.oecd.org/leadind/business-confidence-index-bci.htm#indicator-chart

It only comes as monthly data, though. Can this be applied to daily data? It would be one single figure for a whole 30 days.
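
One way to line a monthly indicator up with daily data is to spread it over the days, either as a step function (each day carries its month's figure) or by interpolating between monthly observations. The sketch below uses plain pandas; the function name and the index values are invented for illustration, and whether a regressor built this way helps the forecast is a separate question.

```python
import pandas as pd


def monthly_to_daily(monthly: pd.Series, daily_index: pd.DatetimeIndex,
                     how: str = "step") -> pd.Series:
    """Spread a monthly indicator (indexed by month start) over daily dates."""
    if how == "step":
        # Every day carries the figure of its month.
        return monthly.reindex(daily_index, method="ffill")
    # Otherwise interpolate linearly in time between the monthly observations.
    combined = monthly.reindex(monthly.index.union(daily_index))
    return combined.interpolate(method="time").reindex(daily_index)


# Made-up index values, for illustration only.
bci = pd.Series([100.0, 98.0, 95.0],
                index=pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01"]))
days = pd.date_range("2020-01-01", "2020-03-31", freq="D")
bci_step = monthly_to_daily(bci, days)
bci_smooth = monthly_to_daily(bci, days, how="interpolate")
```

Either series can be added as a column to the daily training frame and registered as an extra regressor. As noted earlier in the thread, forecasting then requires future values of the indicator as well.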

ghost commented 3 years ago

I have another interesting one that is more of a challenge to how Prophet captures and defines the trend. It somewhat resembles the question in #697. Basically, my data is store order data for a large food retailer.

Model:

[image: COVID Struggle]

What I am struggling to capture are the structural changes in the trend and how we can model upcoming changes. Specifically:

  1. COVID structurally changed how we eat (let's say +14% vs. before COVID)
  2. Part of this will remain after e.g. restaurants reopen here in Europe (let's say at +10% vs. before COVID)

There are two different elements I struggle to model correctly.

Issue 1: Accurately capture the past structural change

When looking at my historical data. Prophet captures

Option I see to help model capture this better :

Ideal would be something like #1789 as that would really capture the reality best (and tackle issue 2 as well). Implementing something as #705 is outside of my skill range

I don't know what others' opinions are.

Issue 2: Accurately capture the future

Our expectation is that the trend will drop again by a couple of percentage points after restaurants reopen. I have no idea how to capture this structurally. Options I see:

  1. Before actual drop : In output overwrite "Trend" and recalculate yhat
  2. After actual drop : keep changepoint range up-to before the drop and do that until model can capture the trend correctly on its own
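
The first option can be sketched as a post-processing step on the forecast frame. This assumes the frame has the `trend`, `additive_terms`, and `multiplicative_terms` columns that Prophet's `predict` returns, and that `yhat` is rebuilt as `trend * (1 + multiplicative_terms) + additive_terms`. The function name, the start date, and the scaling factor are hypothetical.

```python
import pandas as pd


def override_trend(forecast: pd.DataFrame, start: str, factor: float) -> pd.DataFrame:
    """Scale the trend from `start` onward and rebuild yhat from the components."""
    out = forecast.copy()
    mask = out["ds"] >= pd.Timestamp(start)
    out.loc[mask, "trend"] = out.loc[mask, "trend"] * factor
    out["yhat"] = out["trend"] * (1 + out["multiplicative_terms"]) + out["additive_terms"]
    return out


# Tiny stand-in for a forecast frame, with invented numbers.
fc = pd.DataFrame({
    "ds": pd.date_range("2021-06-01", periods=4, freq="D"),
    "trend": [100.0] * 4,
    "multiplicative_terms": [0.1] * 4,
    "additive_terms": [5.0] * 4,
})
fc["yhat"] = fc["trend"] * (1 + fc["multiplicative_terms"]) + fc["additive_terms"]
adjusted = override_trend(fc, start="2021-06-03", factor=0.96)
```

Note that this sketch leaves the uncertainty columns (`yhat_lower`, `yhat_upper`) untouched, so they would no longer be consistent with the adjusted `yhat`.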

The regressor route is not an option, as the model has never seen it. Again, #705 would be nice to allow the model to capture the drop and then force a flat trend afterwards, but it is still outside my skill range.

Any ideas?

Thanks

entzyeung commented 8 months ago

https://facebook.github.io/prophet/docs/handling_shocks.html

this may help, cheers