Forecasting airline travel with data pre-covid, during-covid, and 'post'-covid

facebook / prophet

Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.

MIT License

Hi,

I am trying to forecast airline travel and I have 3 years' worth of data (2019-2021), spanning from pre-COVID until now. I have read #1416, #1726, and #1595, and I have tried the following to get more stable confidence intervals and forecast estimates:

1. Extending my changepoint_range: I extended it to 0.9, but my forecasts seem more unstable than the baseline model where I let Prophet determine everything automatically.
2. Specifying changepoint dates: After looking at the auto-detected changepoint dates, I included the ones that Prophet detected plus additional changepoint dates after the 80% mark that I decided to experiment with. This narrowed the confidence intervals of my forecasts slightly, but not to a satisfactory degree.
3. Removing outliers: about 1-2% of the data, out of 1000 samples. This slightly reduces the variance.

Let's say I consider the main effects of COVID to be from March 2020 to Sept 2020. My understanding from the previous issues I have read is that we can experiment with adding these dates as a holiday, but my forecasts continue to interpret these dates as a seasonal occurrence for the years 2022 and 2023. I do see a more stable pattern in airline travel towards the end of 2020, and I would like the model to favor those trends rather than the airline travel that occurred pre-COVID.

Is there any advice on how I can produce stable forecast estimates, at least for the second half of 2021 or for 2022? If I do remove a huge chunk of the pre-COVID data, would it even make sense to provide only COVID-affected and post-COVID airline data? I thought perhaps the model would be overly optimistic about the growing trend in airline travel.

I am new to using Prophet, so my apologies if I have misunderstood something.

Thank you!

Hey there, could I just confirm: when you set March 2020 - Sept 2020 as "holidays" for Prophet to model, did you do so using the lower_window and upper_window columns? E.g. if there was a huge dip from March 2020 to May 2020, we'd set a holiday at the start of March 2020 and set upper_window to 60 to represent the time series staying low over a 2-month period.
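
A minimal sketch of the windowed-holiday setup described above, assuming the March 2020 - Sept 2020 range from the question. The names covid_shock, covid_holidays and build_model are made up for illustration; the holiday, ds, lower_window and upper_window columns and the holidays argument are Prophet's own. Since the event is listed only for 2020, the holiday effect itself applies only to those dates and is not carried into future years.

```python
import pandas as pd

# Assumed shock window, taken from the question: March 2020 through Sept 2020.
shock_start = pd.Timestamp("2020-03-01")
shock_end = pd.Timestamp("2020-09-30")

# One row for the one-off event. upper_window extends the effect that many
# days after `ds`, so the whole dip is covered by a single "holiday".
covid_holidays = pd.DataFrame(
    {
        "holiday": ["covid_shock"],  # hypothetical label
        "ds": [shock_start],
        "lower_window": [0],
        "upper_window": [(shock_end - shock_start).days],
    }
)


def build_model(holidays: pd.DataFrame):
    """Return an unfitted Prophet model that uses the given holidays."""
    # Imported here so the holidays frame can be inspected without Prophet.
    from prophet import Prophet

    return Prophet(holidays=holidays)
```

Calling build_model(covid_holidays).fit(df) on the usual ds/y dataframe would then fit the model with the shock treated as a one-off event rather than as part of the yearly seasonality.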

Other than that, I think you've taken all the reasonable steps. Some other ideas, which may or may not work:

We could try tweaking the prior scales on each of the components, i.e. changepoint_prior_scale, seasonality_prior_scale, and holidays_prior_scale. In general, larger values here fit each component more closely to the historical data, and lower values regularize the components towards a flatter shape. So, for example, if we wanted the model to pick up the dips in different months as holidays more so than as month-of-year seasonality, we would increase holidays_prior_scale and decrease seasonality_prior_scale. You can find the default values for these parameters here: https://github.com/facebook/prophet/blob/b75844e07c3b09bf3bc383c9d43241c554dd251b/python/prophet/forecaster.py#L89-L91
There seems to be an inherent trade-off between these two ideas: "more stable pattern in airline travel towards the end of 2020, and I would like the model to favor those trends, rather than the airline travel that occurred pre-COVID" vs. "I thought perhaps the model would be overly optimistic about the growing trend in airline travel". Perhaps this is a matter of presenting two different scenarios, e.g. "what if the current growth continued" vs. "the forecast based on the historical data we have"?
Though I am curious to know how Prophet is forecasting the trend here. I would have thought that adding changepoint dates towards the end of 2020 would let it pick up the positive trend. Perhaps it's worth trying to increase changepoint_prior_scale to see if Prophet can be more aggressive with the size of the trend changes?
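
To make the prior-scale ideas above concrete, here is a rough sketch. DEFAULT_PRIORS holds Prophet's default values; the numbers in EXPERIMENT_PRIORS are illustrative guesses only, not tuned recommendations, and build_model is a made-up helper name.

```python
# Prophet's default values for the three prior scales.
DEFAULT_PRIORS = {
    "changepoint_prior_scale": 0.05,
    "seasonality_prior_scale": 10.0,
    "holidays_prior_scale": 10.0,
}

# Illustrative values only (assumptions, not tuned):
# - a larger changepoint prior allows bigger trend changes,
# - a smaller seasonality prior flattens the yearly pattern,
# - a larger holidays prior lets the holiday terms absorb more of the dip.
EXPERIMENT_PRIORS = {
    **DEFAULT_PRIORS,
    "changepoint_prior_scale": 0.5,
    "seasonality_prior_scale": 1.0,
    "holidays_prior_scale": 20.0,
}


def build_model(priors, changepoints=None, holidays=None):
    """Return an unfitted Prophet model with the given prior scales."""
    # Imported here so the settings can be inspected without Prophet.
    from prophet import Prophet

    return Prophet(changepoints=changepoints, holidays=holidays, **priors)
```

Fitting one model with DEFAULT_PRIORS and another with EXPERIMENT_PRIORS (optionally passing manually chosen late-2020 changepoint dates) is one way to produce the two scenarios mentioned above and compare their forecasts side by side.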

Forecasting airline travel with data pre-covid, during-covid, and 'post'-covid #1983