facebook / prophet

Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.
https://facebook.github.io/prophet
MIT License

Forecasting multiple timesteps ahead with leading extra regressors #1162

Closed by amw5g 4 years ago

amw5g commented 4 years ago

Consider a model to predict movie theater popcorn sales. We have a daily time series over several years. That allows us to get a trend, intra-week and intra-year seasonality. Maybe we have a historical and future calendar of movie release dates categorized by small, medium, & blockbuster openings. We can forecast into the future as long as we are confident in that release calendar.

Additionally, we'd like to include a leading extra regressor: advance movie ticket sales. Some tickets are sold in advance of the screening dates, and some are sold as walk-ups on the day of the screening. But it seems reasonable that if we know how many tickets have been sold in advance, it could be a good predictor of our popcorn sales. At time (t), we have knowledge of the ticket sales volume for time (t+1), (t+2), ... (t+n) days in the future. That is, we have a different leading value for every time step into the future. And we can go backwards in time to learn how advance sales at some leading horizon correlate with our popcorn sales dependent variable.

If I'm framing this up correctly, that means for every horizon into the future, I need to have a separate prophet model. Prophet is not a multivariate time series tool, and in fact I don't really want my extra regressor forecasted, I want to use only explicitly what I know, and when I know it. Which means I have to have one model for (t+1) days in the future, one for (t+2) days, ... (t+n) days into the future. Do I have that correctly?

APramov commented 4 years ago

Popcorn sales! What a nice application you have! :)

Let's think together: So the basic prophet model is:

Y_t = f(t) + e(t) (basic prophet formulation). 

It's a (contemporaneous) function of time (I use f(.) here as a shorthand of the trend + seasonality terms, you know, just equation 1 from the paper)

Let us say that Y_t are popcorn sales on day t.

So if I am modeling popcorn sales one step ahead (and taking advance movie ticket sales into account), the point forecast will look something like this:

E(Y_t+1|Data, parameters) = f(t+1) + beta1 * X_t + beta2 * X_t-1 + ... + betak * X_t-k+1

where X_t, X_t-1, ..., X_t-k+1 are the tickets that I managed to sell on days t, t-1, ..., t-k+1 for the screening date t+1.

This is how I would frame a 1 step ahead forecast - it is a univariate time series as a function of two things

  1. time and
  2. as a function of whatever I could sell for t+1, at time t, t-1, t-2, etc

So, if today I am at t, just after the business day closes, I know how many advanced tickets I have sold today, yesterday, the day before and so on for the screening tomorrow. You don't need to forecast the external regressors, you just take whatever you managed to sell up to date t.
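As a minimal sketch of how that regressor vector could be assembled, assume a hypothetical `bookings` mapping from (sale day, screening day) to tickets sold. The data layout and the function name `lagged_advance_sales` are illustrative assumptions, not part of Prophet.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: bookings[(sale_day, screening_day)] = tickets sold on
# sale_day for the screening on screening_day. Days are integer indices.
Bookings = Dict[Tuple[int, int], int]


def lagged_advance_sales(bookings: Bookings, t: int, k: int) -> List[int]:
    """Return [X_t, X_t-1, ..., X_t-k+1] for the screening on day t + 1.

    Every entry is already observed at the close of day t, so nothing
    has to be forecast. Missing (sale_day, screening_day) pairs count as 0.
    """
    screening_day = t + 1
    return [bookings.get((t - lag, screening_day), 0) for lag in range(k)]


# Toy data: sales made on days 8, 9 and 10 for the screening on day 11,
# plus one sale for a different screening (day 12) that must be ignored.
bookings = {(8, 11): 5, (9, 11): 12, (10, 11): 40, (10, 12): 7}
features = lagged_advance_sales(bookings, t=10, k=3)
print(features)  # [40, 12, 5]
```

Each element of the returned list would become one extra-regressor column (beta1 through betak in the formula above).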

How far you go back in time depends on what the cinema does, I guess. How far ahead do they start selling tickets?

One idea might be to aggregate the advance sales over a whole week or the last 3 days, or maybe take a 5-day moving average of those. You can experiment with it.
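The two aggregations mentioned here could look roughly like the following. This is only a sketch: `daily_sales` is a made-up list of advance sales for one screening (oldest day first), and the helper names are hypothetical.

```python
from typing import List


def window_total(daily_sales: List[float], window: int) -> float:
    """Total advance sales over the most recent `window` days."""
    return sum(daily_sales[-window:])


def moving_average(daily_sales: List[float], window: int) -> List[float]:
    """Trailing moving average; starts once a full window is available."""
    return [
        sum(daily_sales[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(daily_sales))
    ]


# Hypothetical advance sales for one screening, oldest day first.
daily_sales = [2, 4, 6, 8, 10, 12, 14]
print(window_total(daily_sales, 3))    # 36
print(moving_average(daily_sales, 5))  # [6.0, 8.0, 10.0]
```

Either result would replace the k separate lag columns with a single, smoother regressor.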

But, to summarize, I think your problem does not require different models for each step of the forecast. If you were modeling different cinemas' popcorn sales, then yes, you might need different models.

Let me know what you think of that.

TheXu commented 4 years ago

Great answer by APramov.

I find this online textbook chapter useful for thinking about using lagged predictors as exogenous regressors in your forecast model, regardless of whether you use R. https://otexts.com/fpp2/lagged-predictors.html

Also, it might be good to look at the cross-correlations between popcorn sales and advance movie ticket sales to see which lags of advance ticket sales are most strongly related to popcorn sales.
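A cross-correlation check of this kind can be sketched without any libraries. The series below are invented toy data in which popcorn sales follow advance ticket sales by exactly one day, and `lagged_correlations` is a hypothetical helper name.

```python
from math import sqrt
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def lagged_correlations(
    y: List[float], x: List[float], max_lag: int
) -> Dict[int, float]:
    """Correlate y[t] with x[t - lag] for lag = 0..max_lag."""
    return {
        lag: pearson(y[lag:], x[: len(x) - lag]) for lag in range(max_lag + 1)
    }


# Toy data: popcorn[t] = 2 * tickets[t - 1], so lag 1 should stand out.
tickets = [1, 3, 2, 5, 4, 6, 3, 7]
popcorn = [0, 2, 6, 4, 10, 8, 12, 6]
corr = lagged_correlations(popcorn, tickets, max_lag=3)
best_lag = max(corr, key=corr.get)
print(best_lag)  # 1
```

On real data the peak will be much less clean, so it is worth looking at the whole profile of correlations rather than only the maximum.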

amw5g commented 4 years ago

Yes, I think I see what you're saying, and that works for forecasting for the screening tomorrow.

My use case (which isn't actually popcorn sales but something way, way cooler; so much cooler) is to forecast for multiple time periods ahead. I want to forecast for the screening tomorrow, the next day, and so on until maybe 180 days into the future. I want to end up with a vector of forecasted values, one for each timestep into the future.

I can know what the cumulative advance ticket sales are for tomorrow's screening up through today. That is forecasting y for tomorrow (Oct 3rd) using an extra regressor measured up to today (Oct 2nd). But when I want to forecast y for two days from now (Oct 4th), my extra regressor, which is still at time (Oct 2nd), doesn't represent the same thing. It represents one day less of cumulative sales.

I don't see how a single model will work if my extra regressor measures different things, depending on how far into the future I'm forecasting.

Or did I miss your explanation, @APramov ?

APramov commented 4 years ago

Well, in any model you would need to provide a value of the external regressor for the forecast period (assuming you are using external regressors, like the advance ticket sales X here).

If today is the 2nd of October (t) and you want to forecast popcorn sales for the 4th of October (t+2), you do not yet know how many advance tickets for the 4th of October you will have sold on the 3rd of October (t+1).

But you do know how many advance tickets for the 4th of October you sold on the 2nd of October, the 30th of September and so on.

In any model that uses external regressors, you have two options. In your case, if on the 2nd of October you want to forecast the popcorn sales for, say, 180 days later, you need to either

a) use only data that you know today (i.e. 6 months prior to your forecast point), or

b) model your advance ticket sales separately and then plug in the estimate for each day, assuming that your estimated value is correct (this is a HUGE assumption, particularly the further you go into the future). Note that right now prophet does not carry the estimation uncertainty of a forecasted regressor (advance ticket sales) into the final uncertainty of yhat (the popcorn sales).

amw5g commented 4 years ago

OK, this is also how I see it, and it confirms my understanding. I'll need to either fix or forecast the external regressor so that it represents a consistent leading time period.

I don't like the idea of penalizing what the model can know, and I'm not comfortable with forecasting all my external regressors (there are several).

Which leaves me with building a different model for every forecast horizon.

I appreciate the sanity checks, @APramov and @TheXu !

APramov commented 4 years ago

@amw5g, np, if it at least helped to verify your understanding, that's good ;)

I am not quite sure that I follow your "model for every horizon" yet, or how you would solve the issue of needing to know future advance sales. What would your model for the 4th of October look like, for example? Also, do you really need to make forecasts that far ahead at time t?

amw5g commented 4 years ago

Cool, cool. I'll give a little more :)

Every morning I want to know what the next 180 days will look like; not in a summed aggregate, but at a daily granularity. I want to estimate what the popcorn sales for Oct 3rd will be, and the 4th, and the 5th, etc. Individually. Technically I don't want to know, but the people who pay my salary do, and I want to keep my salary.

My model for Oct 4 (t+2) would take the advance ticket sales for the 4th as of today, and predict what popcorn sales on the 4th will be. For Oct 5th (t+3), I would take the advance ticket sales for the 5th as of today, and predict what popcorn sales for the 5th will be. Etc.

Call my regressor "observed day-before ticket sales". I can't know this until... well, the day before. Therefore it can't be in a model that's predicting 180 days out. It can only be in a model that's predicting exactly 1 time step ahead.

So let's create a regressor that's "observed 180-days-before ticket sales". I can use this to predict popcorn sales exactly 180 days out. I could also use it to predict 1 day out. But why would I do the latter? It's a severely compromised predictor compared to what I could be using.

So I think I grok and agree with your model, i.e., E(Y_t+1|Data, parameters) = f(t+1) + beta1 * X_t + beta2 * X_t-1 + ... + betak * X_t-k+1

But I think this falls apart when forecasting out to t+2, t+3, etc. While X_t, X_t-1, ..., X_t-k+1 will all have values, those values will depend heavily on how far in advance I'm forecasting. E.g., if I'm forecasting for tomorrow (Oct 3rd), X_t will be measured perhaps in thousands, while X_t for 180 days out (3/31/2020) will be nearly 0.

To mitigate that, is it simply a matter of including yet another regressor (call it z) to express how far out the forecast horizon is, e.g., z in [1, 180]?
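One way to picture the single-model variant with a horizon regressor z is as a stacked training table with one row per (screening day, horizon) pair. The sketch below only builds that table; the `curves` layout and `stack_horizons` are hypothetical, and whether an additive model can use x and z well enough is left open in this thread.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: curves[day][z - 1] = cumulative advance sales for the
# screening on `day`, as known z days before it (z = 1 is the day before).
Curves = Dict[int, List[int]]
Row = Tuple[int, int, float]  # (x, z, y)


def stack_horizons(
    curves: Curves, popcorn: Dict[int, float], max_z: int
) -> List[Row]:
    """Build one (x, z, y) row per screening day and horizon z in 1..max_z."""
    rows: List[Row] = []
    for day in sorted(curves):
        for z in range(1, max_z + 1):
            rows.append((curves[day][z - 1], z, popcorn[day]))
    return rows


# Toy data: two screenings, each with a 3-point booking curve.
curves = {10: [900, 400, 50], 11: [1200, 700, 90]}
popcorn = {10: 310.0, 11: 420.0}
rows = stack_horizons(curves, popcorn, max_z=3)
print(rows[0])    # (900, 1, 310.0)
print(len(rows))  # 6
```

Note that every target value y appears once per horizon, so the rows are not independent observations.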

luciaharley commented 2 years ago

Hey @amw5g ! Very late to the party here but I am building a model that sounds exactly like yours (main time series is cumulative "popcorn sales" per day, with n extra regressors representing ramping-up popcorn sales for n days into the future). If you don't mind I'd love to know what ended up working for you here - did you end up building 1 model and forecasting each extra regressor to even out the lead days? Or did you create n different models for the n days into the horizon you were forecasting?

Haridut commented 1 year ago

Hey @amw5g @luciaharley This is a very interesting problem and similar to the one I am working on. Any idea on how to model? I see no other way than to build a model for each of the horizon days. For example, 14 days out ticket sales prediction would require a 14 day out advance bookings as a regressor. Do you think there's a smarter solution?

Haridut commented 1 year ago

@bletham any suggestions on this? Thanks.

amw5g commented 1 year ago

@Haridut I ended up using xgboost with lagged regressors, and built one model for each forecast timestep. That is, I had one model for forecasting 1 day out, one for 2 days out, ..., one for 30 days out. Not ideal, but it worked. Sort of. Happy to describe further. But I made the decision to use a different forecasting approach from prophet.

Haridut commented 1 year ago

@amw5g Interesting, when you say lagged regressors, you mean the advance bookings in your case right? So a model which has 2 day advance bookings, one for 3 day, one for 180 days etc? I am trying to do the same with Prophet though.

amw5g commented 1 year ago

@Haridut for each timestep in the future I want to predict, I want to use all the info available at the time. And of course, I can't use information that I won't know at the time of inference.

For example, if I want to predict 180 days in the future from today, I only have lagged sales up through today. That is, I only know the sales 180 days lagged. I can't yet know the sales at lag 179 or 178 or ...

To get around that, I can either forecast my regressors. OR! I can train an xgboost model that uses the 180-lagged regressor and nothing with a shorter lag. That's what I did.

Which means that I have a 180-day forecast model, using sales lagged by 180 days (1 regressor). And I have a 179-day forecast model, using sales lagged by 180 days and 179 days (2 different regressors). And I have a 178-day forecast model, using sales lagged by 180 days, 179 days, and 178 days (3 different regressors). And so on. The models are trained independently.

Does that clear it up?

luciaharley commented 1 year ago

@amw5g this is very similar to the solution I ended up with as well! Also training a separate tree based model for each horizon date, also using custom infra not Prophet. However I'm interested in how you incorporate the 180-day lagged sales into your 179-day forecast model and so on. Do you have an extra feature for the 180th day that is null for the 179d data, and populated for the 180d data?

amw5g commented 1 year ago

@luciaharley nope, no null features. To simplify, the 180-day forecast model has exactly one feature: the lag180 sales value. The 179-day forecast model has exactly 2 features: lag180 sales and lag179 sales. And so on. The 1-day-ahead forecast model has 180 features: lag180 sales, lag179 sales, ..., lag1 sales. Nothing is null because each model is both trained and run only on features I can be assured I know at inference time. Does that clear it up?
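The feature sets described here can be generated mechanically. This is a sketch only: the column naming scheme (`lag180_sales`, ...) and the helper `lag_features` are assumptions, not the author's actual code.

```python
from typing import List


def lag_features(horizon: int, max_lag: int = 180) -> List[str]:
    """Feature names available to the model that forecasts `horizon` days out.

    Only lags >= horizon are known at inference time, so the model for
    horizon h uses lag{max_lag} down to lag{h} and nothing shorter.
    """
    if not 1 <= horizon <= max_lag:
        raise ValueError("horizon must be between 1 and max_lag")
    return [f"lag{lag}_sales" for lag in range(max_lag, horizon - 1, -1)]


print(lag_features(180))     # ['lag180_sales']
print(lag_features(179))     # ['lag180_sales', 'lag179_sales']
print(len(lag_features(1)))  # 180
```

Each horizon's model would then be trained independently on just its own columns, so no model ever sees a null or not-yet-known feature.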

Haridut commented 1 year ago

Good conversation. @amw5g and @luciaharley, mine was a much smaller horizon, so I ended up training 14 models, one for each of the 14 horizon days. Since I had just 14 horizon days it was manageable. I used them as an external regressor.