facebook / prophet

Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.
https://facebook.github.io/prophet
MIT License

Adding Regressors Giving Outlier Results #1586

Closed nknauer closed 4 years ago

nknauer commented 4 years ago

I also posted this question on Stack Overflow, but I still can't figure out why this is happening: https://stackoverflow.com/questions/62971261/r-prophet-time-series-prediction-with-regressors-has-outlier-results

Basically, I am trying to predict the future stock price of the ticker R.

When I try this with the normal prophet function without regressors, it gives me a reasonable prediction. However, when I add a regressor, the output has two very low predictions, and I don't understand why. The output suggests that weekly seasonality is causing this, but my predictions of the regressor itself in the future dataframe do not fluctuate, which makes this output confusing. My code is below. Any help would be great, thanks.

library(BatchGetSymbols)
library(quantmod)
first.date <- '2017-10-17'
last.date <- '2020-07-10'
freq.data <- 'daily'
ticker <- 'R'
first_batch <- BatchGetSymbols(tickers = ticker, 
                           first.date = first.date,
                           last.date = last.date, 
                           freq.data = freq.data,
                           cache.folder = file.path(tempdir(), 
                                                    'BGS_Cache') )# cache in tempdir()

stock_prices<- data.frame(first_batch$df.tickers$ref.date, 
first_batch$df.tickers$ticker,first_batch$df.tickers$price.close)
names(stock_prices)[names(stock_prices) == "first_batch.df.tickers.ref.date"] <- "ds"
names(stock_prices)[names(stock_prices) == "first_batch.df.tickers.ticker"] <- "group"
names(stock_prices)[names(stock_prices) == "first_batch.df.tickers.price.close"] <- "y"

head(stock_prices)

          ds group     y
1 2017-10-17     R 79.80
2 2017-10-18     R 81.66
3 2017-10-19     R 82.77
4 2017-10-20     R 83.65
5 2017-10-23     R 82.94
6 2017-10-24     R 81.90

When I predict the next 14 days of this stock I get the following:

library(dplyr)
library(prophet)
R_filtered<-stock_prices[,c(1,3)]
m <- prophet(R_filtered, daily.seasonality = TRUE)
future <- make_future_dataframe(m, periods = 14)
forecast <- predict(m, future)
plot(m, forecast)

[Image: 14-day forecast of the stock price without regressors]

Now, using sentiment data, I want to add extra granularity to improve the model. I have daily sentiment data, and because it was fluctuating too much I took a 7-day moving average, as shown below.

You can find the data in my GitHub repository:

library(RCurl)
x <- getURL("https://raw.githubusercontent.com/nknauer/sentiment_data/master/sentiment_data.csv")
sentiment <- read.csv(text = x)
head(sentiment)

        date business
1 2017-10-05 3.236427
2 2017-10-06 4.096874
3 2017-10-07 3.509310
4 2017-10-08 1.301071
5 2017-10-09 4.342805
6 2017-10-10 2.997384

## Switch to a 7-day moving average to smooth out outliers:
library(TTR)
library(zoo)
sentiment$business_ma<- runMean(sentiment$business,7) 

Now to include this in my model, I would go about it this way:

sentiment <- 
  sentiment %>%
  dplyr::rename(ds = date,
                y = business_ma)
prophet_format<-prophet(sentiment, seasonality.mode = 'additive')
future <- make_future_dataframe(prophet_format, periods = 14, freq = 'day')
forecast_sentiment <- predict(prophet_format, future)
tail(forecast_sentiment %>% dplyr::select(ds, yhat_lower, yhat, yhat_upper))

             ds yhat_lower     yhat yhat_upper
1021 2020-07-21   1.401526 1.854315   2.318571
1022 2020-07-22   1.427068 1.845199   2.264391
1023 2020-07-23   1.392751 1.832939   2.269049
1024 2020-07-24   1.391243 1.819863   2.212086
1025 2020-07-25   1.355987 1.807671   2.247593
1026 2020-07-26   1.319377 1.793466   2.234582

plot(prophet_format, forecast_sentiment[c('ds','yhat', 'yhat_lower', 'yhat_upper')], 
 xlab = 'Daily Time Series',
 ylab = 'Business Seasonality (with 7 day-moving average)')

Plot of sentiment only prediction below with 7-day moving average:

[Image: sentiment-only forecast with 7-day moving average]

Now when I include this with the stock data, the predictions fluctuate a ton.

df <- dplyr::left_join(stock_prices, 
                       data.frame(ds = as.Date(forecast_sentiment$ds), 
                                  sentiment = forecast_sentiment$yhat), 
                       by = c('ds'='ds'))
# replace NA with 0
df[is.na(df)] <- 0

# view data
head(df)

mod.3 <- prophet(seasonality.mode = 'additive')
mod.3 <- add_regressor(mod.3, 'sentiment')

mod.3 <- fit.prophet(mod.3, df)

future.3 <- make_future_dataframe(mod.3, periods = 14, freq = 'day')

future.3  <- dplyr::left_join(future.3, 
                              data.frame(ds = as.Date(forecast_sentiment$ds), 
                                         sentiment = forecast_sentiment$yhat), 
                              by = c('ds'='ds'))

future.3[is.na(future.3)] <- 0

tail(future.3,14)
# generate predictions using additional regressors
forecast_addReg <- predict(mod.3, future.3)

tail(forecast_addReg,14)

# plot forecast
plot(mod.3, forecast_addReg[c('ds','yhat', 'yhat_lower', 'yhat_upper')], 
     xlab = 'Daily Time Series',
     ylab = 'Closing price of R')

[Image: stock price forecast with the sentiment regressor, showing the two very low predictions]

Has anyone had this issue before when using the prophet package with additional regressors? If you're interested, this is the output of the model. I think the weekly seasonality is the main reason this is happening, but I'm confused that the numbers fluctuate so much in a prediction when the underlying data is pretty stable.

tail(forecast_addReg,14)

[Image: output of tail(forecast_addReg, 14)]

bletham commented 4 years ago

OK, I believe I know what's happening here. Thanks for the really detailed analysis and plots.

The issue is the terrible weekly seasonality prediction as you noted. You'll be able to see this if you look at the components plot:

prophet_plot_components(mod.3, forecast_addReg)

There are two really negative values, for Sat and Sun.

What's happening here is that the stock time series only has data for M-F. This means that the weekly seasonality for Sat and Sun is completely unconstrained, and can sometimes fit very badly. There's some discussion of this in https://facebook.github.io/prophet/docs/non-daily_data.html#data-with-regular-gaps. You don't see this in the model fit to the history, because there the model is only being evaluated at points that have data, which means it is only being evaluated on M-F dates. future.3, on the other hand, contains every day, including Sat and Sun, and so you see the really bad predictions for those dates.
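
One way to confirm this diagnosis is to compare the days of the week present in the history with those in the future dataframe. The following is only a sketch in base R: `history_ds` holds the six dates shown by head(stock_prices) above, and `future_ds` is a stand-in for 14 consecutive calendar days, not the real output of make_future_dataframe.

```r
# Dates copied from head(stock_prices) in this thread (trading days only)
history_ds <- as.Date(c("2017-10-17", "2017-10-18", "2017-10-19",
                        "2017-10-20", "2017-10-23", "2017-10-24"))

# Stand-in for a future dataframe: 14 consecutive calendar days (assumption)
future_ds <- seq(as.Date("2020-07-11"), by = "day", length.out = 14)

# POSIXlt$wday is locale independent: 0 = Sunday, 1 = Monday, ..., 6 = Saturday
history_wday <- unique(as.POSIXlt(history_ds)$wday)
future_wday  <- unique(as.POSIXlt(future_ds)$wday)

# Days of the week the model is asked to predict but has never seen
unseen <- sort(setdiff(future_wday, history_wday))
print(unseen)
```

Any value in `unseen` is a day of the week for which the weekly seasonality has no data to constrain it.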

The easiest fix, as described in the documentation link above, is to just not make predictions for Sat and Sun (just drop those dates from future). The model will work just fine for M-F, and so if you only ask for predictions on M-F everything will be correct.
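
A minimal sketch of that first fix, assuming a future dataframe with a `ds` column like the one make_future_dataframe returns. `drop_weekends` is a hypothetical helper written for this example, not a prophet function, and `future_demo` is a synthetic stand-in so the snippet runs without a fitted model.

```r
# Hypothetical helper (not part of prophet): keep only Monday-Friday rows
drop_weekends <- function(future) {
  # POSIXlt$wday: 0 = Sunday, 6 = Saturday, independent of locale
  wday <- as.POSIXlt(future$ds)$wday
  future[!(wday %in% c(0, 6)), , drop = FALSE]
}

# Synthetic stand-in for a 14-day future dataframe (assumption)
future_demo <- data.frame(
  ds = seq(as.Date("2020-07-11"), by = "day", length.out = 14)
)
future_weekdays <- drop_weekends(future_demo)
print(future_weekdays$ds)

# With the objects from this thread, the same helper would be applied
# before predicting (commented out because it needs the fitted model):
# future.3 <- drop_weekends(future.3)
# forecast_addReg <- predict(mod.3, future.3)
```

Note that after filtering, periods = 14 covers 14 calendar days but fewer trading days, so `periods` may need to be increased to get 14 weekday predictions.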

The alternative would be to turn off weekly seasonality and replace it with 5 binary extra regressors that fit an effect for each of the days that you have data for, like is_Monday, is_Tuesday, etc. Then you wouldn't have this issue of the model fitting an effect for Sat and Sun where there isn't any data. I don't think there's any reason to prefer one approach over the other, and the first approach is probably less work.
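
A rough sketch of the second approach, for illustration only. `add_weekday_dummies` is a hypothetical helper, the column names follow the is_Monday pattern mentioned above, and the prophet calls are left as comments because they depend on `df` and `future.3` from earlier in this thread.

```r
# Hypothetical helper (not part of prophet): add is_Monday ... is_Friday columns
add_weekday_dummies <- function(df) {
  wday <- as.POSIXlt(df$ds)$wday  # 1 = Monday, ..., 5 = Friday
  day_names <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
  for (i in seq_along(day_names)) {
    df[[paste0("is_", day_names[i])]] <- as.integer(wday == i)
  }
  df
}

# Small synthetic check: a Monday, a Friday and a Saturday
demo <- add_weekday_dummies(
  data.frame(ds = as.Date(c("2020-07-06", "2020-07-10", "2020-07-11")))
)
print(demo)

# Possible use with the objects from this thread (untested sketch):
# m <- prophet(weekly.seasonality = FALSE)
# m <- add_regressor(m, 'sentiment')
# for (nm in grep("^is_", names(demo), value = TRUE)) m <- add_regressor(m, nm)
# m <- fit.prophet(m, add_weekday_dummies(df))
# forecast <- predict(m, add_weekday_dummies(future.3))
```

The same indicator columns have to be present in the future dataframe passed to predict. A Saturday or Sunday row gets zeros in all five columns, so no separate weekend effect is fitted.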

nknauer commented 4 years ago

That did the trick, thank you! This can be closed now. Appreciate the help.