facebook / prophet

Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.
https://facebook.github.io/prophet
MIT License

Prophet forecasting questions #24

Closed karlarao closed 7 years ago

karlarao commented 7 years ago

Hello Sean,

I have the following questions:

1) Would you be able to share or upload on the git notebooks folder the code you used for Figure 4 and 5 of your paper? https://facebookincubator.github.io/prophet/static/prophet_paper_20170113.pdf It would be a good reference and would help users get familiar with how Prophet is used compared with the other models.

2) Forecasting can be a pain because, for example, the forecast package only understands the TS object format, and when XTS or ZOO objects are used all the underlying attributes are lost once the forecast function is called. In my example here https://github.com/karlarao/forecast_examples/blob/master/storage_forecast (see storage.R) I had to stick with using seq to generate the POSIXct time intervals and store them in a data frame. I see that Prophet doesn't use TS/XTS either, and I think you handle your time series with the function prophet::make_future_dataframe. Did you have the same issues with TS/XTS objects? and do you still have to do a lot of massaging/pre-processing when you generate your time series data to feed to Prophet?

3) Can Prophet handle more fine grained data like minutes or seconds?

4) On performance evaluation, I read the notebook https://github.com/facebookincubator/prophet/blob/master/notebooks/uncertainty_intervals.ipynb showing uncertainty intervals, and of course Figures 4 and 5 of your paper. Do you still do the other methods of evaluation/validation like the steps highlighted below?

forecast step by step: 
    eyeball the data
        raw data    
        data exploration
        periodicity
        ndiff (how much we should difference)
        decomposition - determine the series components (trend, seasonality etc.)
            x = decompose(AirPassengers, "additive")
            mymodel = x$trend + x$seasonal; plot(mymodel)           # just the trend and seasonal data
            mymodel2 = AirPassengers - x$seasonal ; plot(mymodel2)  # orig data minus the seasonal data
        seasonplot 
    process data
        create xts object
        create a ts object from xts (coredata, index, frequency/periodicity)
        partition data train,validation sets        
    graph it 
        tsoutliers (outlier detection) , anomaly detection (AnomalyDetection package)
        log scale data
        add trend line (moving average (centered - ma and trailing - rollmean) and simple exponential smoothing (ets))
   >performance evaluation
   >     Type of seasonality assessed graphically (decompose - additive,etc.)
   >     detrend and seasonal adjustment (smoothing/deseasonalizing)
   >     lag-1 diff graph
   >     forecast residual graph
   >     forecast error graph
   >     acf/pacf (Acf, tsdisplay)
   >         raw data
   >         forecast residual
   >         lag-1 diff
   >     autocorrelation 
   >         fUnitRoots::adfTest() - time series data is non-stationary (p value above 0.05)
   >         tsdisplay(diff(data_ts, lag=1)) - ACF displays there's no autocorrelation going on (no significant lags out of the 95% confidence interval, the blue line) 
   >     accuracy
   >     cross validation https://github.com/karlarao/forecast_examples/tree/master/cross_validation/cvts_tscvexample_investigation
   >     forecast of training
   >     forecast of training + validation + future (steps ahead)       
    forecast result
        display prediction intervals (forecast quantile)
        display the actual and forecasted series
        displaying the forecast errors
        distribution of forecast errors

5) I'm also curious: how do you parallelize Prophet when you have to deal with many time series and bigger data sets? I ask because I think under the hood it does Monte Carlo, which is pretty compute intensive (please correct me if I'm wrong).

Thank you again for sharing Prophet!

-Karl

loneharoon commented 7 years ago

Another point: Can we use more than one time-series (as multiple features) for forecasting?

seanjtaylor commented 7 years ago

Would you be able to share or upload on the git notebooks folder the code you used for Figure 4 and 5 of your paper?

Unfortunately we don't have permission to share this data publicly, so the notebook wouldn't be runnable. I can work on getting the simulated forecast code into a notebook.

Did you have the same issues with TS/XTS objects? and do you still have to do a lot of massaging/pre-processing when you generate your time series data to feed to Prophet?

We don't really care about the exact type of object used for representing dates because Prophet does not rely on there being a regular sequence of dates. Essentially any type that 1) can be joined to the holidays list and 2) can be mapped to a numeric type is going to be ok. It's one of the advantages of using curve fitting instead of a traditional recursive time series model.
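To illustrate why irregular spacing is not a problem for curve fitting, here is a minimal sketch in plain Python. It is not Prophet's model: a straight line fitted by ordinary least squares stands in for Prophet's trend and seasonality curve, and the helper names `to_numeric` and `fit_line` are hypothetical. The only point is that once dates are mapped to numbers, gaps between observations do not matter.

```python
from datetime import date


def to_numeric(dates):
    """Map dates to floats in [0, 1], the fraction of the observed time span.

    Assumes at least two distinct dates, so the span is non-zero.
    """
    start, end = min(dates), max(dates)
    span = (end - start).days
    return [(d - start).days / span for d in dates]


def fit_line(t, y):
    """Ordinary least squares fit of y = intercept + slope * t."""
    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    slope = sxy / sxx
    return y_mean - slope * t_mean, slope


# Irregularly spaced observations: gaps of 2, 7, 1 and 20 days.
dates = [
    date(2017, 1, 1),
    date(2017, 1, 3),
    date(2017, 1, 10),
    date(2017, 1, 11),
    date(2017, 1, 31),
]
values = [10.0, 12.0, 19.0, 20.0, 40.0]

t = to_numeric(dates)
intercept, slope = fit_line(t, values)
```

A recursive model such as ARIMA would need these observations resampled onto a regular grid first; the curve fit only needs the numeric time value of each point.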

Can Prophet handle more fine grained data like minutes or seconds?

At this time I think you can fit on this kind of data but it won't learn anything from the additional granularity. We're going to add intra-day modeling to the v0.2 release.

Do you still do the other methods of evaluation/validation like the steps highlighted below?

No, we tend to only evaluate forecasts for their intended purpose. For goal setting and planning that's often something like mean absolute error at different forecast horizons (what we report in the paper). Many of the other evaluation procedures that come from traditional time series methods are not as useful in the context of curve fitting.
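As a rough sketch of what "mean absolute error at different forecast horizons" can look like in practice, the snippet below groups forecast errors by horizon and averages them. The function name `mae_by_horizon`, the record layout, and the numbers are illustrative assumptions, not the code used for the paper.

```python
from collections import defaultdict


def mae_by_horizon(records):
    """Compute mean absolute error per forecast horizon.

    records: iterable of (horizon, actual, forecast) tuples, where horizon
    is the number of periods between the forecast origin and the target.
    Returns a dict {horizon: mean absolute error}, ordered by horizon.
    """
    errors = defaultdict(list)
    for horizon, actual, forecast in records:
        errors[horizon].append(abs(actual - forecast))
    return {h: sum(e) / len(e) for h, e in sorted(errors.items())}


# Made-up example values: two forecasts at horizon 1, two at horizon 7.
records = [
    (1, 100.0, 98.0),
    (1, 110.0, 113.0),
    (7, 100.0, 90.0),
    (7, 110.0, 124.0),
]
mae = mae_by_horizon(records)
```

Here the error grows with the horizon, which is the kind of pattern this evaluation is meant to expose.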

how do you parallelize Prophet when you have to deal with many time series and bigger data sets.

Typically we use a hash function that maps a time series to a number, e.g. in [0, K-1], and then run forecasts for the time series on K machines simultaneously using this mapping. On each individual machine we may just run a for-loop.
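The scheme described above could be sketched as follows. This assumes each time series has a string identifier and that every machine knows its own index in [0, K-1]; `shard_for`, `series_for_machine`, `run_machine`, and `forecast_fn` are hypothetical names, and `forecast_fn` is a placeholder for whatever fits and predicts one series. A stable hash from `hashlib` is used because Python's built-in `hash()` for strings varies between processes.

```python
import hashlib


def shard_for(series_id, k):
    """Map a series identifier to a stable shard number in [0, k-1]."""
    digest = hashlib.md5(series_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % k


def series_for_machine(series_ids, machine_index, k):
    """Select the series that this machine is responsible for."""
    return [s for s in series_ids if shard_for(s, k) == machine_index]


def run_machine(series_ids, machine_index, k, forecast_fn):
    """Plain for-loop over this machine's share of the series."""
    results = {}
    for series_id in series_for_machine(series_ids, machine_index, k):
        results[series_id] = forecast_fn(series_id)
    return results


# Example with 100 made-up series and K = 4 machines.
K = 4
all_series = [f"series_{i}" for i in range(100)]
placeholder_forecast = len  # stand-in for a real fit-and-predict function
per_machine = [run_machine(all_series, m, K, placeholder_forecast) for m in range(K)]
```

Because the mapping depends only on the identifier and K, every machine can compute its own share independently without a coordinator.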

Can we use more than one time-series (as multiple features) for forecasting?

At this time, no. We don't accommodate multiple time series or covariates. We have some ideas about how to do this but haven't implemented them yet. It would be a nice way to contribute to the project to help us start the theoretical work on this.

khatwaniNikhil commented 7 years ago

Hello Sean, following up on your last comment: would you be able to take the time to share the simulated forecast code for Figures 4 and 5 of your paper in a notebook? https://facebookincubator.github.io/prophet/static/prophet_paper_20170113.pdf Thanks

AhmedGS commented 6 years ago

For forecasting with multiple features, I recommend using the R forecast package with the xreg parameter.