facebook / prophet

Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.
https://facebook.github.io/prophet
MIT License

Holidays/ seasonalities plots #646

Closed by Ramin1368 6 years ago

Ramin1368 commented 6 years ago

Hi

I am a newbie in this so I sincerely apologize if my questions seem too basic and would appreciate if you answer some of my questions:

First things first: when we say holiday, do we mean that we should expect a spike on that day, or can it be a valley? Also, does a holiday have to be recurring? That is, can we have a one-time holiday in one year that does not occur the next year? I thought that holidays are essentially outliers that probably recur. Is it okay if we treat all the holidays as outliers? What would happen? What is the difference between holidays and outliers in Prophet?

Secondly, about the seasonality plots: I know that if Monday shows 0.2 in the weekly seasonality plot, it means that 0.2 is added to the trend every Monday. But how do I know whether there is any weekly seasonality just from the weekly seasonality plot? That plot simply shows one week.

Thirdly, if I wanted to find the trend changepoints manually, what should I do? Just visualizing the data in Matplotlib gets very congested, so it is difficult to see whether there is any level shift in the data or, if there is a change in the trend, whether it is linear or not.

I would appreciate your insight and help.

Kind regards,
R.S.

bletham commented 6 years ago

Holidays could be either increases or decreases. They do not necessarily have to be recurring. If it is recurring, then the same effect (increase or decrease) will be applied to every instance. You can have a one-time holiday, which would effectively allow the model to fit any one-time spikes/dips without affecting the future at all.

You can handle outliers in Prophet by setting each of them as a one-time holiday. This would allow the model to fit their values without messing up the estimates for nearby dates or future dates. However, if you have identified outliers, it is better to remove them from the historical data than to fit them as one-time holidays. That is because every holiday introduces an additional parameter into the model, so fitting and predicting will become slower with no benefit over just removing the points.
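A minimal sketch of the removal approach, assuming the history is a pandas dataframe with the usual `ds` and `y` columns. The dates, values, and the choice of which points count as outliers are made up for illustration. Setting `y` to missing instead of dropping the rows follows the pattern on the outliers documentation page; Prophet ignores missing values in the history when fitting.

```python
import pandas as pd

# Hypothetical daily history; the values are made up for illustration.
df = pd.DataFrame({
    "ds": pd.date_range("2017-01-01", periods=10, freq="D"),
    "y": [10.0, 11.0, 10.5, 95.0, 10.8, 11.2, 10.9, 11.1, -40.0, 11.3],
})

# Dates judged to be outliers (an assumption made for this example).
outlier_dates = pd.to_datetime(["2017-01-04", "2017-01-09"])

# Blank out the outlying observations instead of adding one holiday per point.
df.loc[df["ds"].isin(outlier_dates), "y"] = None

print(df)
```

The rows stay in the dataframe, so the dates are still present; only the outlying `y` values are missing.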

The weekly seasonality plot only shows one week because it is the same every week. You can tell how significant the effect is by comparing that 0.2 to the trend value, or you could just look at the total plot (m.plot) and you will be able to see how large the weekly oscillation is.
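One way to make that comparison concrete is to work from the forecast dataframe instead of the plots. The sketch below assumes the dataframe returned by `m.predict(...)` has `ds`, `trend`, and `weekly` columns (the `weekly` column is only present when weekly seasonality is enabled). The helper name `weekly_effect_vs_trend` is hypothetical, and a small hand-made dataframe stands in for a real forecast so the example runs on its own.

```python
import pandas as pd


def weekly_effect_vs_trend(forecast: pd.DataFrame) -> pd.DataFrame:
    """Average weekly effect per weekday, in absolute terms and relative to trend."""
    out = forecast[["ds", "trend", "weekly"]].copy()
    out["weekday"] = out["ds"].dt.day_name()
    out["weekly_over_trend"] = out["weekly"] / out["trend"]
    # sort=False keeps weekdays in order of first appearance.
    return out.groupby("weekday", sort=False)[["weekly", "weekly_over_trend"]].mean()


# Stand-in for the output of m.predict(future); the values are made up.
forecast = pd.DataFrame({
    "ds": pd.date_range("2018-01-01", periods=14, freq="D"),  # starts on a Monday
    "trend": [10.0] * 7 + [20.0] * 7,
    "weekly": [0.2, 0.1, 0.0, -0.1, -0.2, 0.05, -0.05] * 2,
})

summary = weekly_effect_vs_trend(forecast)
print(summary)
```

With a real model, `forecast` would be replaced by the dataframe returned from `m.predict`.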

As for identifying changepoints manually, I'm not sure how much advice I can give. If you plot it and it isn't clearly visible where the trend is changing, then you'd probably be fine just leaving the built-in default, which is to specify a large number of changepoints and then use the fitting to identify which ones should actually be used.

Ramin1368 commented 6 years ago

Hi

Thanks so much for your informative response. Based on your response on holidays, if I have a set of recurring holidays, should I put them in one holiday dataframe, put all the one-time holidays together in another holiday dataframe, and finally concatenate the two, as was done in the documentation for Playoff and Superbowl? Or should I first make a dataframe of all the recurring holidays, then create a separate dataframe for every single one-time holiday, and then concatenate all of these? Which one should I do?

Thanks again R.S.

bletham commented 6 years ago

That doesn't matter. Ultimately they all need to go in a single dataframe, and they could even just all be put in the same dataframe to start. In the documentation it was just more convenient to do it in two dataframes because then pandas will automatically replicate the 'holiday' column to have the same value for every entry in the 'ds' column. But that's just what's easier with pandas, it will make no difference to fbprophet.
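A small sketch of the two-dataframe route, assuming the usual `holiday` and `ds` columns. The holiday names and dates are made up for illustration.

```python
import pandas as pd

# Recurring holiday: one name, every occurrence listed.
# pandas repeats the scalar 'holiday' value for each entry in 'ds'.
recurring = pd.DataFrame({
    "holiday": "new_year",
    "ds": pd.to_datetime(["2016-01-01", "2017-01-01", "2018-01-01", "2019-01-01"]),
})

# One-time events: each has its own name, so each is fitted with its own effect.
one_time = pd.DataFrame({
    "holiday": ["promo_2017", "outage_2018"],
    "ds": pd.to_datetime(["2017-03-15", "2018-07-02"]),
})

# Everything ends up in a single dataframe either way.
holidays = pd.concat([recurring, one_time], ignore_index=True)
print(holidays)
```

The combined dataframe is what would be passed to the model, for example as `Prophet(holidays=holidays)`. Writing all six rows into one dataframe from the start would give the same result.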

Ramin1368 commented 6 years ago

Thanks again. Regarding the seasonality plots, you mentioned that the significance of the weekly seasonality on a Monday can be judged by comparing it with the trend value. But what is the command in Python to show the trend plot by day of the week? For example, in the documentation example, the trend plot runs from 2008 to 2017 and shows only years, not days of the week. How can we see the value of the trend for a day of the week?

Other than that, I see lots of spikes and valleys that I know are not holidays. Can I label them as outliers? My limited understanding is that any spike or valley is either a holiday, an outlier, or a rare event.

Thanks R.S.

bletham commented 6 years ago

Well, the trend changes across time, so there is no single trend value for a given day of the week. The amount of weekly seasonality relative to the trend will therefore change as the trend changes (unless you use multiplicative seasonality, in which case it will be a constant proportion of the trend).
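A short numeric illustration of that difference. The trend values and effect sizes are made up; the only assumption about the model is that additive seasonality enters as `trend + s` and multiplicative seasonality (`seasonality_mode='multiplicative'` in the Python API) enters as `trend * (1 + s)`.

```python
import numpy as np

# Hypothetical trend values on three different Mondays.
trend = np.array([10.0, 20.0, 40.0])

additive_monday = 0.2         # additive effect, in units of y
multiplicative_monday = 0.02  # multiplicative effect, as a fraction of the trend

# Additive: the effect is fixed, so its share of the trend shrinks as the trend grows.
additive_share = additive_monday / trend

# Multiplicative: the effect scales with the trend, so its share stays constant.
multiplicative_effect = trend * multiplicative_monday
multiplicative_share = multiplicative_effect / trend

print("additive share of trend:      ", additive_share)
print("multiplicative share of trend:", multiplicative_share)
```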

I don't know that I'd say that any big spike or dip is necessarily an outlier. There is usually some natural variability in the time series. Points that are way outside the natural variance can be considered outliers, but not all outliers necessarily matter. Definitely look through the documentation page on this matter: https://facebook.github.io/prophet/docs/outliers.html . Outliers that are far enough out (or more realistically, when there is a clump of outliers) can mess up the forecast. These need to be removed. But if you look at the last figure on that page, you will see that the time series has a number of individual outlying points that aren't affecting the forecast in any way. There's really no need to remove these.

dsvrsec commented 6 years ago

How can I know the number of significant changepoints used in the model when changepoint.prior.scale is increased or decreased? On what basis does the model pick the changepoints?

bletham commented 6 years ago

It's basically an L1 regularization applied to the change magnitudes. In fitting, each changepoint is either used or not used (its change kept close to zero) by balancing the model fit (log likelihood) against the regularization term, similar to a Lasso regression. This is described in section 3.1.3 of the paper (https://peerj.com/preprints/3190.pdf). The actual number of significant changepoints (change values significantly different from 0) depends both on changepoint.prior.scale and on how many are needed to get a good model fit, so in general it isn't possible to know in advance how many there will be; it depends on the data.
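After fitting, the number of changepoints that ended up with a non-negligible change can be counted directly. The sketch below is not part of Prophet: `significant_changepoints` is a hypothetical helper, and the 0.01 threshold is an arbitrary cut-off chosen for illustration. With a fitted Python model, the inputs are assumed to come from `m.changepoints` and `m.params['delta']`; made-up values are used here so the example runs on its own.

```python
import numpy as np
import pandas as pd


def significant_changepoints(changepoints, deltas, threshold=0.01):
    """Return the changepoints whose mean absolute rate change reaches the threshold."""
    deltas = np.asarray(deltas, dtype=float)
    # With several posterior samples, deltas is 2-D (samples x changepoints).
    mean_delta = deltas.mean(axis=0) if deltas.ndim == 2 else deltas
    keep = np.abs(mean_delta) >= threshold
    return pd.Series(changepoints).reset_index(drop=True)[keep]


# Made-up stand-ins for m.changepoints and m.params['delta'].
changepoints = pd.Series(pd.date_range("2017-01-01", periods=5, freq="30D"))
deltas = np.array([[0.0004, -0.35, 0.002, 0.12, -0.0001]])

sig = significant_changepoints(changepoints, deltas)
print(f"{len(sig)} of {len(changepoints)} changepoints exceed the threshold")
print(sig)
```

Refitting with a different prior scale and rerunning the count shows how the regularization strength changes the number of changepoints that are effectively used.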

dsvrsec commented 6 years ago

Thank you, sir, for the explanation. Then how can I know which changepoint.prior.scale value (or maybe Fourier order) will give more accuracy? Is there any method to do that?

As per my understanding, there are initially 25 evenly distributed changepoints by default, but that is not the real number considered by the model, which depends on the changepoint prior scale. Am I correct here? Thank you so much.

bletham commented 6 years ago

All 25 are used in the model, but some will have 0 trend change associated with them due to the regularization. This is illustrated in https://facebook.github.io/prophet/docs/trend_changepoints.html. To determine the right value, you could try 3 or 5 candidate values and then use cross validation (https://facebook.github.io/prophet/docs/diagnostics.html) to see which one performs the best.
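A rough sketch of that comparison in Python, under several assumptions: the package is importable as `prophet` (releases from the time of this thread used `fbprophet`), `df` is a history dataframe with `ds` and `y` columns long enough for the chosen windows, and the candidate values and window lengths are placeholders, not recommendations. The helper name `compare_changepoint_prior_scales` is hypothetical.

```python
# Illustrative candidates only; adjust to the problem at hand.
CANDIDATE_SCALES = (0.001, 0.01, 0.1, 0.5)


def compare_changepoint_prior_scales(df, scales=CANDIDATE_SCALES,
                                     initial="730 days", period="180 days",
                                     horizon="365 days"):
    """Return {scale: RMSE} from cross validation for each candidate scale."""
    # Imported lazily; assumes the package name `prophet`.
    from prophet import Prophet
    from prophet.diagnostics import cross_validation, performance_metrics

    scores = {}
    for scale in scales:
        m = Prophet(changepoint_prior_scale=scale).fit(df)
        df_cv = cross_validation(m, initial=initial, period=period, horizon=horizon)
        # rolling_window=1 averages the error over the whole horizon into one row.
        df_p = performance_metrics(df_cv, rolling_window=1)
        scores[scale] = df_p["rmse"].values[0]
    return scores
```

Usage would look like `scores = compare_changepoint_prior_scales(df)` followed by `best = min(scores, key=scores.get)`. RMSE is only one possible criterion; the other columns returned by `performance_metrics` could be used instead.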

dsvrsec commented 6 years ago

Thanks @bletham. Can you suggest a range for the changepoint prior scale, and the step by which it can be increased or decreased, so that the model gives a low error value?