facebook / prophet

Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.
https://facebook.github.io/prophet
MIT License

How to predict more accurately? #1346

Closed ghuname closed 4 years ago

ghuname commented 4 years ago

I have a dataframe that contains a 15-minute time series:

import requests
import json
import pandas as pd
import datetime as dt

text = requests.get('https://pastebin.com/raw/6CjHd4ca').text
df = pd.DataFrame(json.loads(text[1:-1]))
df.ds = pd.to_datetime(df.ds, unit='ms')
df

>>> df
                      ds      y
0    2020-01-31 23:00:00    249
1    2020-01-31 23:15:00    286
2    2020-01-31 23:30:00    283
3    2020-01-31 23:45:00    333
4    2020-02-01 00:00:00   7764
...                  ...    ...
1953 2020-02-21 07:15:00  14163
1954 2020-02-21 07:30:00  15562
1955 2020-02-21 07:45:00  17268
1956 2020-02-21 08:00:00  22331
1957 2020-02-21 08:15:00   3347

[1958 rows x 2 columns]

When I plot it:

df.set_index('ds').y.plot(figsize=(15,5))

image

I can see that there is some sort of seasonality.

Now I have instantiated Prophet with default options and made a prediction for the next day (96 intervals of 15 minutes):

from fbprophet import Prophet

m = Prophet()
m.fit(df)

INFO:fbprophet:Disabling yearly seasonality. Run prophet with yearly_seasonality=True to override this.
C:\Users\zoran\Miniconda3\lib\site-packages\fbprophet\forecaster.py:400: FutureWarning:

The pandas.datetime class is deprecated and will be removed from pandas in a future version. Import from datetime module instead.

future = m.make_future_dataframe(periods=96, freq='15T')

fcst = m.predict(future)

When I plot original and forecasted data I get:

import matplotlib.pyplot as plt

fig, axarr = plt.subplots(2, 1, figsize=(15, 10), sharex=False)
m.plot(fcst, ax=axarr[0])
df.set_index('ds').y.plot(ax=axarr[1], marker='o', markersize=5, color='black', linestyle='--', linewidth=0)

image

These two graphs are not aligned properly (setting sharex=True does not work properly either).
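One possible cause of the misalignment, offered as an assumption rather than a confirmed diagnosis: pandas' own `.plot()` can use different x-axis date units for a regular time series than matplotlib uses, so the two axes disagree. A minimal sketch that draws both panels with matplotlib directly is shown here. The `df` and `fcst` frames below are synthetic stand-ins, not the data from this issue.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-ins for the issue's `df` and `fcst` (hypothetical data).
ds_hist = pd.date_range("2020-02-01", periods=200, freq="15min")
df = pd.DataFrame({"ds": ds_hist, "y": np.sin(np.arange(200) / 10.0)})
ds_all = pd.date_range("2020-02-01", periods=296, freq="15min")
fcst = pd.DataFrame({"ds": ds_all, "yhat": np.sin(np.arange(296) / 10.0)})

fig, axarr = plt.subplots(2, 1, figsize=(15, 10), sharex=True)
# Both panels go through matplotlib's plot(), so they share the same date units.
axarr[0].plot(fcst["ds"], fcst["yhat"], color="tab:blue")
axarr[1].plot(df["ds"], df["y"], "k.", markersize=5)
```

With a fitted model, the first panel would presumably be `m.plot(fcst, ax=axarr[0])` as in the original snippet, and only the second panel would need to change.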

As far as I can see, on the upper graph the black dots are the original values, the blue line is the forecast model, and the light blue area is the confidence interval. Looking at the blue line, I would say that it is not following the black dots, so in this case the prediction is not that good.

What can I do to make the prediction more accurate? I played with the model options (something like

m = Prophet(yearly_seasonality=False, weekly_seasonality=False, daily_seasonality=False, interval_width=0.99, seasonality_mode='additive', changepoint_range=0.8, changepoint_prior_scale=0.1)
m = Prophet(changepoint_prior_scale=0.1)

) but with no success.

If I try to plot components (m.plot_components(fcst)) I get an error AttributeError: 'DatetimeIndex' object has no attribute 'weekday_name'.
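This error is consistent with an older fbprophet release running against pandas 1.0 or later, where `DatetimeIndex.weekday_name` was removed in favour of `day_name()`. Whether that is the cause here is an assumption; if so, upgrading fbprophet or pinning pandas below 1.0 should avoid it. The replacement call looks like this:

```python
import pandas as pd

idx = pd.DatetimeIndex(["2020-02-01", "2020-02-03"])
# `weekday_name` is gone in pandas >= 1.0; `day_name()` is the replacement.
names = idx.day_name()
print(list(names))  # ['Saturday', 'Monday']
```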

How can I be more accurate with Prophet?

benmwhite commented 4 years ago

You can try adding a custom seasonality effect with the appropriate period. It looks like your seasonality is some number of hours (30-40ish?). Hopefully, if you're familiar with the process generating the data, you can supply a more exact number and use that in your custom seasonal effect.

ghuname commented 4 years ago

I have detected each apex on the graph. These are the times:

    dtime                apex
0   2020-02-02 12:15:00
1   2020-02-04 08:00:00  1 days 19:45:00.000000000
2   2020-02-05 11:00:00  1 days 03:00:00.000000000
3   2020-02-06 18:45:00  1 days 07:45:00.000000000
4   2020-02-08 17:30:00  1 days 22:45:00.000000000
5   2020-02-10 13:15:00  1 days 19:45:00.000000000
6   2020-02-12 07:45:00  1 days 18:30:00.000000000
7   2020-02-13 22:00:00  1 days 14:15:00.000000000
8   2020-02-15 19:30:00  1 days 21:30:00.000000000
9   2020-02-17 10:30:00  1 days 15:00:00.000000000
10  2020-02-18 17:45:00  1 days 07:15:00.000000000
11  2020-02-20 15:00:00  1 days 21:15:00.000000000

As you can see, the spacing between apexes is not regular, but on average it is Timedelta('1 days 15:31:21.818181') ~ 1.65 days.
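The average above can be reproduced from the listed apex times with pandas. This is a small sketch using only the timestamps from the table; the variable names are illustrative.

```python
import pandas as pd

# Apex timestamps copied from the table above.
apex_times = pd.to_datetime([
    "2020-02-02 12:15", "2020-02-04 08:00", "2020-02-05 11:00",
    "2020-02-06 18:45", "2020-02-08 17:30", "2020-02-10 13:15",
    "2020-02-12 07:45", "2020-02-13 22:00", "2020-02-15 19:30",
    "2020-02-17 10:30", "2020-02-18 17:45", "2020-02-20 15:00",
])

# Gap between consecutive apexes; the first row has no predecessor.
gaps = pd.Series(apex_times).diff().dropna()
mean_gap = gaps.mean()
# Convert to fractional days, the unit add_seasonality(period=...) expects.
period_days = mean_gap / pd.Timedelta(days=1)

print(mean_gap, round(period_days, 2))
print(gaps.min(), gaps.max())
```

The gaps range from 27 hours to 46.75 hours, so a single fixed period of 1.65 days can only approximate the spacing.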

Next I tried the same thing with a custom seasonality added (I am not sure about the fourier_order parameter):

m = Prophet(weekly_seasonality=False)
m.add_seasonality(name='test', period=1.65, fourier_order=1)
m.fit(df)

future = m.make_future_dataframe(periods=96, freq='15T')
fcst = m.predict(future)

fig, axarr = plt.subplots(2,1, figsize=(15,10), sharex=False)
m.plot(fcst, ax = axarr[0])
df.set_index('ds').y.plot(ax = axarr[1], marker='o', markersize=5, color='black', linestyle='--', linewidth=0)

which produced the following graphs:

image

It is kind of better, but still not good enough for my use case. Is there anything I can do to make prediction more accurate?

benmwhite commented 4 years ago

You probably need a higher Fourier order to fit the seasonality a little tighter to the training data; see this page for details and some other links. If you're experimenting with different parameter values, it's helpful to check the component plots to see the effects on the estimated trend and seasonality.

To be honest, though, it's going to be really tough in any case if the peaks aren't strictly regular. I'm not really sure what the best approach actually is for this problem.
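To illustrate what the Fourier order controls: a seasonality of order N is a sum of N sine and N cosine terms with the given period, so a higher order allows a more wiggly seasonal shape. The sketch below builds those terms with NumPy. It is an illustration of the idea only, not Prophet's internal code, and `fourier_features` is a hypothetical helper name.

```python
import numpy as np

def fourier_features(t_days, period, order):
    """Return the 2*order sin/cos columns of a truncated Fourier series."""
    t = np.asarray(t_days, dtype=float)
    n = np.arange(1, order + 1)                  # harmonics 1..order
    angles = 2.0 * np.pi * np.outer(t, n) / period
    return np.hstack([np.sin(angles), np.cos(angles)])

# Five days sampled every 15 minutes, expressed in days.
t = np.arange(480) * 15 / (24 * 60)
X1 = fourier_features(t, period=1.65, order=1)
X10 = fourier_features(t, period=1.65, order=10)
print(X1.shape, X10.shape)  # (480, 2) (480, 20)
```

Each extra order adds two coefficients to fit, which is why a very high order can follow the training data closely and still forecast poorly.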

ghuname commented 4 years ago

I will check the paper at https://peerj.com/preprints/3190/, but I cannot do it at the moment. As far as I can see, it has a detailed explanation of the Fourier order.

In the meantime, can I use GridSearchCV for custom seasonality as well as other parameters? For example, custom seasonality is added in a second step:

m = Prophet(weekly_seasonality=False)
m.add_seasonality(name='monthly', period=30.5, fourier_order=5)

Is there a way to put name, period and fourier_order to GridSearchCV parameters?

Another question is how to measure the accuracy_score of the model. If I understood correctly, I should compare the original y value with yhat, something like accuracy_score(df_train.y, fcst.loc[:len(df_train)-1, 'yhat'].astype(int)). Is that right?

Maybe you are right about time series that do not have strictly regular peaks. It looks like you have a lot of experience. Can you please tell me what kind of time series are best suited to Prophet? What is your experience?

benmwhite commented 4 years ago

Check out this doc page for some of the built-in methods for validation. One thing to keep in mind is that you can't just randomly partition the training data like you would with independent observations, due to the time component and dependency, but you can let the Prophet cross-validation function take care of setting up the training and evaluation windows for you. This post has some examples of evaluating custom error metrics with the Prophet cross-validation outputs.
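On the metric itself: scikit-learn's `accuracy_score` counts exact matches and is meant for classification, so for a continuous forecast, error measures such as MAE, RMSE or MAPE are the more usual choice. A minimal sketch follows, assuming a frame with `y` and `yhat` columns like the one Prophet's cross-validation returns. The `toy` frame and the helper name `regression_errors` are hypothetical.

```python
import numpy as np
import pandas as pd

def regression_errors(cv_df):
    """MAE, RMSE and MAPE from a frame with `y` and `yhat` columns."""
    err = cv_df["yhat"] - cv_df["y"]
    return {
        "mae": float(err.abs().mean()),
        "rmse": float(np.sqrt((err ** 2).mean())),
        # MAPE is undefined where y == 0; the toy data avoids that case.
        "mape": float((err.abs() / cv_df["y"].abs()).mean()),
    }

# Hypothetical values standing in for cross-validation output.
toy = pd.DataFrame({"y": [100.0, 200.0, 400.0], "yhat": [110.0, 180.0, 400.0]})
scores = regression_errors(toy)
print(scores)
```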

Also, here's an issue thread and a blog post that discuss Prophet model hyperparameter selection with Python in particular.
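Because the custom seasonality is added through a separate `add_seasonality` call, its `period` and `fourier_order` do not map directly onto `GridSearchCV` parameters. One alternative is a plain loop over a hand-built grid, scored with Prophet's own time-aware cross-validation. This is a sketch under assumptions: the candidate values, the window sizes and the helper name `score_params` are all illustrative, not recommendations from this thread.

```python
import itertools

# Hypothetical candidate values; adjust to the data at hand.
grid = {
    "period": [1.5, 1.65, 1.8],
    "fourier_order": [3, 5, 10],
    "changepoint_prior_scale": [0.05, 0.1],
}
# One dict per combination of the values above.
param_grid = [dict(zip(grid, values)) for values in itertools.product(*grid.values())]

def score_params(df, params):
    """Fit one candidate and return its mean cross-validated RMSE."""
    # Imported lazily so the grid can be built without fbprophet installed.
    from fbprophet import Prophet
    from fbprophet.diagnostics import cross_validation, performance_metrics

    m = Prophet(weekly_seasonality=False,
                changepoint_prior_scale=params["changepoint_prior_scale"])
    m.add_seasonality(name="custom", period=params["period"],
                      fourier_order=params["fourier_order"])
    m.fit(df)
    # Window sizes are assumptions chosen to fit roughly 20 days of data.
    cv = cross_validation(m, initial="14 days", period="1 days", horizon="1 days")
    return performance_metrics(cv)["rmse"].mean()

# With the issue's dataframe loaded as `df`:
# best = min(param_grid, key=lambda p: score_params(df, p))
```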

ghuname commented 4 years ago

At the moment I am experimenting with a toy example of time series data. Please give me a few days.

HelioNeves commented 4 years ago

Try adding more Fourier terms (a higher order) to your seasonality. This gives more precision, but it may cause overfitting.

Test with fourier_order=1000 and don't worry about the time; it takes around 15-30 min to train a model.