Closed dhanashreearole closed 5 years ago
The black dots on the first plot show the actual y values that you gave as the input data. They are the same as if you made a plot of df['ds'] vs. df['y'].
The "yearly" vs. "Day of year" plot shows one cycle of the yearly seasonality: how much the yearly cycle goes above or below the baseline trend at each point in the year. For example, the peak in October says that every October the time series values are ~25% higher just due to the effect of the yearly seasonality. (Note, however, that you have monthly data and so should look carefully at the section here about monthly data: https://facebook.github.io/prophet/docs/non-daily_data.html)
Thanks for your reply Ben.
Is the light blue line reflecting yhat_upper? Is the dark blue line reflecting yhat_lower?
After the first iteration of prediction, would you advise eliminating outliers, for example by capping the y values in the input dataset to yhat_upper or yhat_lower depending on where they are plotted on the chart? The records with y values in the top area of the chart would be replaced by yhat_upper.
I think it is not accurate to entirely eliminate the outliers because it will make the dataset discrete and choppy.
The dark blue line is yhat. The light blue at the top is yhat_upper, and the light blue at the bottom is yhat_lower.
You can remove outliers if they are affecting the forecast, but the outliers here seem to be safely ignored so I wouldn't worry about them. See https://facebook.github.io/prophet/docs/outliers.html for examples of how outliers can mess up the forecast, and there's none of that here.
To be clear, I would not consider points that lie outside of yhat_lower and yhat_upper to be outliers: that is an 80% interval so we expect 20% of the data to lie outside, and those points are not outliers. Outliers would be points that are well outside the prediction interval, like the two points below 1.
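To make the distinction concrete, here is a minimal sketch using pandas only. The toy numbers, the helper name flag_far_outside, and the margin of one interval width are all assumptions for illustration; the thread gives no numeric threshold for "well outside". Flagged points have y set to None, which is the approach the outliers page linked above describes for removing them.

```python
import pandas as pd


def flag_far_outside(df, margin=1.0):
    """Mark points outside the interval, and points far outside it.

    `margin` is a hypothetical threshold: how many interval widths beyond
    the bounds a point must sit before it is treated as an outlier.
    """
    width = df["yhat_upper"] - df["yhat_lower"]
    outside = (df["y"] < df["yhat_lower"]) | (df["y"] > df["yhat_upper"])
    far = (df["y"] < df["yhat_lower"] - margin * width) | (
        df["y"] > df["yhat_upper"] + margin * width
    )
    return df.assign(outside_interval=outside, far_outside=far)


# Made-up values standing in for history joined with a forecast.
toy = pd.DataFrame({
    "y": [5.0, 5.6, 4.3, 0.8, 5.1],
    "yhat_lower": [4.5, 4.5, 4.5, 4.5, 4.5],
    "yhat_upper": [5.5, 5.5, 5.5, 5.5, 5.5],
})

flagged = flag_far_outside(toy)

# Only the far-outside points are blanked out; the rest stay untouched.
cleaned = flagged.copy()
cleaned.loc[cleaned["far_outside"], "y"] = None
```

Points that are merely outside the interval are left alone here, since with an 80% interval roughly one point in five is expected to land there.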
Sounds great. Is there a way to make sure that when monthly frequency is used for predictions, the predicted dates match the pattern of the historical data instead of falling at the end of each month:
As you can see, the input dataset uses start-of-month dates and ends at index 223. From 224 onwards, Prophet starts predicting yhat, but it predicts for August 31, September 30, and so on. The main change compared to the daily prediction is that freq is set to M:
future = m.make_future_dataframe(periods=29, freq = 'M') #Create a data frame for the future dates
input_file_cv = cross_validation(m, horizon = '29 days')
Prophet will nicely predict it until the end of 2020 with excellent accuracy; however, I wish it allowed passing a months string for horizon. I have stepped into forecaster.py and diagnostics.py to see if it can be adjusted, but no luck. It seems the horizon parameter doesn't accept a months string.
Can you please suggest a better way to handle it?
The make_future_dataframe uses pandas date_range to generate the dates, which supports these frequencies: https://pandas.pydata.org/pandas-docs/stable/timeseries.html#timeseries-offset-aliases
As you can see there, M is month-end frequency. If you want month-start, it is MS.
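The difference between the two frequencies can be seen with pandas alone. This is a small sketch, not Prophet's own code: the last observed date is a made-up value, and the month-end case uses the pd.offsets.MonthEnd() object rather than the 'M' alias because newer pandas releases spell that alias 'ME'.

```python
import pandas as pd

# Hypothetical last date in the historical data (a month start).
last_observed = pd.Timestamp("2018-07-01")

# Month-end frequency, which is what the alias 'M' stands for.
month_end = pd.date_range(
    start=last_observed, periods=4, freq=pd.offsets.MonthEnd()
)
# Month-start frequency (alias 'MS').
month_start = pd.date_range(start=last_observed, periods=4, freq="MS")

# Keep only dates strictly after the history, then the first three.
month_end = month_end[month_end > last_observed][:3]
month_start = month_start[month_start > last_observed][:3]

print(list(month_end.strftime("%Y-%m-%d")))
print(list(month_start.strftime("%Y-%m-%d")))
```

With month-start history, freq='MS' keeps the future dates on the same day-of-month pattern as the input.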
For cross validation, it uses pandas Timedelta which only supports days or smaller - see #650 for some more discussion on that. There is an open issue at #586 to better support monthly cross-validation, but in the meantime a horizon of '31 days' would do the trick.
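Why the horizon length matters for month-start data can be sketched with pandas alone. The cutoff date and the helper name are assumptions, and this only mimics the idea of a fixed-length window after a cutoff; it is not Prophet's cross-validation code.

```python
import pandas as pd


def monthly_points_in_horizon(cutoff, horizon):
    """Return month-start dates that fall in (cutoff, cutoff + horizon]."""
    cutoff = pd.Timestamp(cutoff)
    window = pd.Timedelta(horizon)  # a fixed-length duration such as '31 days'
    month_starts = pd.date_range(start=cutoff, periods=6, freq="MS")
    mask = (month_starts > cutoff) & (month_starts <= cutoff + window)
    return month_starts[mask]


# Hypothetical cutoff on a month start.
short = monthly_points_in_horizon("2018-07-01", "29 days")
enough = monthly_points_in_horizon("2018-07-01", "31 days")

print(len(short), list(enough.strftime("%Y-%m-%d")))
```

A 29-day window after a month-start cutoff contains no later month start, while 31 days always reaches the next one.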
That is perfect and very helpful. Thanks Ben so much!!!
While experimenting with monthly frequency and rolling aggregates, I realized that Prophet changes the forecast output.
[Image: Rolling1]
The mean absolute percent error is excellent:
[Image: Rolling2]
[Image: Mean Absolute Percent Error]
[Image: Rolling 3]
Would you happen to know what makes prophet change the membership counts for the month of July as shown in Forecast_Rolling1, Forecast_Rolling2, Forecast_Rolling3:
Is it in any way an indicator of accuracy (possibly not)?
This is my very first attempt to precisely predict monthly memberships with MS frequency. I have taken the holiday effect into consideration, using only the start and end dates of each month.
Thanks in advance for your valuable time, effort and energy!
I don't fully understand what these numbers are in the spreadsheet. This is what I think was done, but please correct me if I misunderstand:
m.plot(forecast) and you will see that the historical data (black dots) do not perfectly match the Prophet prediction (blue line).
Yes, we are concurring on a few issues, thanks for your valuable insights Ben!
I am in the middle of understanding Tukey's Ladder of Powers, where the transformation is chosen depending on the skewness. A simple question: will Prophet work best if the data is as close as possible to being normally distributed?
It will work best if the variance around the main estimate (yhat) is normally distributed, since that is assumed by the model. But if you just make a histogram of all of your data it could be very different from normal due to trends and seasonality.
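A small synthetic illustration of that point, using NumPy only. Everything here is made up: the series is a linear trend plus a sine-shaped seasonality plus normal noise, and the "fit" is simply the known trend and seasonality rather than a Prophet model. Excess kurtosis (about 0 for a normal distribution) serves as a rough shape check.

```python
import numpy as np


def excess_kurtosis(x):
    """Excess kurtosis; approximately 0 for normally distributed data."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return float((z ** 4).mean() - 3.0)


rng = np.random.default_rng(0)
n = 2000
t = np.arange(n)

trend = 100.0 * t / n                                # made-up linear trend
seasonality = 5.0 * np.sin(2 * np.pi * t / 365.25)   # made-up yearly cycle
noise = rng.normal(0.0, 1.0, size=n)                 # normal noise around the signal

y = trend + seasonality + noise
residuals = y - (trend + seasonality)                # stand-in for y - yhat

raw_kurt = excess_kurtosis(y)            # far from 0: trend flattens the histogram
resid_kurt = excess_kurtosis(residuals)  # close to 0: residuals look normal

print(round(raw_kurt, 2), round(resid_kurt, 2))
```

The raw values are spread almost uniformly by the trend, so their histogram is far from normal even though the noise around the signal is exactly normal.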
Hello, I am using the Prophet model to forecast call volume for 30 days based on 3 years of data. Would anyone happen to know how to interpret the black dots on the y vs. ds plot? More importantly, the yearly vs. Day of year plot is way too advanced for me to understand what it depicts. Please chime in and I will be glad to provide more details.
Thanks, D.A.
Prophet Predictions Data Discovery Monthly.docx
Here is the script:
# -*- coding: utf-8 -*-
"""
Spyder Editor

This is a temporary script file.
"""

def greetings():
    """Print "Hello World" and return None"""
    ''' E-Business '''
    print("Prophet Data Model")

# main program starts here
greetings()

import configparser
import pandas as pd
import numpy as np
from fbprophet import Prophet
from fbprophet.diagnostics import cross_validation

cfg = configparser.RawConfigParser()
cfgp = r'C:/Users/darole/.spyder-py3/scripts/config.txt'
cfg.read(cfgp)
config_changepoints = cfg.get('master-config', 'config_changepoints')
config_changescale = cfg.get('master-config', 'config_changescale')
x_dataframe = cfg.get('master-config', 'x_dataframe').split(',')
our_dataframe = cfg.get('master-config', 'our_dataframe').split(',')

result_path = str(cfg.get('master-config', 'result_path'))
result_name = str(cfg.get('master-config', 'result_name'))

input_file_m = pd.read_csv(result_name)
input_file_master = pd.read_csv(result_name)
input_file_direct = pd.read_csv('C:/Users/darole/.spyder-py3/scripts/pdx1_battery_may_14.csv')
input_file_master['y'] = np.log(input_file_master['y'])  # natural logarithm log base e
input_file_master.head()

hd = pd.DataFrame({
    'holiday': 'hd',
    'ds': pd.to_datetime(our_dataframe),
    'lower_window': 0,
    'upper_window': 1,
})

# initialize Prophet
m = Prophet(holidays=hd, n_changepoints=int(config_changepoints), changepoint_prior_scale=float(config_changescale))
input_file_master['ds'] = pd.DatetimeIndex(input_file_master['ds'])  # Index Data
m.fit(input_file_master)  # Fit the model

future = m.make_future_dataframe(periods=30)  # Create a data frame for the future dates
future.tail()  # spot check
forecast = m.predict(future)  # make a prediction

# This crossvalidation- can be useful for tuning parameters
input_file_cv = cross_validation(m, horizon='60 days')
input_file_cv.head()

# holidays
forecast[(forecast['hd']).abs() > 0][['ds', 'hd']][-10:]
forecast['y'] = pd.Series(input_file_master['y'])
forecast['callvolo'] = pd.Series(input_file_m['y'])
forecast['callvolf'] = pd.Series(np.exp(forecast['yhat']))

trend = m.plot(forecast)  # plots trend of yhat w.r.t. year
yearly = m.plot_components(forecast)  # plots percentage w.r.t. month

forecast.to_csv('C:/Users/darole/.spyder-py3/scripts/pdx1_battery_may_14_545_120_ourdataframe.csv')
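Because the script fits the model on np.log(y), the interval columns are on the log scale as well as yhat. A minimal sketch of back-transforming all three together follows; the numbers are made up and the forecast_log frame is only a stand-in for the output of m.predict, so no Prophet call is involved.

```python
import numpy as np
import pandas as pd

# Made-up call volumes on the original scale.
history = pd.DataFrame({"y": [120.0, 150.0, 90.0]})
history["y_log"] = np.log(history["y"])  # same transform as in the script

# Stand-in for a forecast produced on the log scale (hypothetical values).
forecast_log = pd.DataFrame({
    "yhat": history["y_log"],
    "yhat_lower": history["y_log"] - 0.1,
    "yhat_upper": history["y_log"] + 0.1,
})

# Undo the log on the point forecast and both interval bounds at once.
cols = ["yhat", "yhat_lower", "yhat_upper"]
forecast_original = np.exp(forecast_log[cols])

print(forecast_original.round(1))
```

Applying np.exp to the bounds keeps their ordering, so the back-transformed interval still brackets the back-transformed yhat.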