munitech4u closed this issue 6 years ago.
I think this is more a case of misunderstanding how ARIMAs work. Here are some (hopefully) helpful examples using Pyramid v0.8.1:
```python
import pyramid as pm
import pandas as pd

# Read the table
X = pd.read_excel('/path/to/item_sales_daily.xlsx')
# Get the sales data as a NumPy array
y = X['sales'].values
```
We can now look at the autocorrelation:
```python
>>> pm.autocorr_plot(y)
```
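If you'd like numbers rather than a picture, here's a rough sketch of the sample autocorrelation a plot like this is built from. It uses plain NumPy on a synthetic series (an annual sine wave plus noise, not your data), and `sample_acf` is a hypothetical helper, not part of Pyramid:

```python
import numpy as np

def sample_acf(y, nlags):
    """Sample autocorrelation r_k for lags k = 0..nlags (the usual biased estimator)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    dev = y - y.mean()
    denom = np.dot(dev, dev)  # sum of squared deviations from the mean
    return np.array([np.dot(dev[: n - k], dev[k:]) / denom for k in range(nlags + 1)])

# Synthetic daily series (hypothetical): 4 years of an annual cycle plus noise
rng = np.random.default_rng(0)
t = np.arange(4 * 365)
y_demo = 50 + 10 * np.sin(2 * np.pi * t / 365) + rng.normal(0, 2, size=t.size)

acf = sample_acf(y_demo, nlags=400)
print(f"lag 182: {acf[182]:.2f}, lag 365: {acf[365]:.2f}")
```

An annual cycle shows up as a strong negative correlation around half a year and a strong positive one around a full year.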
Notice what appears to be an annual seasonal pattern. If you read the documentation's section on understanding seasonal periodicity (`m`), you can probably reason your way to a sensible `m` setting. Since it's daily data with an annual cycle, you might be looking at an `m` of 365, but you know your data better than I do, so I'm not going to tell you that's the correct answer.
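One rough way to sanity-check a candidate `m` is to look for the dominant period in a periodogram. This is my own assumption rather than anything Pyramid does (you have to supply `m` yourself), it runs on a synthetic series, and `dominant_period` is a hypothetical helper:

```python
import numpy as np

def dominant_period(y):
    """Period (in samples) of the strongest non-zero frequency in the periodogram."""
    y = np.asarray(y, dtype=float)
    power = np.abs(np.fft.rfft(y - y.mean())) ** 2
    freqs = np.fft.rfftfreq(y.size, d=1.0)  # cycles per sample
    k = np.argmax(power[1:]) + 1            # skip the zero-frequency (mean) bin
    return int(round(1.0 / freqs[k]))

# Synthetic daily series (hypothetical): 4 years with an annual cycle plus noise
rng = np.random.default_rng(1)
t = np.arange(4 * 365)
y_demo = 50 + 10 * np.sin(2 * np.pi * t / 365) + rng.normal(0, 2, size=t.size)
print(dominant_period(y_demo))
```

Treat the result as a hint only: the periodogram can only resolve periods of the form N/k, so with 4 years of daily data the neighbours of 365 are roughly 292 and 487.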
You set `d=0` for some reason. Do you have reason to believe your data is already stationary? Because it's definitely not. Here's how you can estimate the `d` parameter (again, this is an estimate):
```python
>>> pm.arima.ndiffs(y, test='kpss', max_d=5)
1
```
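For intuition on what `d=1` means: the model is fit to the first differences y_t - y_{t-1} rather than to the raw series. A minimal NumPy sketch on made-up numbers (not your data):

```python
import numpy as np

# A deterministic linear trend is not stationary in the mean...
t = np.arange(10, dtype=float)
trend = 3.0 + 0.5 * t
# ...but its first difference (y_t - y_{t-1}) is constant
d1 = np.diff(trend, n=1)
print(d1)  # ten points in, nine differences out

# A random walk is the cumulative sum of shocks; one difference recovers the shocks
rng = np.random.default_rng(2)
shocks = rng.normal(size=200)
walk = np.cumsum(shocks)
recovered = np.diff(walk)
print(np.allclose(recovered, shocks[1:]))
```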
This runs a Kwiatkowski–Phillips–Schmidt–Shin (KPSS) test in the background to determine the number of differences needed to make your time series stationary. 1 seems to do the trick. The other available tests are `test='pp'` (Phillips–Perron) and `test='adf'` (Augmented Dickey-Fuller). See the documentation's section on enforcing stationarity and the API reference for more detail on these tests.
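To make the idea concrete, here's a simplified, hand-rolled version of that loop: compute a KPSS level-stationarity statistic, keep differencing while it exceeds the 5% critical value (0.463), and stop at `max_d`. This is a sketch for intuition, not Pyramid's implementation (which works with p-values and has more options). `kpss_level_stat` and `ndiffs_sketch` are hypothetical names, and the lag rule is one common short-lag choice:

```python
import numpy as np

# 5% critical value for the KPSS level-stationarity test (Kwiatkowski et al., 1992)
KPSS_CRIT_5PCT = 0.463

def kpss_level_stat(y):
    """KPSS statistic for level stationarity with a Bartlett-weighted long-run variance."""
    y = np.asarray(y, dtype=float)
    n = y.size
    resid = y - y.mean()                         # residuals from a constant-only fit
    partial_sums = np.cumsum(resid)
    n_lags = int(np.trunc(3 * np.sqrt(n) / 13))  # a common "short" lag-truncation rule
    long_run_var = np.dot(resid, resid) / n
    for k in range(1, n_lags + 1):
        weight = 1.0 - k / (n_lags + 1.0)        # Bartlett kernel weight
        long_run_var += 2.0 * weight * np.dot(resid[k:], resid[:-k]) / n
    return np.dot(partial_sums, partial_sums) / (n ** 2 * long_run_var)

def ndiffs_sketch(y, max_d=5):
    """Difference y until the KPSS statistic falls below the 5% critical value (or max_d is hit)."""
    y = np.asarray(y, dtype=float)
    d = 0
    while d < max_d and kpss_level_stat(y) > KPSS_CRIT_5PCT:
        y = np.diff(y)
        d += 1
    return d

# Hypothetical example: a simulated random walk (not the sales data)
rng = np.random.default_rng(42)
walk = np.cumsum(rng.normal(size=1000))
print(kpss_level_stat(walk), ndiffs_sketch(walk, max_d=5))
```

On a random walk this typically returns 1, the same kind of answer `ndiffs` gave for the sales data.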
We can make a similar estimate of `D` using a Canova-Hansen test for seasonal differencing. Assuming `m=365` (which, again, is my own uneducated guess):
```python
>>> pm.arima.nsdiffs(y, m=365, max_D=5)
3
```
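Seasonal differencing works like ordinary differencing, but at lag `m`: each value has the value from one season earlier subtracted, and this is repeated `D` times. A small NumPy sketch (`seasonal_diff` is a hypothetical helper, not Pyramid's API):

```python
import numpy as np

def seasonal_diff(y, m, D=1):
    """Apply the seasonal difference y_t - y_{t-m}, D times."""
    y = np.asarray(y, dtype=float)
    for _ in range(D):
        y = y[m:] - y[:-m]
    return y

# A pattern that repeats exactly every 7 samples (hypothetical weekly shape)
weekly = np.tile([1.0, 5.0, 2.0, 8.0, 3.0, 9.0, 4.0], 10)  # 70 samples
print(seasonal_diff(weekly, m=7))  # the repeating pattern cancels out

# Each round of seasonal differencing drops m observations
long_series = np.arange(1200, dtype=float)
print(len(seasonal_diff(long_series, m=365, D=3)))  # 1200 - 3 * 365
```

One practical consequence worth weighing: every seasonal difference discards `m` observations, so `D=3` with `m=365` uses up 1,095 days before the model sees a single value. Whether that's workable depends on how many years of history you have.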
So now you've learned several things:

- `seasonal=True`, because your time series is definitely seasonally trended
- You need to figure out the seasonal periodicity (`m`) of your data
- `d=1` and `D=3` (or whatever `nsdiffs` returns, pending your knowledge of `m`)

That's a starting point. Since this is such a data-dependent question, I can't solve the whole thing for you, but hopefully that gives you a jumping-off point.
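It may also help to see why `d` matters for the shape of the forecasts. With `d=1`, the model forecasts the differences, and those are cumulatively summed back onto the last observed value, so a nonzero average difference shows up as a trend on the original scale. A minimal sketch of that last step (hypothetical helper, made-up numbers):

```python
import numpy as np

def undifference(last_obs, diff_forecasts):
    """Invert one round of differencing: y_{T+h} = y_T + sum of forecast differences up to h."""
    return last_obs + np.cumsum(diff_forecasts)

# Made-up example: last observed value 100, model predicts a steady +0.5 per step
print(undifference(100.0, np.full(5, 0.5)))
```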
Sorry, my bad. I didn't notice `d=0` (I thought it would also estimate `d`). However, setting `d=1` and `D=3` results in the error below:

```
ValueError: if explicitly defined, d & D must be <= max_d & <= max_D, respectively
```
Really appreciate the effort you put into such a detailed answer! Cheers!
Never mind, the error went away after I set the `max_D` parameter (by default it is 2).
Yeah, the `ValueError` is a bit of silly over-validation on my end. That's been fixed in v0.9.0 (not yet released).
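For anyone else hitting this, the rule in the error message amounts to the check below. This is a hypothetical re-creation written from the message text, not Pyramid's actual source; it just shows why an explicit `D=3` needs `max_D` of at least 3:

```python
def check_fixed_orders(d, D, max_d, max_D):
    """Hypothetical re-creation of the rule: explicit d and D may not exceed max_d and max_D."""
    if d is not None and d > max_d:
        raise ValueError("d=%d exceeds max_d=%d" % (d, max_d))
    if D is not None and D > max_D:
        raise ValueError("D=%d exceeds max_D=%d" % (D, max_D))

try:
    check_fixed_orders(d=1, D=3, max_d=2, max_D=2)
except ValueError as err:
    print("rejected:", err)

check_fixed_orders(d=1, D=3, max_d=2, max_D=3)  # passes once max_D >= D
```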
Description
The predictions from Auto-ARIMA for daily data are all almost the same average value. Is there anything I am doing wrong?
Steps/Code to Reproduce
item_sales_daily.xlsx
Expected Results
Predictions that vary rather than staying nearly identical
Actual Results
The forecasts stay nearly identical over an extended period: the values are all around 73, 74, 75. No trend is captured. I'm not sure whether I am doing this correctly.
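A likely reason for forecasts like these, consistent with the `d=0` discussion in the reply: a stationary ARMA model's multi-step forecasts decay toward the series mean. The simplest case, an AR(1) with made-up parameters in the same range as the values reported, shows the effect; `ar1_forecast` is a hypothetical helper:

```python
import numpy as np

def ar1_forecast(last_obs, mu, phi, steps):
    """h-step-ahead forecasts of a stationary AR(1): mu + phi**h * (last_obs - mu)."""
    h = np.arange(1, steps + 1)
    return mu + phi ** h * (last_obs - mu)

# Made-up values: mean 74, persistence 0.6, last observation 90
fc = ar1_forecast(last_obs=90.0, mu=74.0, phi=0.6, steps=30)
print(np.round(fc[:5], 2))      # pulled quickly toward 74
print(round(float(fc[-1]), 4))  # essentially the mean after 30 steps
```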
Versions
- Windows-10-10.0.15063
- Python 2.7.15 |Anaconda, Inc.| (default, May 1 2018, 18:37:09) [MSC v.1500 64 bit (AMD64)]
- Pyramid 0.7.1
- NumPy 1.14.3
- SciPy 1.1.0
- Scikit-Learn 0.19.1
- Statsmodels 0.9.0