alkaline-ml / pmdarima

A statistical library designed to fill the void in Python's time series analysis capabilities, including the equivalent of R's auto.arima function.
https://www.alkaline-ml.com/pmdarima
MIT License
1.57k stars, 231 forks

auto_arima ignores seasonality in data #507

Closed by Suprema-tism 2 years ago

Suprema-tism commented 2 years ago

Describe the question you have

Hi,

I'm trying to understand why auto_arima treats my fairly simple monthly time series as a random walk. It seems to me that the OCSB test fails to recognize the seasonality that is present. However, R's auto.arima, which, as far as I can tell, uses the same statistical test for detecting seasonality, finds the correct order of seasonal differencing, D = 1, and gives a reasonable forecast.

Of course, I can force D = 1, but since I have almost a hundred time series, that doesn't seem like a good solution. I've started to think that something is wrong with my code (totally possible), but I can't wrap my head around it.

I would greatly appreciate any piece of advice!


from pmdarima import auto_arima

# Stepwise seasonal search; d and D are left for the unit-root tests to choose
model_DR = auto_arima(y = Y_train,  # np.log(Y_train)
                      start_p = 0, d = None, start_q = 0,
                      max_p = 4, max_d = 2, max_q = 4,
                      start_P = 0, D = None, start_Q = 0, max_P = 3, max_D = 1, max_Q = 3,
                      max_order = None, start_params = None,
                      stepwise = True, maxiter = 50,
                      m = 12, seasonal = True, stationary = False,
                      information_criterion = 'aic',
                      alpha = 0.05, test = 'kpss', seasonal_test = 'ocsb',
                      with_intercept = 'auto', method = 'lbfgs',
                      suppress_warnings = True, error_action = 'ignore', trace = True)

# Forecast the next 12 months
pred = model_DR.predict(n_periods = 12, X = X_test, return_conf_int = False)

Performing stepwise search to minimize aic
 ARIMA(0,1,0)(0,0,0)[12] intercept   : AIC=589.286, Time=0.00 sec
 ARIMA(1,1,0)(1,0,0)[12] intercept   : AIC=589.673, Time=0.03 sec
 ARIMA(0,1,1)(0,0,1)[12] intercept   : AIC=589.159, Time=0.07 sec
 ARIMA(0,1,0)(0,0,0)[12]             : AIC=587.433, Time=0.00 sec
 ARIMA(0,1,0)(1,0,0)[12] intercept   : AIC=589.800, Time=0.05 sec
 ARIMA(0,1,0)(0,0,1)[12] intercept   : AIC=590.024, Time=0.02 sec
 ARIMA(0,1,0)(1,0,1)[12] intercept   : AIC=592.024, Time=0.04 sec
 ARIMA(1,1,0)(0,0,0)[12] intercept   : AIC=588.358, Time=0.01 sec
 ARIMA(0,1,1)(0,0,0)[12] intercept   : AIC=588.221, Time=0.01 sec
 ARIMA(1,1,1)(0,0,0)[12] intercept   : AIC=589.666, Time=0.04 sec

Best model:  ARIMA(0,1,0)(0,0,0)[12]          
Total fit time: 0.297 seconds

My data:

[13260.013671875, 12172.1552734375, 12773.0517578125, 12242.5107421875, 10904.9541015625, 12490.5703125, 13431.5439453125, 12095.5859375, 13451.1279296875, 12326.875, 14262.0966796875, 15084.7080078125, 13585.26953125, 11943.0009765625, 12201.677734375, 11474.94140625, 11776.224609375, 11827.71484375, 11338.091796875, 10388.357421875, 11479.623046875, 10575.59765625, 12218.193359375, 12985.5927734375, 11684.8583984375, 10268.5546875, 10416.9140625, 9804.447265625, 10088.771484375, 10171.2373046875, 9702.9267578125, 8717.962890625, 9754.8818359375, 8951.7724609375, 10289.7548828125, 10909.923828125]

Versions (if necessary)

No response

tgsmith61591 commented 2 years ago

R's auto.arima actually gives 0 for nsdiffs when you use the OCSB test for this data:

> library("forecast")
> x = c(13260.01367188, 12172.15527344, 12773.05175781, 12242.51074219,
+        10904.95410156, 12490.5703125 , 13431.54394531, 12095.5859375 ,
+        13451.12792969, 12326.875     , 14262.09667969, 15084.70800781,
+        13585.26953125, 11943.00097656, 12201.67773438, 11474.94140625,
+        11776.22460938, 11827.71484375, 11338.09179688, 10388.35742188,
+        11479.62304688, 10575.59765625, 12218.19335938, 12985.59277344,
+        11684.85839844, 10268.5546875 , 10416.9140625 ,  9804.44726562,
+        10088.77148438, 10171.23730469,  9702.92675781,  8717.96289062,
+         9754.88183594,  8951.77246094, 10289.75488281, 10909.92382812)
> x = ts(x, freq=12)

> forecast::nsdiffs(x, test="ocsb")
[1] 0

It seems to come down to the fact that forecast now uses "seas", rather than "ocsb", as its default seasonal test.
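
For readers wondering what a seasonal-strength style check looks like, here is a minimal pure-Python sketch. It is not the code that forecast or pmdarima runs: as far as I know, forecast's "seas" option measures seasonal strength on an STL-type decomposition and differences when the strength exceeds 0.64, whereas this sketch swaps in a classical moving-average decomposition to stay dependency-free. The function names and the three-cycle minimum are assumptions made for illustration.

```python
from statistics import mean, pvariance


def centered_moving_average(x, m):
    """Return {t: trend[t]} from a centered moving average of period m."""
    half = m // 2
    trend = {}
    for t in range(half, len(x) - half):
        window = x[t - half:t + half + 1]  # m + 1 points if m is even, m if odd
        if m % 2 == 0:
            # 2 x m moving average: the two end points get half weight
            total = sum(window) - 0.5 * (window[0] + window[-1])
        else:
            total = sum(window)
        trend[t] = total / m
    return trend


def seasonal_strength(x, m):
    """Approximate seasonal strength in [0, 1] (hypothetical helper).

    Uses a classical decomposition, NOT the STL-based one used by forecast.
    """
    if len(x) < 3 * m:
        raise ValueError("this sketch needs at least three full seasonal cycles")
    trend = centered_moving_average(x, m)
    detrended = {t: x[t] - trend[t] for t in trend}

    # Seasonal component: mean detrended value at each position in the cycle
    by_position = {}
    for t, value in detrended.items():
        by_position.setdefault(t % m, []).append(value)
    seasonal = {k: mean(v) for k, v in by_position.items()}

    remainder = [detrended[t] - seasonal[t % m] for t in detrended]
    denom = pvariance(list(detrended.values()))  # var(seasonal + remainder)
    if denom == 0:
        return 0.0
    return max(0.0, min(1.0, 1.0 - pvariance(remainder) / denom))


# The monthly series from this thread (values as in the R snippet above)
x = [13260.01367188, 12172.15527344, 12773.05175781, 12242.51074219,
     10904.95410156, 12490.5703125, 13431.54394531, 12095.5859375,
     13451.12792969, 12326.875, 14262.09667969, 15084.70800781,
     13585.26953125, 11943.00097656, 12201.67773438, 11474.94140625,
     11776.22460938, 11827.71484375, 11338.09179688, 10388.35742188,
     11479.62304688, 10575.59765625, 12218.19335938, 12985.59277344,
     11684.85839844, 10268.5546875, 10416.9140625, 9804.44726562,
     10088.77148438, 10171.23730469, 9702.92675781, 8717.96289062,
     9754.88183594, 8951.77246094, 10289.75488281, 10909.92382812]

strength = seasonal_strength(x, 12)
# 0.64 is the cutoff I believe forecast documents for "seas"; treat it as an assumption
print(f"seasonal strength: {strength:.3f}, suggested D: {1 if strength > 0.64 else 0}")
```

Because the decomposition differs from the one forecast uses, the number printed here will not necessarily match R's result; it only illustrates why a strength-based rule and a unit-root test such as OCSB can disagree on the same series.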

Suprema-tism commented 2 years ago

Appreciate your help! I am so used to OCSB that it didn't even occur to me that the default could have changed. Thanks a lot!