Closed iagochavarry closed 3 years ago
It can totally be, I ran many times and most of the time it worked.
julia> versioninfo()
Julia Version 1.6.0
Commit f9720dc2eb (2021-03-24 12:55 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin19.6.0)
CPU: Intel(R) Core(TM) i5-7360U CPU @ 2.30GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-11.0.1 (ORCJIT, skylake)
julia> n = 1000;
julia> e = 0.3*randn(n);
julia> dy = zeros(n);
julia> dy[1:2] = [33, 33];
julia> for i = 3:n
dy[i] = 1 + 0.97*dy[i-1] + e[i]
end
julia> model = StateSpaceModels.SARIMA(dy, order = (1,0,0), include_mean = true);
julia> StateSpaceModels.fit!(model);
julia> StateSpaceModels.results(model)
Results
===============================================================
Model: SARIMA(1, 0, 0)x(0, 0, 0, 0) with non-zero mean
Number of observations: 1000
Number of unknown parameters: 3
Log-likelihood: -217.1154
AIC: 440.2308
AICc: 440.2549
BIC: 454.9540
---------------------------------------------------------------
Parameter Estimate Std.Error z stat p-value
ar_L1 0.9599 157.9692 0.0061 0.9952
mean 33.5612 53.5327 0.6269 0.5384
sigma2_η 0.0901 0.0451 1.9983 0.0000
julia> using RCall
julia> @rput dy;
julia> R"""
library(forecast)
print(arima(dy, order = c(1,0,0)), include.mean = TRUE)
"""
Call:
arima(x = dy, order = c(1, 0, 0))
Coefficients:
ar1 intercept
0.9598 33.5781
s.e. 0.0087 0.2308
sigma^2 estimated as 0.09016: log likelihood = -217.12, aic = 440.24
RObject{VecSxp}
Call:
arima(x = dy, order = c(1, 0, 0))
Coefficients:
ar1 intercept
0.9598 33.5781
s.e. 0.0087 0.2308
sigma^2 estimated as 0.09016: log likelihood = -217.12, aic = 440.24
The std error is totally wrong and related to #193
In one specific run my results were
julia> n = 1000;
julia> e = 0.3*randn(n);
julia> dy = zeros(n);
julia> dy[1:2] = [33, 33];
julia> for i = 3:n
dy[i] = 1 + 0.97*dy[i-1] + e[i]
end
julia> model = StateSpaceModels.SARIMA(dy, order = (1,0,0), include_mean = true);
julia> StateSpaceModels.fit!(model);
julia> StateSpaceModels.results(model)
Results
===============================================================
Model: SARIMA(1, 0, 0)x(0, 0, 0, 0) with non-zero mean
Number of observations: 1000
Number of unknown parameters: 3
Log-likelihood: -210.1984
AIC: 426.3968
AICc: 426.4209
BIC: 441.1201
---------------------------------------------------------------
Parameter Estimate Std.Error z stat p-value
ar_L1 1.0000 -1.620E+07 -0.0000 -
mean 33.9465 1.517E+06 0.0000 1.0000
sigma2_η 0.0883 0.0441 2.0034 0.0000
julia> using RCall
julia> @rput dy;
julia> R"""
library(forecast)
print(arima(dy, order = c(1,0,0)), include.mean = TRUE)
"""
Call:
arima(x = dy, order = c(1, 0, 0))
Coefficients:
ar1 intercept
0.9653 33.9070
s.e. 0.0081 0.2617
sigma^2 estimated as 0.08682: log likelihood = -198.31, aic = 402.61
RObject{VecSxp}
Call:
arima(x = dy, order = c(1, 0, 0))
Coefficients:
ar1 intercept
0.9653 33.9070
s.e. 0.0081 0.2617
sigma^2 estimated as 0.08682: log likelihood = -198.31, aic = 402.61
and changing the initial value from the default to 0.9 solved the issue
julia> model = StateSpaceModels.SARIMA(dy, order = (1,0,0), include_mean = true);
julia> StateSpaceModels.set_initial_hyperparameters!(model, Dict("ar_L1"=> 0.9))
SARIMA(1, 0, 0)x(0, 0, 0, 0) with non-zero mean model
julia> StateSpaceModels.fit!(model);
julia> StateSpaceModels.results(model)
Results
===============================================================
Model: SARIMA(1, 0, 0)x(0, 0, 0, 0) with non-zero mean
Number of observations: 1000
Number of unknown parameters: 3
Log-likelihood: -198.3054
AIC: 402.6108
AICc: 402.6349
BIC: 417.3340
---------------------------------------------------------------
Parameter Estimate Std.Error z stat p-value
ar_L1 0.9654 207.6223 0.0046 0.9963
mean 33.9066 68.7551 0.4932 0.6270
sigma2_η 0.0869 0.0434 2.0016 0.0000
as well as changing the optimizer, maybe the default one (LBFGS) is not the best
julia> using Optim
julia> model = StateSpaceModels.SARIMA(dy, order = (1,0,0), include_mean = true);
julia> StateSpaceModels.fit!(model; optimizer=StateSpaceModels.Optimizer(Optim.BFGS()));
julia> StateSpaceModels.results(model)
Results
===============================================================
Model: SARIMA(1, 0, 0)x(0, 0, 0, 0) with non-zero mean
Number of observations: 1000
Number of unknown parameters: 3
Log-likelihood: -198.3054
AIC: 402.6108
AICc: 402.6349
BIC: 417.3340
---------------------------------------------------------------
Parameter Estimate Std.Error z stat p-value
ar_L1 0.9654 207.6219 0.0046 0.9963
mean 33.9066 68.7550 0.4932 0.6270
sigma2_η 0.0869 0.0434 2.0016 0.0000
Further analyzing the SARIMA model estimation, we arrived at some useful insights:
When the R complains of non-stationarity in the Conditional Sum of Squares (CSS) estimation, the StateSpaceModels also complain. But the reverse isn't true (mostly in high orders).
The StateSpaceModels can present some errors in the optimization procedure, that doesn't depend on the CSS result. We notice this in:
relax_unit_root_constraint
: sqrt
function receiving a negative value due to diag(y)
greater than 1log
function receiving a negative valueObs: The R package doesn't let the estimation continue when de CSS estimation complains about non-stationarity
Some benchmark information
Using the StateSpaceModels
in Julia
julia> versioninfo()
T_PARAMETERS.samples = 100;
@benchmark StateSpaceMJulia Version 1.0.5
Commit 3af96bcefc (2019-09-09 19:06 UTC)
Platform Info:odels.fit!(model)
OS: macOS (x86_64-apple-darwin18.6.0)
CPU: Intel(R) Core(TM) i5-5257U CPU @ 2.70GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.0 (ORCJIT, broadwell)
julia> dy = CSV.read("dy.csv", DataFrame)[:,1];
julia> model = StateSpaceModels.SARIMA(dy, order = (2,0,2),include_mean = true)
SARIMA(2, 0, 2)x(0, 0, 0, 0) with non-zero mean model
julia> BenchmarkTools.DEFAULT_PARAMETERS.samples = 100;
julia> @benchmark StateSpaceModels.fit!(model)
BenchmarkTools.Trial:
memory estimate: 3.47 MiB
allocs estimate: 44412
--------------
minimum time: 49.565 ms (0.00% GC)
median time: 52.282 ms (0.00% GC)
mean time: 52.945 ms (1.43% GC)
maximum time: 62.411 ms (8.40% GC)
--------------
samples: 95
evals/sample: 1
Using the forecast
package in R
> library(rbenchmark)
> library(forecast)
>
> dy = read.csv(file = 'dy.csv')
> benchmark(Arima(dy, order = c(2,0,2), include.mean = TRUE), replications = 100)
test replications elapsed
1 Arima(dy, order = c(2, 0, 2), include.mean = TRUE) 100 4.378
relative user.self sys.self user.child sys.child
1 1 4.156 0.075 0 0
And finally, with the statsmodels
in Python
>>> import pandas as pd
>>> import statsmodels.api as sm
>>> import time
>>> from time import time
>>> import timeit
>>>
>>> dy = pd.read_csv("dy.csv");
>>>
>>> mod = sm.tsa.statespace.SARIMAX(dy, trend='c', order=(2, 0, 2));
def arimafit():
mod.fit(disp=False)
timeit.timeit(s>>>
>>> def arimafit():
... mod.fit(disp=False)
...
>>> timeit.timeit(stmt='arimafit()', setup='from __main__ import arimafit', number=100)
34.337730448
In summary, the average time is:
StateSpaceModels
: 52.945 ms
forecast
: 41.46 ms
statsmodels
: 343.37 ms
I was comparing the results in the SARIMA model estimation with StateSpaceModels and with the package forecast in R using a controlled time series ARIMA (1,0,0) -> AR_L1 = 0.97. The R package estimated correctly (0.974), and the StateSpaceModels estimated the AR-L1 coef as 0.999. May this be a convergency error?