I think the the usage of predict_in_sample() in pmdarima(python pkg) is same as fitted() in forecast(R pkg).
But I find out that the output of predict_in_sample() in pmdarima(python pkg) is different with output of fitted() in forecast(R pkg) when difference order isn't zero.
I use the following python codes to generate the output of predict_in_sample() in three different differencing order (d) :
import numpy as np
from pmdarima.arima import ARIMA, auto_arima
for diff_ord in range(1,4):
model = ARIMA(order=(2,diff_ord,1), out_of_sample_size=0, mle_regression=True, suppress_warnings=True)
ori_time_series = np.array([0.49958017, 0.15162735, 0.86757565, 0.3093554, 0.20545085, -0.48288408, 0.6880291,
0.8461229, 0.8320223, -0.7372907, 0.6048833, 0.40874475, 0.57708055, 0.27590698,
-0.21213382, 0.4236031, 0.3324298, -0.076647766, -0.20372462, 0.93162024, 0.5740154])
model = model.fit(ori_time_series)
pred_in_sample = np.array(model.predict_in_sample())
print(f"pred_in_sample: {list(pred_in_sample)}")
I copy the output of above python codes, then paste to a R script to compare the difference between predict_in_sample() in pmdarima and fitted() in forecast(R pkg), R script:
par(mfrow=c(3,2))
resid_py_arr = rbind(c(0.5074603452679165, -0.34007254404457005, 0.5791635465131197, -0.2603422455105438, -0.11715725264909946, -0.9906800274596668, 0.21001659120388472, 0.3475615524863871, 0.7611416545018854, -0.8381194770281715, 0.2623376398458589, -0.23206647709890738, 0.4220053862092831, 0.06219916252952257, -0.4222181747488597, 0.038538708232362495, -0.08313995790478867, -0.26009526647620496, -0.48253573807639416, 0.5137650759426555, 0.34360142378578173),
c(0.49261470998182477, -0.6081910499091976, 1.0569346650372444, -0.6076380990855064, -0.2378976168316683, -0.8352597431282236, 1.0357942689078028, 0.631678563908118, 0.2767179073401864, -1.6808983058879592, 0.5653496038522947, 0.09552482041758414, 0.3659123662710821, -0.3491598976266268, -0.6745863302200333, 0.27064401578860775, 0.05909012652824919, -0.3734973292295873, -0.4340483262207756, 0.9099684919779161, 0.09388976669825833),
c(0.4940135127419584, -0.8586664744669461, 1.3118624972776918, -2.3436128660405227, -0.11312237780455078, -0.2567555216685367, 1.8351322442185336, 0.16659999993838948, -0.7634839186197411, -2.006759899752263, 1.8390440448281202, 0.3681746393692201, -0.30073609275686164, -0.6107051995729558, -0.5143875708651311, 0.8646807292504737, 0.03706843182642394, -0.7386026479980223, -0.1930232885395262, 1.3558665622526413, -0.579683995736004))
fitted_val_py_arr = rbind(c(-0.007880175267916512, 0.49169989404457004, 0.28841210348688023, 0.5696976455105438, 0.32260810264909945, 0.5077959474596667, 0.47801250879611523, 0.4985613475136129, 0.07088064549811457, 0.1008287770281715, 0.3425456601541411, 0.6408112270989074, 0.15507516379071695, 0.21370781747047746, 0.21008435474885973, 0.3850643917676375, 0.4155697579047887, 0.18344750047620495, 0.27881111807639414, 0.4178551640573445, 0.23041397621421822),
c(0.006965460018175224, 0.7598183999091976, -0.18935901503724428, 0.9169934990855064, 0.4433484668316683, 0.35237566312822355, -0.34776516890780296, 0.21444433609188196, 0.5553043926598136, 0.9436076058879592, 0.0395336961477053, 0.31321992958241585, 0.21116818372891794, 0.6250668776266268, 0.4624525102200333, 0.15295908421139226, 0.2733396734717508, 0.2968495632295873, 0.23032370622077558, 0.02165174802208386, 0.48012563330174163),
c(0.005566657258041537, 1.0102938244669462, -0.44428684727769174, 2.6529682660405225, 0.3185732278045508, -0.22612855833146328, -1.1471031442185335, 0.6795229000616105, 1.595506218619741, 1.2694691997522631, -1.2341607448281202, 0.040570110630779865, 0.8778166427568617, 0.8866121795729558, 0.30225375086513107, -0.44107762925047367, 0.29536136817357606, 0.6619548819980223, -0.010701331460473806, -0.4242463222526414, 1.153699395736004))
y<-c(0.49958017, 0.15162735, 0.86757565, 0.3093554, 0.20545085, -0.48288408, 0.6880291,
0.8461229, 0.8320223, -0.7372907, 0.6048833, 0.40874475, 0.57708055, 0.27590698,
-0.21213382, 0.4236031, 0.3324298, -0.076647766, -0.20372462, 0.93162024, 0.5740154) # The statistical part of the question is understanding that the in-sample one-step-ahead forecasts of an ARIMA model are actually the fitted values of that model. In R, the method fitted applied on model output object normally returns the fitted values of the model. However, the method is not applicable to the output of function arima. There is a workaround: fitted values equal original values minus residuals. Residuals can be extracted from a fitted object using the method residuals (and that applies to the output of function arima).
for (dif_ord in seq(1:3)) {
# Better still, use the forecast package which does have a fitted method for outputs from Arima and auto.arima. – Rob Hyndman Feb 26, 2016 at 9:49
#install.packages('forecast')
library(forecast)
fit.model.2 <- Arima(y, order = c(2, dif_ord, 1))
resid_r_forecast_arima <- residuals(fit.model.2)
resid_py <- resid_py_arr[dif_ord,]
plot.ts(y, xaxp = c(0, 21, 21), ylim = c(-2,2))
axis(2, at = seq(-1.5, 1.5, 0.5), tck = 1, lty = 2, col = "grey", labels = NA) # Add horizontal grid
axis(1, at = 1:21, tck = 1, lty = 2, col = "grey", labels = NA) # Add vertical grid
lines(resid_r_forecast_arima, col=2, lty=2)
lines(resid_py, col=3, lty=3)
legend(13, -1, c("origin", "resid_r", "resid_py"), col=1:3, lty=1:3, cex=1, ncol=3, y.intersp=0, x.intersp=0, text.width=0.9)
mtext(paste("Check residual series trend for diff order:", dif_ord))
fitted_val_r_forecast_arima <- fitted(fit.model.2)
fitted_val_py <- fitted_val_py_arr[dif_ord,]
plot.ts(y, xaxp = c(0, 21, 21), ylim = c(-2,2))
axis(2, at = seq(-1.5, 1.5, 0.5), tck = 1, lty = 2, col = "grey", labels = NA) # Add horizontal grid
axis(1, at = 1:21, tck = 1, lty = 2, col = "grey", labels = NA) # Add vertical grid
lines(fitted_val_r_forecast_arima, col=2, lty=2)
lines(fitted_val_py, col=3, lty=3)
legend(13, -1, c("origin", "fitted_r", "fitted_py"), col=1:3, lty=1:3, cex=1, ncol=3, y.intersp=0, x.intersp=0, text.width=1)
mtext(paste("Check fitted series trend for diff order:", dif_ord))
}
output image of R script:
I observe that: for the first d(difference order) term has huge difference, then they getting close after d(difference order) term.
And I think the output of forecast(R package) is correct., based on:
Describe the question you have
I think the the usage of
predict_in_sample()
inpmdarima
(python pkg) is same asfitted()
inforecast
(R pkg).But I find out that the output of
predict_in_sample()
inpmdarima
(python pkg) is different with output offitted()
inforecast
(R pkg) when difference order isn't zero.I use the following python codes to generate the output of
predict_in_sample()
in three different differencing order (d
) :I copy the output of above python codes, then paste to a R script to compare the difference between
predict_in_sample()
inpmdarima
andfitted()
inforecast
(R pkg), R script:output image of R script:![image](https://user-images.githubusercontent.com/5170730/208290503-64e36ca5-5cbf-46a3-bc9f-cfe0b09e4654.png)
d
(difference order) term has huge difference, then they getting close afterd
(difference order) term.forecast
(R package) is correct., based on:d
term of original time-series(black line) and output offitted()
(red line)My questions are:
predict_in_sample()
inpmdarima
to get the same output withfitted()
inforecast
?start
, but it only shorten the length of output ofpredict_in_sample()
.Versions (if necessary)
No response