alkaline-ml / pmdarima

A statistical library designed to fill the void in Python's time series analysis capabilities, including the equivalent of R's auto.arima function.
https://www.alkaline-ml.com/pmdarima
MIT License
1.57k stars 231 forks source link

Error in residual component of decomposed TS with daily observations and 7-days period #501

Open a-slice-of-py opened 2 years ago

a-slice-of-py commented 2 years ago

Describe the bug

I am working with a time series made of daily observations which shows a 7-days period with a drop on Sundays (and holidays, more in general) and I the set period to 7 days.

While exploring my data via pm.utils.decomposed_plot, I found out that residual component of the decomposed time series still shows a (not catched) seasonal pattern. Playing a bit with the period, it turned out the periods in the form of m=14*k with k=1, 2, ... actually return "expected" residuals shape.

In the below example, residuals computed with pm.arima.decompose and period=7 have a norm2 which is ~6x the one obtained with period=14. In contrast both residuals computed (as benchmark) via statsmodels.tsa.seasonal result similar to each other.

To Reproduce

import numpy as np
import pmdarima as pm
from statsmodels.tsa.seasonal import seasonal_decompose

ts = np.array([
    31.6650730440813, 30.831972346964566, 31.027745593230318, 31.29000592980551, 30.06322951321806, 21.776102502675606, -4.01229996340556, 
    33.80363444278149, 32.2117201530302, 31.413786475801185, 31.69155017047589, 31.584155726085857, 21.671124788889156, -4.01229996340556, 
    33.08452589032317, 31.929265539973485, 30.562178423295958, 32.437953355241234, 31.434096762281694, 22.05361521523259, -4.01229996340556, 
    33.25616015096144, 32.923233622141, 32.19486219938412, 32.327997317973605, 31.220133312026178, 21.523749050299717, -4.01229996340556, 
    33.88336118443987, 32.06549995200903, 31.444238175590346, 30.513296928660562, 30.093820663286973, 20.00873736055127, -4.01229996340556, 
    31.93897360628588, 31.367386756058472, 30.41611062977666, 30.041508341519144, 29.18586661387992, 18.947963856074345, -4.01229996340556, 
    32.21803551937539, 30.303080306579204, 30.014615926164126, 29.829687059099104, 30.10018295569891, 20.552810449124998, -4.01229996340556, 
    34.43765590094934, 31.720164107773762, 30.753163115602373, 29.770829027661737, 29.963206334446873, 19.7987766397603, -4.01229996340556, 
    31.020729247923473, 29.64019255525462, 29.76295684827037, 30.150947273922043, 29.460787646083045, 19.574711853067132, -4.01229996340556, 
    30.558519854976534, 28.856503735579487, 28.787641855065033, 27.78308015283352, 25.805761298889554, -4.01229996340556, -4.01229996340556, 
    30.79144384348109, 30.091274699493404, 29.95676288484756, 29.379466108247165, 28.761700892846005, 18.53885715320011, -4.01229996340556, 
    31.71906490558641, 31.4543704610565, 31.08488080280646, 30.300582922796725, 29.983799555351, 20.386631531296917, -4.01229996340556, 
    33.243638761790464, 32.57037372205673, 30.140813407364256, 28.73280157469541, 27.893596397850242, 18.119903449514474, -4.01229996340556, 
    30.078535893849953, 28.758814570010895, 28.164735253098282, 27.57456771024279, 26.867761716462937, 18.473080153514154, -4.01229996340556, 
    29.360401069776668, 29.06461489619925, 28.175432216467744, 27.082934191997218, 27.42105292707107, 18.604036799215987, -4.01229996340556, 
    26.70809523270929, 30.67499413672734, 29.654786713797623, 29.72746034651609, 29.911550605749316, 20.609792054476156, -4.01229996340556, 
    31.367386756058472, 31.249968416383116, 31.68714160980542, 30.450655207861885, 29.572266963087923, 19.683574615080015, -4.01229996340556, 
    31.828461812751577, 30.958558462100406, 30.75915323319262, 30.21910452088912, 29.339936113764743, 19.977994922194544, -4.01229996340556, 
    31.756378179730852, 28.8679366568758, 31.559655700130183, 31.62524093380048, 30.69428721778141, 21.705235774185468, -4.01229996340556, 
    33.156586464245, 31.584155726085857, 31.07208002345255, 30.190126862651006, 29.45808693548714, 20.283710648700765, -4.01229996340556, 
    33.451883191952355, 32.92620407163098, 32.98250017711902, 31.327625040719685, 31.832815867067115, 22.23552147740288, -4.01229996340556, 
    33.29268534529332, 32.23170692990565, 30.168658795659656, 29.502561191577954, 29.29340193142716, 20.298497672642263, -4.01229996340556, 
    31.94436341050769, 30.2680647210135, 30.135742921136696, 30.492456131398725, 29.05620095365852, 20.912598875006225, -4.01229996340556, 
    32.42659134181745, 31.36965460711737, 31.14172456769391, 31.033589169451258, 31.15328983743314, 21.621253055681752, -4.01229996340556, 
    33.390173519748984, 33.21372644694623, 34.09970236529061, 33.690933921643584, 33.60617408555065, 23.668899779155105, -4.01229996340556, 
    34.768200373181166, 33.06790899258198, 32.33633250769162, 31.403617564304835, 30.932649170348743, 21.652458588201977, -4.01229996340556, 
    32.65883515127049, -4.01229996340556, 33.411095772353434, 32.31652659003101, 31.96912379149358, 22.963501252067367, -4.01229996340556, 
    34.16501645709075, 32.61112924530656, 32.478147222718825, 31.850215446923936, 31.483590407990626, 22.888614149448394, -4.01229996340556
])

decomposition_type = 'additive'
periods = (7, 14)
residuals = {}

for period in periods:
    print(f'Period: {period} days')
    # Residuals computed via statsmodels are more or less 
    # the same with period=7 or period=14
    statsmodels_decomposed = seasonal_decompose(ts, model=decomposition_type, period=period);
    statsmodels_decomposed.plot();
    residuals[f'statsmodels_{period}'] = statsmodels_decomposed.resid
    # In contrast, residuals computed via pmdarima differ 
    # significantly with period=7 or period=14
    pmdarima_decomposed = pm.arima.decompose(ts, type_=decomposition_type, m=period)
    pm.utils.decomposed_plot(pmdarima_decomposed, figure_kwargs={}, show=True);
    residuals[f'pmdarima_{period}'] = pmdarima_decomposed.random

# Print norm2 of computed residuals
for k, v in residuals.items():
    norm2 = np.sqrt(np.nansum(np.square(v)))
    print(f'{k.ljust(len(max(residuals))+1)}: {norm2:.2f}')

Output

Period: 7 days image

Period: 14 days image

Norm2 of computed residuals statsmodels_7 : 40.51 pmdarima_7 : 235.11 statsmodels_14: 41.13 pmdarima_14 : 41.16

Versions

python 3.9
numpy 1.21.5
pmdarima 1.8.5
statsmodels 0.13.2

Expected Behavior

With such a time series I expected that residuals computed with both 7 and 14 days periods are similar to each other, as the ones returned by statsmodels.tsa.seasonal.

Actual Behavior

In the attached sample, residuals computed with pm.arima.decompose and period=7 still show an uncatched pattern and their norm2 is ~6x the one obtained with period=14. In contrast both residuals computed (as benchmark) via statsmodels.tsa.seasonal result similar to each other.

Additional Context

First of all, thanks for such an amazing library!

I am working with a time series made of daily observations which shows a 7-days period with a drop on Sundays (and holidays, more in general).

image

Following tips in the docs about setting period parameter as well as the blog post by Rob J Hyndman, I set m=7.

While exploring my data via pm.utils.decomposed_plot, I found out that residual component of the decomposed time series still shows a (not catched) seasonal pattern. Playing a bit with the period, it turned out the periods in the form of m=14*k with k=1, 2, ... actually return "expected" residuals shape.

Since the supposed problem disappears by doubling the natural period, which is an odd number, i.e. making it even, I inspected pm.arima.decompose source code identifying the cause at line 136 which make this swap:

seasonal = np.array(seasonal[half_m:] + seasonal[:half_m])

In contrast, I wasn't able to locate anything similar into statsmodels.tsa.seasonal source code, and residuals computed with period=7 and period=14 by statsmodels seem to be indeed coherent with each other (see the above snippet for reproducibility).

Is this a bug? Am I missing something?