RJT1990 / pyflux

Open source time series library for Python
BSD 3-Clause "New" or "Revised" License
2.11k stars 240 forks source link

Does MA estimate the right coefficients? #123

Open jorgecarleitao opened 6 years ago

jorgecarleitao commented 6 years ago

Consider the code below that

  1. generates an MA(1) process of 100k time points
  2. fits the process using pyflux
  3. fits the process using statsmodels

I am getting that pyflux does not estimate the MA coefficient right. In opposition, statsmodels does. Anyone gets the same result?

How to reproduce:

def get_arma_process(ar, ma=0, samples=100000):
    """
    Generates a realization of an ARMA(ar, ma) process.
    """
    # a small noise so that the coefficients are well determined
    noise = np.random.normal(scale=0.01, size=samples)

    if ar > 0:
        # when sum of coefficients is smaller than 1 the process is stationary
        coefficients_ar = 0.99 * np.ones(shape=ar) / ar
    else:
        coefficients_ar = []

    if ma > 0:
        coefficients_ma = 0.5 * np.ones(shape=ma)
    else:
        coefficients_ma = []

    # start at 1
    x = np.ones(samples)

    for i in range(max(ar, ma), samples):
        x[i] = np.sum(coefficients_ar[d] * x[i - 1 - d] for d in range(0, ar)) + \
               np.sum(coefficients_ma[d] * noise[i - 1 - d] for d in range(0, ma)) + noise[i]

    return x[max(ar, ma):]

def _test_ARMA(ar, ma):
    """
    Tests that a ARMA(ar, ma) model fits an ARMA(ar, ma) process.
    """
    data = get_arma_process(ar=ar, ma=ma)

    model = ARIMA(data=data, ar=ar, ma=ma)
    coefficients = model.fit().z.get_z_values(transformed=True)

    assert (len(coefficients) == ar + ma + 2)  # 1 constant + 1 for the noise scale

    # Constant coefficient 0 within 0.1
    assert (np.abs(coefficients[0]) < 0.1) 

    # AR coefficients within 10%
    if ar > 0:
        expected_ar = 0.99 / ar  # same as used in `get_arma_process`
        for ar_i in range(ar):
            assert (np.abs(coefficients[1 + ar_i] - expected_ar) / expected_ar < 0.1)

    # MA coefficients within 10%
    if ma > 0:
        expected_ma = 0.5  # same as used in `get_arma_process`
        for ma_i in range(ar, ma):
            assert (np.abs(coefficients[1 + ma_i] - expected_ma) / expected_ma < 0.1)

    # Normal scale coefficient within 10%
    expected_noise = 0.01  # same as used in `get_arma_process`
    assert (np.abs(coefficients[-1] - expected_noise) / expected_noise < 0.1)

_test_ARMA(0, 1)

If I run _test_ARMA(1, 0) or _test_ARMA(2, 0), the test passes. If I run _test_ARMA(0, 1) the test does not pass. Specifically, the MA coefficient is computed to be ~0, while the expected result is 0.5.

I took the same test and ran it with statsmodels using

m = statsmodels.tsa.arima_model.ARMA(data, order=(ar, ma))
coefficients = m.fit(disp=False).params

and the test passes (ignoring the normal scale coefficient that is not estimated by the statsmodels).

Any ideas?

RJT1990 commented 6 years ago

Thank you for reporting this issue. I did some tests a year ago between different packages (R/Python) and the results were inconsistent - in some cases the PyFlux implementation gave superior results, other times other packages gave superior results. But in the case you give, it looks like a problem with the PyFlux optimization. Will try to have a look when I get some time.