RJT1990 / pyflux

Open source time series library for Python
BSD 3-Clause "New" or "Revised" License

VAR model's AIC, BIC and Log Likelihood are non-deterministic #115

Closed Alechan closed 6 years ago

Alechan commented 6 years ago

Example code:

file issuegithub.py

import numpy as np
import pyflux as pf
import pandas as pd

df = pd.DataFrame(np.linspace(0,100,1000), columns=list('A'))
# Model the data using VAR modeling
model = pf.VAR(data=df, lags=2, integ=1)
# Fit the model
x = model.fit()
# Print the aic
print(x.aic)

Runs:

$ python issuegithub.py
651.7090301513672
$ python issuegithub.py
636.0988006591797
$ python issuegithub.py
-16537352142428.482

I've narrowed it down to a call to var_likelihood, which is defined in pyflux/var/var_recursions.cpython-36m-x86_64-linux-gnu.so.

Because it's a compiled .so file, I can't step through it in a debugger, but printing the result of that call on each run shows that it returns a different value every time.
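The repeated runs above can also be automated. Here is a minimal sketch that launches the same code in several fresh interpreter processes and compares the printed output. The helper name run_outputs and the inline stand-in script are hypothetical; to reproduce this issue you would run issuegithub.py instead.

```python
import subprocess
import sys


def run_outputs(code, runs=3):
    """Run `code` in `runs` fresh Python processes and collect stdout."""
    outputs = []
    for _ in range(runs):
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            check=True,
        )
        outputs.append(proc.stdout.strip())
    return outputs


# Stand-in script (an assumption for illustration): a deterministic computation.
# For this issue, replace the list passed to subprocess.run with
# [sys.executable, "issuegithub.py"].
script = "print(sum(i * 0.1 for i in range(10)))"
outputs = run_outputs(script)

# A deterministic script prints the same value in every process.
print("deterministic:", len(set(outputs)) == 1)
```

With the VAR example, more than one distinct value in the collected outputs would confirm the behaviour shown in the runs above.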

Alechan commented 6 years ago

I edited the original comment because the example used a random DataFrame. The DataFrame is now fixed, so every run sees the same data.

dioh commented 6 years ago

We have found a workaround that gives consistent metric values. We traced the error to the var_recursion method that is used to calculate the negative log-likelihood. The code is written in Cython, so we couldn't debug it. When we replaced the implementation with a pure Python one, the metrics became consistent.

We basically added to the VAR class the following instance method:

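    def var_likelihood(self, ll1, mu_shape, diff, inverse):
        # Pure-Python replacement for the compiled var_likelihood
        ll2 = 0.0
        # Accumulate the quadratic form diff[t]' * inverse * diff[t] over t
        for t in range(mu_shape):
            ll2 += np.dot(np.dot(diff[t].T, inverse), diff[t])

        # Return the negated value of ll1 - 0.5 * ll2
        return -(ll1 - 0.5 * ll2)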
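To check the replacement outside of pyflux, here is a minimal standalone sketch of the same computation, compared against a vectorized np.einsum version. The shapes are assumptions: diff is taken to be a (T, k) array of residuals and inverse a (k, k) matrix. The data, the value of ll1, and the name var_likelihood_vectorized are made up for illustration.

```python
import numpy as np


def var_likelihood(ll1, mu_shape, diff, inverse):
    """Loop version, mirroring the method above (without self)."""
    ll2 = 0.0
    for t in range(mu_shape):
        ll2 += np.dot(np.dot(diff[t].T, inverse), diff[t])
    return -(ll1 - 0.5 * ll2)


def var_likelihood_vectorized(ll1, diff, inverse):
    """Same sum of quadratic forms, computed in one einsum call."""
    ll2 = np.einsum("ti,ij,tj->", diff, inverse, diff)
    return -(ll1 - 0.5 * ll2)


# Made-up inputs: 50 residual vectors of dimension 2 and a fixed covariance.
rng = np.random.default_rng(0)
diff = rng.normal(size=(50, 2))
inverse = np.linalg.inv(np.array([[2.0, 0.3], [0.3, 1.0]]))
ll1 = -10.0

# Repeated calls on the same inputs should return the same value.
results = [var_likelihood(ll1, diff.shape[0], diff, inverse) for _ in range(5)]
vectorized = var_likelihood_vectorized(ll1, diff, inverse)
print(results[0], vectorized)
```

Since both versions are plain NumPy on fully initialized arrays, repeated calls agree with each other, which is the behaviour we expected from the compiled function.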

What do you think the issue might be? An old version of the library, maybe?

RJT1990 commented 6 years ago

Hi Alechan - I am trying this out now to verify.

RJT1990 commented 6 years ago

Hi Alechan - I ran into a couple of problems with this report. With your example, I get the error LinAlgError("Singular matrix") upon initialization.

Additionally, I tried the base VAR example here - http://pyflux.readthedocs.io/en/latest/var.html - and cannot replicate the non-deterministic AIC response.

This may be an issue with an old version of the library (or its dependencies). I would recommend upgrading and then reporting back if the issue persists.
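The "Singular matrix" error is at least plausible for the example data. The sketch below uses only NumPy and does not call pyflux. It assumes that initialization regresses the differenced series on a constant and its own lags, which this thread does not confirm; the design matrix X is a hypothetical stand-in for whatever pyflux builds internally.

```python
import numpy as np

# Same data as the example: a perfectly linear series.
y = np.linspace(0, 100, 1000)

# Analogue of integ=1: first-difference the series.
d = np.diff(y)

# Hypothetical OLS-style design matrix for lags=2: [1, d[t-1], d[t-2]].
lags = 2
columns = [np.ones(len(d) - lags)]
for k in range(1, lags + 1):
    columns.append(d[lags - k: len(d) - k])
X = np.column_stack(columns)

# The differenced series is constant up to rounding, so the lag columns
# are multiples of the constant column and X is rank deficient.
rank = np.linalg.matrix_rank(X)
cond = np.linalg.cond(X.T @ X)
print("columns:", X.shape[1], "rank:", rank, "condition number:", cond)
```

Under that assumption, inverting X.T @ X either fails outright or succeeds only because of floating-point noise, so the outcome may differ between machines and library versions.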

RJT1990 commented 6 years ago

Closing.