CSchoel / nolds

Nonlinear measures for dynamical systems (based on one-dimensional time series)
MIT License

ValueError threshold doesn't catch every edge case #9

Closed EmreAtes closed 6 years ago

EmreAtes commented 6 years ago

I get some errors for time series of different lengths. As far as I can see, there aren't any warnings in the documentation about these.

For example, for lag=5 and min_tsep=10, any time series of 45 elements or fewer raises the following error:

>>> nolds.lyap_r(np.asarray(time_series[:45]), lag=5, min_tsep=10)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/pyenv/lib/python3.6/site-packages/nolds/measures.py", line 280, in lyap_r
    orbit = delay_embedding(data, emb_dim, lag)
  File "/pyenv/lib/python3.6/site-packages/nolds/measures.py", line 106, in delay_embedding
    raise ValueError(msg.format(len(data), emb_dim, lag))
ValueError: cannot embed data of length 45 with embedding dimension 10 and lag 5

However, for lengths up to 54 elements, a different error is raised:

>>> nolds.lyap_r(np.asarray(time_series[:54]), lag=5, min_tsep=10)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/pyenv/lib/python3.6/site-packages/nolds/measures.py", line 293, in lyap_r
    nb_idx = np.argmin(dists[:ntraj, :ntraj], axis=1)
  File "/pyenv/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 1019, in argmin
    return _wrapfunc(a, 'argmin', axis=axis, out=out)
  File "/pyenv/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 57, in _wrapfunc
    return getattr(obj, method)(*args, **kwds)
ValueError: attempt to get argmin of an empty sequence

For lag=4 and min_tsep=8, the thresholds are 36 to 55; however, a length of 45 generates no errors. Also, for some time-series lengths, I get another error:

>>> nolds.lyap_r(np.asarray(time_series[:54]), lag=4, min_tsep=8)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/pyenv/lib/python3.6/site-packages/nolds/measures.py", line 300, in lyap_r
    div_traj_k = dists[indices]
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (0,) (17,) 

What are the allowed time-series lengths?
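The first threshold, at least, seems to follow from delay embedding itself: one delay vector with embedding dimension `emb_dim` and lag `lag` spans `(emb_dim - 1) * lag + 1` samples. The sketch below only works through that arithmetic. The helper names are hypothetical and not part of nolds, `emb_dim=10` is taken from the error message above, and any further requirements from `min_tsep` or the trajectory length are not modelled here.

```python
def min_embedding_length(emb_dim, lag):
    """Samples needed to build a single delay vector (hypothetical helper)."""
    return (emb_dim - 1) * lag + 1


def n_delay_vectors(n_samples, emb_dim, lag):
    """Number of delay vectors obtainable from n_samples points (hypothetical helper)."""
    return max(0, n_samples - (emb_dim - 1) * lag)


# emb_dim=10 is the value reported in the ValueError above.
print(min_embedding_length(10, 5))  # 46, so a series of length 45 cannot be embedded
print(min_embedding_length(10, 4))  # 37, so a series of length 36 cannot be embedded
print(n_delay_vectors(54, 10, 5))   # 9 vectors, which may still be too few for later steps
```

This matches the lower thresholds of 45 and 36 reported above, but it does not explain the upper ones.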

Also, how is the Lyapunov exponent defined for a time series that looks like [263, 267, 267, 267, ...]? Currently I get this value:

>>> nolds.lyap_r(np.asarray(time_series[60:-60]), lag=4, min_tsep=8)
/project/peaclab-mon/pyenv/lib/python3.6/site-packages/nolds/measures.py:75: RuntimeWarning: RANSAC did not reach consensus, using numpy's polyfit
  RuntimeWarning)
/project/peaclab-mon/pyenv/lib/python3.6/site-packages/numpy/lib/polynomial.py:584: RuntimeWarning: invalid value encountered in true_divide
  lhs /= scale
/project/peaclab-mon/pyenv/lib/python3.6/site-packages/nolds/measures.py:76: RankWarning: Polyfit may be poorly conditioned
  coef = np.polyfit(x, y, degree)
nan

Finally, and somewhat unrelated: when a pandas Series is passed into the function, I get another error, which is why I use np.asarray. I can open a separate issue for that if necessary:

>>> nolds.lyap_r(time_series, lag=4, min_tsep=8)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/pyenv/lib/python3.6/site-packages/nolds/measures.py", line 280, in lyap_r
    orbit = delay_embedding(data, emb_dim, lag)
  File "/pyenv/lib/python3.6/site-packages/nolds/measures.py", line 110, in delay_embedding
    return data[indices]
  File "/pyenv/lib/python3.6/site-packages/pandas/core/series.py", line 642, in __getitem__
    return self._get_with(key)
  File "/pyenv/lib/python3.6/site-packages/pandas/core/series.py", line 674, in _get_with
    return self.reindex(key)
  File "/pyenv/lib/python3.6/site-packages/pandas/core/series.py", line 2426, in reindex
    return super(Series, self).reindex(index=index, **kwargs)
  File "/pyenv/lib/python3.6/site-packages/pandas/core/generic.py", line 2515, in reindex
    fill_value, copy).__finalize__(self)
  File "/pyenv/lib/python3.6/site-packages/pandas/core/generic.py", line 2528, in _reindex_axes
    tolerance=tolerance, method=method)
  File "/pyenv/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2882, in reindex
    tolerance=tolerance)
  File "/pyenv/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2598, in get_indexer
    indexer = self._engine.get_indexer(target._values)
  File "pandas/_libs/index.pyx", line 306, in pandas._libs.index.IndexEngine.get_indexer (pandas/_libs/index.c:7518)
  File "pandas/_libs/hashtable_class_helper.pxi", line 808, in pandas._libs.hashtable.Int64HashTable.lookup (pandas/_libs/hashtable.c:14720)
ValueError: Buffer has wrong number of dimensions (expected 1, got 2)
>>> time_series.shape
(619,)
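
Judging from the traceback, the two-dimensional index array built in `delay_embedding` is passed to `Series.__getitem__`, which tries to reindex by label instead of indexing by position. Converting the input to a NumPy array first avoids that path. The following is a minimal sketch of the workaround, assuming a one-dimensional numeric input; `to_1d_array` is a hypothetical helper and not part of nolds.

```python
import numpy as np


def to_1d_array(data):
    """Convert list-like input (list, pandas Series, ...) to a 1-D float array.

    Hypothetical helper for illustration; not part of nolds.
    """
    arr = np.asarray(data, dtype=float)
    if arr.ndim != 1:
        raise ValueError(
            "expected a one-dimensional time series, got shape {}".format(arr.shape)
        )
    return arr


values = to_1d_array([263, 267, 267, 267, 270])

# A 2-D integer index array selects positions on an ndarray,
# yielding one delay vector per row.
indices = np.array([[0, 1, 2], [1, 2, 3], [2, 3, 4]])
embedded = values[indices]
print(embedded.shape)  # (3, 3)
```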

CSchoel commented 6 years ago

Hello Emre. Thank you for your feedback. I have had the issue in the back of my mind since you posted it, but I am currently too busy with my job and PhD to work on Nolds. I expect to be able to look into this in a month or two.

CSchoel commented 6 years ago

It seems it has taken a lot more than two months for me to finally find the time to respond to this issue. Unfortunately, it has been a stressful semester for me.

Regarding your issues:

EmreAtes commented 6 years ago

Thanks. I hope your PhD is going well now. I ended up using rpy2 and calling some R functions to calculate the Lyapunov exponent, but let me know if there's anything you need from me.

CSchoel commented 6 years ago

I just released version 0.5.1, which should address both the issue with pandas and the minimum required length for several algorithms.

Fun facts:

I see that these requirements were not very obvious before. :wink: Now you can calculate them with the functions lyap_r_len and lyap_e_len, and you get comprehensible warnings and errors.

I will close this issue now, but since I am not familiar with pandas, I would appreciate it if you could let me know whether your example now also works when you pass the pandas object directly to lyap_r (either here as a quick comment or via mail).

EmreAtes commented 6 years ago

Yeah, it works fine with pandas, and there are more informative warnings on the minimum input length. Thanks.

Also, I've noticed that a constant series returns -inf now instead of nan, but I don't know which one is supposed to be correct.

CSchoel commented 5 years ago

Nice, thanks for testing. :smile:

I think -inf should be correct, since a constant series is the most un-chaotic thing you can imagine. :laughing:
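
One way to see where -inf comes from is the sketch below. It assumes the estimate is built from the logarithm of distances between delay vectors, as in Rosenstein-style methods; it is an illustration only and not the code used in nolds. For a constant series every such distance is zero, so its logarithm is -inf.

```python
import numpy as np


def mean_log_distance(series, emb_dim=3, lag=1):
    """Mean log distance between consecutive delay vectors (illustrative only)."""
    x = np.asarray(series, dtype=float)
    n_vectors = len(x) - (emb_dim - 1) * lag
    # Row i holds x[i], x[i + lag], ..., x[i + (emb_dim - 1) * lag].
    idx = np.arange(n_vectors)[:, None] + lag * np.arange(emb_dim)[None, :]
    orbit = x[idx]
    dists = np.linalg.norm(orbit[1:] - orbit[:-1], axis=1)
    # log(0) evaluates to -inf; silence the divide-by-zero warning for clarity.
    with np.errstate(divide="ignore"):
        return np.mean(np.log(dists))


constant = np.full(50, 267.0)
print(mean_log_distance(constant))  # -inf
```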