ValueError threshold doesn't catch every edge case

EmreAtes commented 6 years ago

I get errors for some time series lengths. As far as I can see, the documentation doesn't warn about any of them.

For example, with lag=5 and min_tsep=10, any time series with 45 or fewer elements produces the following error:

>>> nolds.lyap_r(np.asarray(time_series[:45]), lag=5, min_tsep=10)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/pyenv/lib/python3.6/site-packages/nolds/measures.py", line 280, in lyap_r
    orbit = delay_embedding(data, emb_dim, lag)
  File "/pyenv/lib/python3.6/site-packages/nolds/measures.py", line 106, in delay_embedding
    raise ValueError(msg.format(len(data), emb_dim, lag))
ValueError: cannot embed data of length 45 with embedding dimension 10 and lag 5
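This first error comes from the delay embedding alone, before any Lyapunov-specific work. A single orbit vector spans (emb_dim - 1) * lag + 1 consecutive samples. The sketch below is a minimal standalone version of that length check. It assumes the usual delay-embedding layout. The helper names are hypothetical, the code is not the nolds source, and emb_dim=10 is taken from the error message above.

```python
import numpy as np


def min_embed_len(emb_dim, lag):
    # A single orbit vector spans (emb_dim - 1) * lag + 1 consecutive samples,
    # so that is the minimum length that can be embedded at all.
    return (emb_dim - 1) * lag + 1


def delay_embed(data, emb_dim, lag=1):
    """Delay-embed a 1-D NumPy array.

    Row i is [data[i], data[i + lag], ..., data[i + (emb_dim - 1) * lag]].
    """
    if len(data) < min_embed_len(emb_dim, lag):
        msg = "cannot embed data of length {} with embedding dimension {} and lag {}"
        raise ValueError(msg.format(len(data), emb_dim, lag))
    m = len(data) - (emb_dim - 1) * lag  # number of orbit vectors
    indices = np.arange(m)[:, None] + np.arange(emb_dim) * lag
    return data[indices]
```

With emb_dim=10 and lag=5, min_embed_len gives 46, so 45 samples fail exactly as in the traceback. With lag=4 it gives 37, which is consistent with the threshold of 36 mentioned for lag=4.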

However, from 46 up to 54 elements, a different error is raised:

>>> nolds.lyap_r(np.asarray(time_series[:54]), lag=5, min_tsep=10)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/pyenv/lib/python3.6/site-packages/nolds/measures.py", line 293, in lyap_r
    nb_idx = np.argmin(dists[:ntraj, :ntraj], axis=1)
  File "/pyenv/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 1019, in argmin
    return _wrapfunc(a, 'argmin', axis=axis, out=out)
  File "/pyenv/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 57, in _wrapfunc
    return getattr(obj, method)(*args, **kwds)
ValueError: attempt to get argmin of an empty sequence

For lag=4 and min_tsep=8, the corresponding thresholds are 36 and 55, yet a length of 45 produces no error at all. Also, for some other time series lengths, I get yet another error:

>>> nolds.lyap_r(np.asarray(time_series[:54]), lag=4, min_tsep=8)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/pyenv/lib/python3.6/site-packages/nolds/measures.py", line 300, in lyap_r
    div_traj_k = dists[indices]
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (0,) (17,)

What are the allowed time series lengths?

Also, how is the Lyapunov exponent defined for a time series like [263, 267, 267, 267, ...]? Currently I get this result:

>>> nolds.lyap_r(np.asarray(time_series[60:-60]), lag=4, min_tsep=8)
/project/peaclab-mon/pyenv/lib/python3.6/site-packages/nolds/measures.py:75: RuntimeWarning: RANSAC did not reach consensus, using numpy's polyfit
  RuntimeWarning)
/project/peaclab-mon/pyenv/lib/python3.6/site-packages/numpy/lib/polynomial.py:584: RuntimeWarning: invalid value encountered in true_divide
  lhs /= scale
/project/peaclab-mon/pyenv/lib/python3.6/site-packages/nolds/measures.py:76: RankWarning: Polyfit may be poorly conditioned
  coef = np.polyfit(x, y, degree)
nan

Finally, somewhat unrelated: when I pass a pandas Series directly to the function, I get another error, which is why I use np.asarray. I can open a separate issue for that if necessary:

>>> nolds.lyap_r(time_series, lag=4, min_tsep=8)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/pyenv/lib/python3.6/site-packages/nolds/measures.py", line 280, in lyap_r
    orbit = delay_embedding(data, emb_dim, lag)
  File "/pyenv/lib/python3.6/site-packages/nolds/measures.py", line 110, in delay_embedding
    return data[indices]
  File "/pyenv/lib/python3.6/site-packages/pandas/core/series.py", line 642, in __getitem__
    return self._get_with(key)
  File "/pyenv/lib/python3.6/site-packages/pandas/core/series.py", line 674, in _get_with
    return self.reindex(key)
  File "/pyenv/lib/python3.6/site-packages/pandas/core/series.py", line 2426, in reindex
    return super(Series, self).reindex(index=index, **kwargs)
  File "/pyenv/lib/python3.6/site-packages/pandas/core/generic.py", line 2515, in reindex
    fill_value, copy).__finalize__(self)
  File "/pyenv/lib/python3.6/site-packages/pandas/core/generic.py", line 2528, in _reindex_axes
    tolerance=tolerance, method=method)
  File "/pyenv/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2882, in reindex
    tolerance=tolerance)
  File "/pyenv/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2598, in get_indexer
    indexer = self._engine.get_indexer(target._values)
  File "pandas/_libs/index.pyx", line 306, in pandas._libs.index.IndexEngine.get_indexer (pandas/_libs/index.c:7518)
  File "pandas/_libs/hashtable_class_helper.pxi", line 808, in pandas._libs.hashtable.Int64HashTable.lookup (pandas/_libs/hashtable.c:14720)
ValueError: Buffer has wrong number of dimensions (expected 1, got 2)
>>> time_series.shape
(619,)
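The failing line is `return data[indices]`. As the last traceback frame suggests ("expected 1, got 2"), `indices` is a 2-D integer array. A plain NumPy array supports that kind of fancy indexing, but a pandas Series sends the key through label-based reindexing and fails. Below is a minimal sketch of the np.asarray workaround. The index layout is an assumption modeled on the usual delay embedding, and both function names are hypothetical.

```python
import numpy as np


def embedding_indices(n, emb_dim, lag):
    # 2-D integer index array: row i selects i, i + lag, ..., i + (emb_dim - 1) * lag
    m = n - (emb_dim - 1) * lag
    return np.arange(m)[:, None] + np.arange(emb_dim) * lag


def embed_array_like(data, emb_dim, lag):
    # Convert first: an ndarray supports 2-D integer ("fancy") indexing directly,
    # so lists, tuples and pandas Series all end up on the same code path.
    arr = np.asarray(data, dtype=float)
    return arr[embedding_indices(len(arr), emb_dim, lag)]
```

For the 619-sample series above with emb_dim=10 and lag=4, `embedding_indices(619, 10, 4)` has shape (583, 10). This is exactly the kind of 2-D key that a Series cannot handle positionally.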

CSchoel commented 6 years ago

Hello Emre. Thank you for your feedback. I have had this issue in the back of my mind since you posted it, but I am currently too busy with my job and PhD to work on Nolds. I expect to be able to look into it in a month or two.

CSchoel commented 6 years ago

It seems it has taken a lot more than two months for me to finally find the time to respond to this issue. Unfortunately, it has been a stressful semester for me.

Regarding your issues:

I tried to come up with a clear formula for the minimum length required for lyap_r. Something like (emb_dim - 1) * lag + 1 + max(min_tsep * 2, trajectory_len) should do the trick, but I will have to run a few tests to confirm that. Once I have, I will add a more comprehensible error message to the function.
In a time series that is mostly constant, nearly all of the distances will be zero. This means that you get negative infinities in the log plot, which are ignored. What remains is only one nonzero difference: the one between the vector [263, 267, 267, 267, ...] and any of the vectors [267, 267, 267, 267, ...]. This leaves you with a single point, which is not enough for a meaningful line fit. I have not yet run any tests to confirm this, but it seems like a plausible explanation for the nan result.
Nolds assumes plain NumPy arrays for most functions, so for now you will have to convert your pandas Series as you suggested. I will consider calling np.asarray on the input of all public functions in future versions, since more people seem to use the module with pandas. This may take some time, however, because I will also have to add tests to be sure that I do not miss any functions.
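The explanation for the nan result can be illustrated with a small sketch. It embeds a nearly constant series with emb_dim=10 and lag=4, then keeps only the finite log-distances. This is not what lyap_r computes internally, because lyap_r follows nearest-neighbor trajectories rather than using all pairwise distances. It only shows how little survives once log(0) = -inf values are dropped. The function name and the synthetic series are hypothetical.

```python
import numpy as np


def finite_log_distances(series, emb_dim=10, lag=4):
    """Log of all pairwise distances between delay vectors, keeping only finite values."""
    data = np.asarray(series, dtype=float)
    m = len(data) - (emb_dim - 1) * lag  # number of orbit vectors
    orbit = data[np.arange(m)[:, None] + np.arange(emb_dim) * lag]
    # pairwise Euclidean distances, shape (m, m)
    diff = orbit[:, None, :] - orbit[None, :, :]
    dists = np.sqrt((diff ** 2).sum(axis=-1))
    with np.errstate(divide="ignore"):
        logs = np.log(dists)  # zero distances become -inf
    return logs[np.isfinite(logs)]


almost_constant = np.array([263.0] + [267.0] * 99)
print(np.unique(finite_log_distances(almost_constant)))  # a single value, log(4)
print(finite_log_distances(np.full(100, 267.0)).size)     # 0: nothing left to fit
```

Only the first orbit vector contains the 263, so every nonzero distance is exactly 4. All finite log-distances therefore collapse to one value, and there is nothing a line fit can meaningfully use.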

EmreAtes commented 6 years ago

Thanks. I hope your PhD is going well now. I ended up using rpy2 and calling some R functions to calculate the Lyapunov exponent, but let me know if there's anything you need from me.

CSchoel commented 6 years ago

I just released version 0.5.1, which should address both the pandas issue and the issue with the minimum required length for several algorithms.

Fun facts:

I was close with my formula but not quite there. For lyap_r you need (emb_dim - 1) * lag + trajectory_len + min_tsep * 2 + 1 data points.
For lyap_e, the required length is emb_dim + (emb_dim - 1)/(matrix_dim - 1) + min_tsep * 2 + min_nb.

I see that these requirements were not very obvious before. :wink: Now you can calculate them with the functions lyap_r_len and lyap_e_len, and you get comprehensible warnings and errors.

I will close this issue now. Since I am not familiar with pandas, though, I would appreciate it if you could let me know whether your example now also works when you pass the pandas object directly to lyap_r (either here as a quick comment or via email).

EmreAtes commented 6 years ago

Yeah, it works fine with pandas, and there are more informative warnings on the minimum input length. Thanks.

Also, I've noticed that a constant series returns -inf now instead of nan, but I don't know which one is supposed to be correct.

CSchoel commented 5 years ago

Nice, thanks for testing. :smile:

I think -inf should be correct, since a constant series is the most un-chaotic thing you can imagine. :laughing:

CSchoel / nolds, issue #9