blue-yonder / tsfresh

Automatic extraction of relevant features from time series:
http://tsfresh.readthedocs.io
MIT License

'friedrich_coefficients' and 'max_langevin_fixed_point' do not work with a single record DataFrame #929

Open momijiame opened 2 years ago

momijiame commented 2 years ago

I am new to tsfresh, so apologies if I have misunderstood something.

The problem:

I encountered an exception in the following tutorial.

"Rolling/Time series forecasting" https://tsfresh.readthedocs.io/en/latest/text/forecasting.html

To reproduce, enter the snippets from the tutorial at the prompt, in order.

  1. Launch the Python interpreter
$ python3
  2. Define a DataFrame
>>> import pandas as pd
>>> df = pd.DataFrame({
...    "id": [1, 1, 1, 1, 2, 2],
...    "time": [1, 2, 3, 4, 8, 9],
...    "x": [1, 2, 3, 4, 10, 11],
...    "y": [5, 6, 7, 8, 12, 13],
... })
  3. Extract a rolling DataFrame
>>> from tsfresh.utilities.dataframe_functions import roll_time_series
>>> df_rolled = roll_time_series(df, column_id="id", column_sort="time")
  4. Extract features from the rolling DataFrame
>>> from tsfresh import extract_features
>>> df_features = extract_features(df_rolled, column_id="id", column_sort="time")

The following exception is raised at step 4.

Feature Extraction:   0%|                                                                                            | 0/12 [00:01<?, ?it/s]
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/local/Cellar/python@3.9/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/Users/amedama/.virtualenvs/py39/lib/python3.9/site-packages/tsfresh/utilities/distribution.py", line 43, in _function_with_partly_reduce
    results = list(itertools.chain.from_iterable(results))
  File "/Users/amedama/.virtualenvs/py39/lib/python3.9/site-packages/tsfresh/utilities/distribution.py", line 42, in <genexpr>
    results = (map_function(chunk, **kwargs) for chunk in chunk_list)
  File "/Users/amedama/.virtualenvs/py39/lib/python3.9/site-packages/tsfresh/feature_extraction/extraction.py", line 386, in _do_extraction_on_chunk
    return list(_f())
  File "/Users/amedama/.virtualenvs/py39/lib/python3.9/site-packages/tsfresh/feature_extraction/extraction.py", line 364, in _f
    result = func(x, param=parameter_list)
  File "/Users/amedama/.virtualenvs/py39/lib/python3.9/site-packages/tsfresh/feature_extraction/feature_calculators.py", line 2103, in friedrich_coefficients
    calculated[m][r] = _estimate_friedrich_coefficients(x, m, r)
  File "/Users/amedama/.virtualenvs/py39/lib/python3.9/site-packages/tsfresh/feature_extraction/feature_calculators.py", line 152, in _estimate_friedrich_coefficients
    df["quantiles"] = pd.qcut(df.signal, r)
  File "/Users/amedama/.virtualenvs/py39/lib/python3.9/site-packages/pandas/core/reshape/tile.py", line 376, in qcut
    bins = np.quantile(x_np, quantiles)
  File "<__array_function__ internals>", line 5, in quantile
  File "/Users/amedama/.virtualenvs/py39/lib/python3.9/site-packages/numpy/lib/function_base.py", line 3979, in quantile
    return _quantile_unchecked(
  File "/Users/amedama/.virtualenvs/py39/lib/python3.9/site-packages/numpy/lib/function_base.py", line 3986, in _quantile_unchecked
    r, k = _ureduce(a, func=_quantile_ureduce_func, q=q, axis=axis, out=out,
  File "/Users/amedama/.virtualenvs/py39/lib/python3.9/site-packages/numpy/lib/function_base.py", line 3564, in _ureduce
    r = func(a, **kwargs)
  File "/Users/amedama/.virtualenvs/py39/lib/python3.9/site-packages/numpy/lib/function_base.py", line 4109, in _quantile_ureduce_func
    x_below = take(ap, indices_below, axis=0)
  File "<__array_function__ internals>", line 5, in take
  File "/Users/amedama/.virtualenvs/py39/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 190, in take
    return _wrapfunc(a, 'take', indices, axis=axis, out=out, mode=mode)
  File "/Users/amedama/.virtualenvs/py39/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 57, in _wrapfunc
    return bound(*args, **kwds)
IndexError: cannot do a non-empty take from an empty axes.
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/amedama/.virtualenvs/py39/lib/python3.9/site-packages/tsfresh/feature_extraction/extraction.py", line 164, in extract_features
    result = _do_extraction(
  File "/Users/amedama/.virtualenvs/py39/lib/python3.9/site-packages/tsfresh/feature_extraction/extraction.py", line 294, in _do_extraction
    result = distributor.map_reduce(
  File "/Users/amedama/.virtualenvs/py39/lib/python3.9/site-packages/tsfresh/utilities/distribution.py", line 241, in map_reduce
    result = list(itertools.chain.from_iterable(result))
  File "/Users/amedama/.virtualenvs/py39/lib/python3.9/site-packages/tqdm/std.py", line 1180, in __iter__
    for obj in iterable:
  File "/usr/local/Cellar/python@3.9/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/pool.py", line 870, in next
    raise value
  File "/usr/local/Cellar/python@3.9/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/Users/amedama/.virtualenvs/py39/lib/python3.9/site-packages/tsfresh/utilities/distribution.py", line 43, in _function_with_partly_reduce
    results = list(itertools.chain.from_iterable(results))
  File "/Users/amedama/.virtualenvs/py39/lib/python3.9/site-packages/tsfresh/utilities/distribution.py", line 42, in <genexpr>
    results = (map_function(chunk, **kwargs) for chunk in chunk_list)
  File "/Users/amedama/.virtualenvs/py39/lib/python3.9/site-packages/tsfresh/feature_extraction/extraction.py", line 386, in _do_extraction_on_chunk
    return list(_f())
  File "/Users/amedama/.virtualenvs/py39/lib/python3.9/site-packages/tsfresh/feature_extraction/extraction.py", line 364, in _f
    result = func(x, param=parameter_list)
  File "/Users/amedama/.virtualenvs/py39/lib/python3.9/site-packages/tsfresh/feature_extraction/feature_calculators.py", line 2103, in friedrich_coefficients
    calculated[m][r] = _estimate_friedrich_coefficients(x, m, r)
  File "/Users/amedama/.virtualenvs/py39/lib/python3.9/site-packages/tsfresh/feature_extraction/feature_calculators.py", line 152, in _estimate_friedrich_coefficients
    df["quantiles"] = pd.qcut(df.signal, r)
  File "/Users/amedama/.virtualenvs/py39/lib/python3.9/site-packages/pandas/core/reshape/tile.py", line 376, in qcut
    bins = np.quantile(x_np, quantiles)
  File "<__array_function__ internals>", line 5, in quantile
  File "/Users/amedama/.virtualenvs/py39/lib/python3.9/site-packages/numpy/lib/function_base.py", line 3979, in quantile
    return _quantile_unchecked(
  File "/Users/amedama/.virtualenvs/py39/lib/python3.9/site-packages/numpy/lib/function_base.py", line 3986, in _quantile_unchecked
    r, k = _ureduce(a, func=_quantile_ureduce_func, q=q, axis=axis, out=out,
  File "/Users/amedama/.virtualenvs/py39/lib/python3.9/site-packages/numpy/lib/function_base.py", line 3564, in _ureduce
    r = func(a, **kwargs)
  File "/Users/amedama/.virtualenvs/py39/lib/python3.9/site-packages/numpy/lib/function_base.py", line 4109, in _quantile_ureduce_func
    x_below = take(ap, indices_below, axis=0)
  File "<__array_function__ internals>", line 5, in take
  File "/Users/amedama/.virtualenvs/py39/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 190, in take
    return _wrapfunc(a, 'take', indices, axis=axis, out=out, mode=mode)
  File "/Users/amedama/.virtualenvs/py39/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 57, in _wrapfunc
    return bound(*args, **kwds)
IndexError: cannot do a non-empty take from an empty axes.

Anything else we need to know?:

I investigated why this happens and found that the cause is the calculation of 'friedrich_coefficients' and 'max_langevin_fixed_point': if both calculators are removed from the FC settings, the exception is not raised.

import pandas as pd
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2],
    "time": [1, 2, 3, 4, 8, 9],
    "x": [1, 2, 3, 4, 10, 11],
    "y": [5, 6, 7, 8, 12, 13],
})
from tsfresh.utilities.dataframe_functions import roll_time_series
df_rolled = roll_time_series(df, column_id="id", column_sort="time")

# drop features 'friedrich_coefficients' and 'max_langevin_fixed_point' from FC settings
from tsfresh.feature_extraction import ComprehensiveFCParameters
settings = ComprehensiveFCParameters()
del settings['friedrich_coefficients']
del settings['max_langevin_fixed_point']

# extract features with FC settings
from tsfresh import extract_features
df_features = extract_features(df_rolled,
                               column_id="id",
                               column_sort="time",
                               default_fc_parameters=settings,
                               )

I also realized that these calculators do not support a single-record DataFrame. For example, passing only the first (t=1) rolled DataFrame raises the same exception as before.

>>> df_rolled.iloc[0:1]
       id  time  x  y
7  (1, 1)     1  1  5
>>> df_features = extract_features(df_rolled.iloc[0:1],
...                                column_id="id",
...                                column_sort="time",
...                                )
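
A possible explanation, sketched under an assumption about tsfresh internals: the traceback only shows that `pd.qcut(df.signal, r)` fails, so I am assuming here that `signal` is built from `x[:-1]` and the increments from `np.diff(x)`. If that holds, a one-record series leaves both arrays empty, and there is nothing to compute quantiles from.

```python
import numpy as np

# A one-record series, like the (1, 1) window above.
x = np.array([1.0])

# Assumed construction of the binning input (not confirmed by the traceback):
# all values but the last, paired with the first differences of the series.
signal = x[:-1]
delta = np.diff(x)

# Both arrays are empty when the series holds a single record.
print(len(signal), len(delta))
```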

However, the exception will not be raised for the next (t=2) rolled DataFrame.

>>> df_rolled.iloc[1:3]
        id  time  x  y
9   (1, 2)     1  1  5
10  (1, 2)     2  2  6
>>> df_features = extract_features(df_rolled.iloc[1:3],
...                                column_id="id",
...                                column_sort="time",
...                                )

This behavior does not occur in other calculations.
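
As a stopgap, single-record windows can be dropped before calling `extract_features`. The sketch below uses a hand-built stand-in for a few rows of `df_rolled` (the frame and the name `multi_record` are illustrative only), and it assumes that losing the features for those windows is acceptable.

```python
import pandas as pd

# Hand-built stand-in for a few rows of df_rolled; ids are (id, time) tuples.
rolled = pd.DataFrame({
    "id": [(1, 1), (1, 2), (1, 2), (2, 8), (2, 9), (2, 9)],
    "time": [1, 1, 2, 8, 8, 9],
    "x": [1, 1, 2, 10, 10, 11],
    "y": [5, 5, 6, 12, 12, 13],
})

# Keep only rows whose window id occurs more than once,
# i.e. drop every window that holds a single record.
multi_record = rolled[rolled["id"].duplicated(keep=False)]

print(sorted(set(multi_record["id"])))  # [(1, 2), (2, 9)]
```

The filtered frame could then be passed to `extract_features` in place of `df_rolled`.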

Environment:

mdhanna commented 2 years ago

I was experiencing the same issue with Python 3.8. I downgraded to Python 3.7 and have been able to execute the same code successfully.

momijiame commented 2 years ago

Thank you for the valuable information. Apparently, this problem depends on the pandas version. Downgrading Python to 3.7 also pulls in an older pandas (< 1.4). In other words, Python 3.8 or later also works as long as the pandas version is below 1.4.

  1. The following environment does not work:
$ python -V
Python 3.9.13
$ pip list | grep pandas
pandas             1.4.3
$ pip list | grep numpy
numpy              1.21.5
  2. Downgrade pandas:
$ pip install -U "pandas<1.4"
  3. The following environment works:
$ python -V
Python 3.9.13
$ pip list | grep pandas
pandas             1.3.5
$ pip list | grep numpy
numpy              1.21.5

paulbauriegel commented 2 years ago

You can add a small except clause for the IndexError under the existing one in the _estimate_friedrich_coefficients function to "solve" the problem.

    try:
        df["quantiles"] = pd.qcut(df.signal, r)
    except ValueError:
        return [np.NaN] * (m + 1)
    except IndexError:
        return [np.NaN] * (m + 1)