dmbee / seglearn

Python module for machine learning time series:
https://dmbee.github.io/seglearn/
BSD 3-Clause "New" or "Revised" License

Is FeatureRep() with sequences of different length possible? #44

Closed emial637 closed 4 years ago

emial637 commented 4 years ago

Hi! I've got multidimensional time-series data in which the samples have different temporal lengths (between 600 and 1800 time steps).

When using PadTrunc(), the shorter samples get zeros appended to them, which affects the features computed in the following step, for example the mean. I would therefore like to avoid the PadTrunc() step.

Pype([('pad', PadTrunc(width=1000)),
      ('features', FeatureRep('default')),
      ...
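
To make the concern about the mean concrete, here is a minimal sketch of how appended zeros shift it. The series values and the width are made up for illustration, and the padding is done by hand in plain Python rather than through seglearn:

```python
from statistics import mean

# Hypothetical single-channel series with 3 time steps.
series = [2.0, 4.0, 6.0]

# Pad to a fixed width by appending zeros, as described for PadTrunc() above.
width = 6
padded = series + [0.0] * (width - len(series))

print(mean(series))  # mean of the original time steps only
print(mean(padded))  # mean after padding, pulled toward zero
```

The shorter a series is relative to the pad width, the more its mean is pulled toward zero.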

In theory I see no problem with calculating the features from samples of different lengths, but I can't manage to do it in practice.

PadTrunc() reshapes the data to (n_samples, truncation_width, n_channels), which seems to be the shape assumed by the rest of the pipeline. Without PadTrunc() there is no way to get samples of different lengths into an array of this shape.

Is it possible to do this in seglearn somehow, and if not, what is the recommended way to use PadTrunc() without affecting the features?

Thanks.

dmbee commented 4 years ago

Hi,

Can you set the pad/trunc width to the minimum series length in your data set? This would avoid the trailing zeros. In general, seglearn (and most people afaik) deal with time series of different lengths through segmentation, padding, or truncation.
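
A rough sketch of the minimum-length idea, using nested lists as a dependency-free stand-in for arrays of shape (n_timesteps, n_channels). The data and the variable names (`X`, `min_len`, `X_trunc`) are hypothetical:

```python
# Hypothetical data: three two-channel series of different lengths.
X = [
    [[0.1, 1.0], [0.2, 1.1], [0.3, 1.2], [0.4, 1.3]],              # length 4
    [[0.5, 2.0], [0.6, 2.1], [0.7, 2.2]],                          # length 3
    [[0.8, 3.0], [0.9, 3.1], [1.0, 3.2], [1.1, 3.3], [1.2, 3.4]],  # length 5
]

# Shortest series length in the data set.
min_len = min(len(ts) for ts in X)

# Truncate every series to that length, so no zero padding is needed.
X_trunc = [ts[:min_len] for ts in X]

print(min_len, [len(ts) for ts in X_trunc])
```

The same value could then be used as the width in the pipeline from the question, i.e. `PadTrunc(width=min_len)`. The trade-off is that everything past `min_len` is discarded: with lengths between 600 and 1800, truncating to 600 drops two thirds of the longest series.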

Although not supported by this package, you could calculate features from series with different lengths. However, keep in mind that length will also affect some features like zero-crossings, sum of squares, etc., so be careful which features you choose. The main challenge with doing that is computational efficiency. The speed of feature calculations depends on the calculations being vectorized (at least in pure Python). Most deep learning algorithms also rely on a fixed time series length for efficient training.
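
As an illustration only (this is outside seglearn, and the helper names `zero_crossing_rate` and `features` are made up here), per-series features could be computed in a plain loop. The zero-crossing count is divided by the number of consecutive pairs so that it does not grow with series length:

```python
from statistics import mean, pstdev

def zero_crossing_rate(x):
    """Fraction of consecutive value pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(x, x[1:]) if (a < 0) != (b < 0))
    return crossings / (len(x) - 1)

def features(ts):
    """Per-channel features for one series of shape (n_timesteps, n_channels)."""
    feats = []
    for channel in zip(*ts):  # transpose: iterate over channels
        feats += [mean(channel), pstdev(channel), zero_crossing_rate(channel)]
    return feats

# Hypothetical two-channel series of different lengths.
X = [
    [[1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, 1.0]],  # length 4
    [[2.0, 0.5], [-2.0, 0.5], [2.0, 0.5]],                 # length 3
]

# One fixed-length feature row per series, regardless of series length.
F = [features(ts) for ts in X]
print([len(row) for row in F])
```

Every row of `F` has the same number of entries, so it can be passed to an ordinary estimator. The loop runs once per series rather than over one stacked array, which is the efficiency cost mentioned above.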

Good luck.