dmbee / seglearn

Python module for machine learning time series:
https://dmbee.github.io/seglearn/
BSD 3-Clause "New" or "Revised" License

Feature Request: Event Segmentation #46

Closed jmrichardson closed 4 years ago

jmrichardson commented 4 years ago

Hi, seglearn looks very interesting, and I was hoping it could segment time series data by events (i.e., positions determined by index). For example, let's say I have a list of events of interest in a time series and would like a window of the previous X observations for each event. I would like to use each event's index to find its location in the time series and take the X observations before it, then iterate over the remaining indexes to get all the windows.

Here is the code I am using to segment my data into (samples, length, dimensions) using the index:

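    # Assumes numpy is imported as np and self.X is a pandas object indexed like self.Xtbl.
    self.Xs = []
    for i in self.Xtbl.index:
        end = self.X.index.get_loc(i)   # position of the event in the series
        start = end - sliding_window    # negative if the event has too little history
        window = self.X.iloc[start:end].to_numpy()
        self.Xs.append(window)
    self.Xs = np.stack(self.Xs)         # (samples, length, dimensions)
    self.ys = self.ytbl.copy()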

I would like to use seglearn with this functionality in a pipeline to fit. However, the Segment class doesn't appear to segment based on events. Thanks for your consideration.

dmbee commented 4 years ago

I think the easiest way to do that is to make the target variable (y) the event indices. Then use y_func = 'last' and step=1 for the Segment transform. Another (more common) alternative would be to define y as the number of timesteps prior to an event. Again, you could use the same segmentation strategy. You have to decide what makes sense for your problem: are you trying to predict the time to an event, or the presence or absence of an event in the window?
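To make the idea concrete, here is a minimal NumPy-only sketch of what segmenting with step=1 and taking the last target value per window amounts to. It does not call seglearn; the function name `sliding_windows_last`, the toy series, and the per-timestep 0/1 event flag are all assumptions made for illustration.

```python
import numpy as np


def sliding_windows_last(X, y, width, step=1):
    """Cut X (n_samples, n_features) into windows of `width` rows.

    Each window is labelled with the y value at its last row, which is
    the role a 'last' target function plays in the suggestion above.
    """
    X = np.asarray(X)
    y = np.asarray(y)
    if len(X) < width:
        raise ValueError("series is shorter than the window width")
    starts = np.arange(0, len(X) - width + 1, step)
    Xw = np.stack([X[s:s + width] for s in starts])
    yw = y[starts + width - 1]
    return Xw, yw


# Toy series: 10 time steps, 2 features, events flagged at t=4 and t=8.
X = np.arange(20).reshape(10, 2)
y = np.zeros(10, dtype=int)
y[[4, 8]] = 1

Xw, yw = sliding_windows_last(X, y, width=3)

# Keep only the windows whose last row is an event.
event_windows = Xw[yw == 1]
```

Note that in this sketch each selected window ends on the event row itself, whereas the loop in the original question stops one row before the event. Shifting y by one step would reproduce the original behaviour.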

If you want to do something more complicated, you can write your own transform. Just inherit from XyTransformerMixin and implement the fit and transform functions, and you can include your own transformer in the pipeline.
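A rough sketch of what such a transform could look like for the event-window case is below. The class name `EventWindow` is hypothetical, it handles a single series with a per-timestep 0/1 event flag in y, and the seglearn base class is deliberately left out so the example runs with NumPy alone. In a real pipeline you would add XyTransformerMixin as a base class and check the exact fit/transform signatures and return values against the seglearn documentation.

```python
import numpy as np


class EventWindow:
    """Hypothetical transform: one window of `width` rows before each event.

    Assumption: X is a single series of shape (n_samples, n_features) and
    y is a 0/1 event flag per time step. The seglearn mixin is omitted.
    """

    def __init__(self, width):
        self.width = width

    def fit(self, X, y=None):
        # Nothing to learn; the transform is stateless.
        return self

    def transform(self, X, y):
        X = np.asarray(X)
        y = np.asarray(y)
        # Event positions that have a full window of history before them.
        events = np.flatnonzero(y)
        events = events[events >= self.width]
        if events.size == 0:
            empty = np.empty((0, self.width) + X.shape[1:], dtype=X.dtype)
            return empty, y[:0]
        # Rows [e - width, e): the observations preceding each event.
        Xt = np.stack([X[e - self.width:e] for e in events])
        return Xt, y[events]


# Toy series: the event at t=1 is dropped because it lacks 3 prior rows.
X = np.arange(20).reshape(10, 2)
y = np.zeros(10, dtype=int)
y[[1, 4, 8]] = 1

Xt, yt = EventWindow(width=3).fit(X, y).transform(X, y)
```

Dropping events without enough history is one possible design choice; padding the short windows would be another, at the cost of introducing artificial values.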