cesium-ml / cesium

Machine Learning Time-Series Platform

No rolling statistics #180

Open NoahSlv opened 8 years ago

NoahSlv commented 8 years ago

I have a simple time series of a single column. About 20,000 measurements, once every 5 minutes.

Using featurize.featurize_time_series with some built-in features (median, minimum, maximum, etc.), I get back ONE SINGLE measurement per feature for the entire time series.

Most other time series libraries I've used will generate a series of measurements, often with user specified lag, so that prediction can be built. Does cesium have similar functionality?
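For reference, the kind of output described here can be sketched with pandas rather than cesium. This is only an illustration: the synthetic data, the one-hour window, and the lag of one sample are all assumptions, not anything taken from the actual dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the data: 20,000 readings, one every 5 minutes.
index = pd.date_range("2017-01-01", periods=20_000, freq="5min")
series = pd.Series(np.random.default_rng(0).normal(size=len(index)), index=index)

window = 12  # assumed window: 12 samples = 1 hour
lag = 1      # assumed lag: the row at t only sees data up to t - lag

rolling = series.rolling(window=window)
features = pd.DataFrame(
    {
        "median": rolling.median(),
        "minimum": rolling.min(),
        "maximum": rolling.max(),
    }
).shift(lag)  # shifting by the lag keeps the current sample out of its own features
```

The result has one row per original timestamp. The first `window - 1 + lag` rows are NaN because no complete window is available yet.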

Thanks!

profjsb commented 8 years ago

@NoahSlv It might be helpful to learn more about your use case. We've not explicitly built a time series forecast engine, which is what your question implies. Instead, the featurization part of the codebase is used to transform the input time series into an array of features, which in turn can be used to learn a (supervised) classifier. That is:

 f_n(ts) -> Real  (for each feature n)

This leads to a feature vector of size n. What you are suggesting is something like

f_n(ts) -> Real^m (where m is the output vector, say a smoothed version of the original time series)

While in principle one could create a feature vector of size m x n, this isn't normally how time series are featurized for classification.
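The difference between the two mappings can be sketched in plain NumPy. The helper names `featurize_whole` and `featurize_windows` are hypothetical and are not part of cesium; the trailing-window scheme is just one assumed way to get m outputs per feature.

```python
import numpy as np

def featurize_whole(ts, feature_funcs):
    """f_n(ts) -> Real: one scalar per feature, giving a vector of size n."""
    return np.array([f(ts) for f in feature_funcs])

def featurize_windows(ts, feature_funcs, window):
    """One row of features per trailing window, giving an m x n array."""
    rows = [
        [f(ts[end - window:end]) for f in feature_funcs]
        for end in range(window, len(ts) + 1)
    ]
    return np.array(rows)

ts = np.arange(10, dtype=float)      # synthetic series 0.0 .. 9.0
funcs = [np.median, np.min, np.max]  # n = 3 features

vector = featurize_whole(ts, funcs)               # shape (3,)
matrix = featurize_windows(ts, funcs, window=4)   # shape (7, 3), so m = 7
```

The first form is what a classifier over whole time series consumes; the second keeps a time axis and is closer to what a forecasting setup would need.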

If you can tell us a bit more about your use case we can take it from there. Thanks!

stefanv commented 7 years ago

@NoahSlv Please let us know your thoughts on the above. Thanks!

sfrodrigues commented 7 years ago

Hey guys,

many thanks for this very nice tool!

I have a similar case to what NoahSlv was describing: I have a time series dataset like:

Index, Feature_1, ...., Feature_n, Label
2015-01-01, 2.4, ..., 2.7, 3
2015-01-02, 2.2, ..., 2.2, 4
2015-01-03, 2.3, ..., 2.5, 2

And I would like to extract features from it. However, I'm not looking for a single array of features for the entire dataset. I want to extract features for each row while only using information that is known at that row, i.e. only that row and the previous ones (time dependency).

So that after I would have something like:

Index, Feature_1, ...., Feature_n, New_Feature_1, ...., New_Feature_n, Label
2015-01-01, 2.4, ..., 2.7, 3.4, ..., 1.7, 3
2015-01-02, 2.2, ..., 2.2, 3.2, ..., 7.7, 4
2015-01-03, 2.3, ..., 2.5, 2.5, ..., 2.8, 2
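Outside cesium, this kind of per-row, past-only featurization could be sketched with pandas. The expanding mean below is just an assumed stand-in for whatever the new features would be, and the elided middle columns are left out.

```python
import pandas as pd

# Toy rows from the example above; only the first and last feature columns are shown.
df = pd.DataFrame(
    {"Feature_1": [2.4, 2.2, 2.3], "Feature_n": [2.7, 2.2, 2.5], "Label": [3, 4, 2]},
    index=pd.to_datetime(["2015-01-01", "2015-01-02", "2015-01-03"]),
)
feature_cols = ["Feature_1", "Feature_n"]

# expanding() at row t only sees rows 0..t, so no future information leaks in.
new = df[feature_cols].expanding(min_periods=1).mean().add_prefix("New_")

# Original features, then the derived ones, then the label, one row per date.
result = pd.concat([df[feature_cols], new, df[["Label"]]], axis=1)
```

Swapping `expanding()` for `rolling(window=k)` would limit each row to its k most recent observations instead of the full history.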

Is it possible to do this with cesium? Or are you planning on expanding the tool to allow it?

Cheers

bnaul commented 7 years ago

@sfragosorodrig this is not something we currently support but we have considered adding something along those lines; it's not yet being actively developed, though.