blue-yonder / tsfresh

Automatic extraction of relevant features from time series:
http://tsfresh.readthedocs.io
MIT License

Future Improvement: The transforms covered in The Great Time Series Classification BakeOff #142

Open ClimbsRocks opened 7 years ago

ClimbsRocks commented 7 years ago

There was an interesting paper last year which compared a bunch of newly published time series classifiers.

http://www-bcf.usc.edu/~liu32/milets16/paper/MiLeTS_2016_paper_5.pdf

In a wild oversimplification, several of the classifiers simply transform the data in a new way, then run pretty standard time series classification algorithms on the transformed data.
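To make the "transform, then classify" idea concrete, here is a minimal sketch in plain Python. The autocorrelation transform and the 1-nearest-neighbour classifier are chosen only for illustration; they are not taken from the paper's code or from tsfresh, and the function names (`acf_transform`, `predict_1nn`) and the toy sine data are made up.

```python
import math


def acf_transform(series, max_lag):
    """Map a raw series to its autocorrelation coefficients for lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    if var == 0:
        # A constant series has no defined autocorrelation; return zeros.
        return [0.0] * max_lag
    return [
        sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)) / var
        for lag in range(1, max_lag + 1)
    ]


def predict_1nn(train_X, train_y, query):
    """Return the label of the training row closest to `query` (Euclidean distance)."""
    distances = [math.dist(row, query) for row in train_X]
    return train_y[distances.index(min(distances))]


def sine(period, phase, n=64):
    """Toy generator: a sampled sine wave (hypothetical data, for illustration only)."""
    return [math.sin(2 * math.pi * (t + phase) / period) for t in range(n)]


# Two classes that differ in period; the phase shifts make the raw series
# look different point by point even within the same class.
train_series = [sine(8, 0), sine(8, 3), sine(16, 0), sine(16, 5)]
train_labels = ["fast", "fast", "slow", "slow"]

MAX_LAG = 12

# Step 1: transform every series into the new representation.
train_features = [acf_transform(s, MAX_LAG) for s in train_series]

# Step 2: run a standard classifier on the transformed data.
query = sine(16, 11)
predicted = predict_1nn(train_features, train_labels, acf_transform(query, MAX_LAG))
print(predicted)
```

The point of the split is that the classifier never sees the raw series, so swapping in a different transform only means replacing `acf_transform`.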

It would be awesome if tsfresh could become Python's central repository for these different time series data transformations.

The cool part about this project is that they have all the source code available. The team from UEA even reached out to the original authors whenever their own re-implementation didn't reproduce the results of a published paper.

https://bitbucket.org/TonyBagnall/time-series-classification/src/f4fe66b74e039b0475a87ebf6d57db400da25b63/TimeSeriesClassification/src/tsc_algorithms/COTE.java?at=default&fileviewer=file-view-default

They're very up front about the terrible time complexity of their current implementations, but I think people could probably improve on that when re-implementing with speed in mind.

This is certainly not an immediate or even near-term project to take on. And I assume that you've already got some of these transforms baked into tsfresh. But I'm interested in trying out some of these techniques in my own work, and figured you guys would probably be interested in some of this stuff too.

earthgecko commented 7 years ago

@ClimbsRocks the terrible time complexity of current implementations with time series is part of the appeal, surely ;) Is that not why we are interested? Really, it could all be better with time series, couldn't it? Bizarre that one of the most basic kinds of data we have is arguably what machine learning is least good with :) Maybe we are making it too complex? However, tsfresh does make one aspect so much simpler! And hats off on the shout-out to the tsfresh peeps, they are so very nice!

GillesVandewiele commented 7 years ago

#111 is definitely a sub-issue of this ;)

btw, did you know that all these results are available at www.timeseriesclassification.com?

ClimbsRocks commented 7 years ago

Yeah, that website is a really impressive gathering of experiments and results. If I weren't working in systems that assumed Python usage, I'd probably be copy/pasting a lot of their code right now.

ddofer commented 7 years ago

This is a really useful resource!