TDAmeritrade / stumpy

STUMPY is a powerful and scalable Python library for modern time series analysis
https://stumpy.readthedocs.io/en/latest/

Add Tutorial(s) that Reproduce "100 Time Series Data Mining Questions" PDF #107

Open seanlaw opened 4 years ago

seanlaw commented 4 years ago

On the UCR Matrix Profile site, they have a growing list of time series questions that can be solved by computing the matrix profile. The PDF can be found here and the corresponding code/data is here.

It would be interesting to begin compiling STUMPY examples that reproduce the solutions to those questions below (including data sources).

Additionally, there is another paper titled "Ten Useful Things you can do with the Matrix Profile and Ten Lines of Code" that might be worth reproducing.

seanlaw commented 4 years ago

1. Have we ever seen a pattern that looks just like this?

The AIBO Robot Dog Data can be found here

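import io
import ssl
import urllib.request  # a bare "import urllib" does not reliably expose urllib.request

import numpy as np
import pandas as pd
import stumpy

# Skip SSL certificate verification for simplicity (not advisable outside a quick demo)
context = ssl.create_default_context()
context.check_hostname = False
context.verify_mode = ssl.CERT_NONE

# Download the full time series T
T_url = 'https://www.cs.unm.edu/~mueen/robot_dog.txt'
T_raw_bytes = urllib.request.urlopen(T_url, context=context).read()
T_data = io.BytesIO(T_raw_bytes)

# Download the query pattern Q
Q_url = 'https://www.cs.unm.edu/~mueen/carpet_query.txt'
Q_raw_bytes = urllib.request.urlopen(Q_url, context=context).read()
Q_data = io.BytesIO(Q_raw_bytes)

# Both files are whitespace-separated with no header row
T_df = pd.read_csv(T_data, header=None, sep=r'\s+', names=['walking'])
Q_df = pd.read_csv(Q_data, header=None, sep=r'\s+', names=['walking'])

# Distance between Q and every subsequence of T
distance_profile = stumpy.core.mass(Q_df['walking'], T_df['walking'])

# Indices of the k smallest distances, ordered from best to worst match
k = 16
idx = np.argpartition(distance_profile, k)[:k]
topK_idx = idx[np.argsort(distance_profile[idx])]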
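
One thing to watch for with the top-k selection above: neighbouring indices in a distance profile often have nearly the same distance, so the k best indices can cluster around a single match. The following is a minimal sketch, on a made-up toy profile rather than the robot dog data, that compares the plain argpartition selection with a variant that masks a window around each chosen index. The helper `top_k_matches` and its `excl_zone` parameter are hypothetical names used for illustration and are not part of STUMPY.

```python
import numpy as np


def top_k_matches(distance_profile, k, excl_zone=0):
    """Return indices of up to k smallest distances, best first.

    excl_zone is a hypothetical parameter: once index i is chosen,
    indices i - excl_zone .. i + excl_zone are masked so that
    overlapping near-duplicates are not reported again.
    """
    D = np.asarray(distance_profile, dtype=np.float64).copy()
    matches = []
    for _ in range(k):
        i = int(np.argmin(D))
        if not np.isfinite(D[i]):
            break  # every remaining position is masked
        matches.append(i)
        lo = max(0, i - excl_zone)
        hi = min(D.shape[0], i + excl_zone + 1)
        D[lo:hi] = np.inf  # mask the match and its neighbours
    return matches


# Toy distance profile (made-up values for illustration only)
toy_profile = np.array([5.0, 0.2, 0.3, 4.0, 0.1, 6.0, 0.4, 3.0])

# Same selection as in the snippet above: k smallest distances, sorted
k = 3
idx = np.argpartition(toy_profile, k)[:k]
plain_top_k = idx[np.argsort(toy_profile[idx])].tolist()

# With an exclusion zone of 1, index 2 is skipped because it neighbours index 1
spaced_top_k = top_k_matches(toy_profile, k, excl_zone=1)
```

Here `plain_top_k` contains the adjacent indices 1 and 2, while `spaced_top_k` replaces index 2 with the next best non-neighbouring position. What exclusion width is sensible for the real data is an open choice; a fraction of the query length is one plausible starting point.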

MokaPot commented 3 years ago

@seanlaw we spoke about getting these to the point where the data is loaded and ready to be worked on.

You mentioned Zenodo in another issue? That seems like a good way to do this?

Could we preprocess the data into clean, pandas-friendly CSVs, upload them, and then just pass the URL to read_csv?

Guess it would be good to establish some structure upfront?
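
As a rough sketch of what that structure might look like: a small registry maps dataset names to hosted CSV locations, and a single loader wraps `pd.read_csv`. Everything here is an assumption for illustration, including the `DATASETS` registry, the `load_dataset` helper, the placeholder URL, and the convention that each CSV has a header row with one named column. An in-memory file stands in for the hosted CSV so the example runs without network access.

```python
import io

import pandas as pd

# Hypothetical registry: dataset name -> location of a cleaned CSV.
# The URL is a placeholder, not a real upload.
DATASETS = {
    "robot_dog": "https://example.org/placeholder/robot_dog.csv",
}


def load_dataset(source, column="walking"):
    """Read one named column from a CSV given as a URL, path, or file-like object."""
    df = pd.read_csv(source)
    return df[column].astype("float64")


# Offline stand-in for a hosted file, so the sketch runs without network access
fake_csv = io.StringIO("walking\n0.1\n0.2\n0.3\n")
T = load_dataset(fake_csv)
```

With real uploads in place, the call would presumably become `load_dataset(DATASETS["robot_dog"])`, since `pd.read_csv` accepts a URL directly.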

NimaSarajpoor commented 2 years ago

For now, please let's continue the discussion around this issue here: https://github.com/seanlaw/awesome-stumpy/issues/1