angus924 / minirocket

MINIROCKET: A Very Fast (Almost) Deterministic Transform for Time Series Classification
GNU General Public License v3.0

Example of CSV file reading #9

Closed jumpingfella closed 3 years ago

jumpingfella commented 3 years ago

Hello, I'm trying to figure out what input data MiniRocket expects. I keep getting TypeError: No matching definition for argument type(s) pyobject, array(int32, 1d, C), array(int32, 1d, C), array(float32, 1d, C)

My data has the following format:

timestamp,close
1619773130596,54559.47
1619773134938,54563.93
1619773139226,54554.23
1619773143564,54564.34

And I read it like this:

dataset = pd.read_csv(filename, usecols = [0, 1], header=0)
dataset = dataset.dropna()
dataset.columns = dataset.columns.to_series().apply(lambda x: x.strip())

angus924 commented 3 years ago

Hi @jumpingfella, the fit(...) and transform(...) functions from minirocket.py require the data to be a 2d numpy array with dtype np.float32. Each row is a different time series, so if you have 10 time series, each of length 100, then your array should have 10 rows and 100 columns.
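
As a minimal sketch of that layout (the random data and the variable names are purely illustrative and do not come from the repository):

```python
import numpy as np

# Illustrative data only: 10 time series, each of length 100.
rng = np.random.default_rng(0)
X = rng.standard_normal((10, 100)).astype(np.float32)

print(X.shape)  # (10, 100): one row per time series
print(X.dtype)  # float32
```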

Does this answer your question?

jumpingfella commented 3 years ago

Thanks, I made some progress with the following:

    df = df.to_numpy()
    df = df.astype(np.float32)
    print(df.shape)
    print(df)

which produces

(301, 2)
[[1.6197731e+12 5.4559469e+04]
 [1.6197731e+12 5.4563930e+04]
 [1.6197731e+12 5.4554230e+04]
 [1.6197731e+12 5.4564340e+04]

but then I get

parameters = fit(df)
  File "minirocket.py", line 130, in fit
    biases = _fit_biases(X, dilations, num_features_per_dilation, quantiles)
ValueError: unable to broadcast argument 1 to output array
File "minirocket.py", line 77

angus924 commented 3 years ago

Hi @jumpingfella, if I understand correctly, you are treating the input as being 301 time series, each of length 2. I think it is more likely that one of your columns is a time series (i.e., you have 1 or 2 time series, each of length 301).
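
To make the two readings concrete, here is a small sketch using the four sample rows from the first post. Keeping only the second column and reshaping it into a single row is an assumption about what is intended here, not something confirmed at this point in the thread.

```python
import numpy as np

# The four sample rows from the first post: (timestamp, close).
raw = np.array(
    [
        [1619773130596, 54559.47],
        [1619773134938, 54563.93],
        [1619773139226, 54554.23],
        [1619773143564, 54564.34],
    ],
    dtype=np.float64,
)

# Passed as-is, this is read as 4 time series, each of length 2.
print(raw.shape)  # (4, 2)

# Assumed intent: one time series (the close price), stored as a single row.
X = raw[:, 1].astype(np.float32).reshape(1, -1)
print(X.shape)  # (1, 4)
```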

Could you describe your data in a little more detail, and what you are trying to do? Then I might be able to help a bit better. At the moment, I'm not sure whether MiniRocket is the right fit for your data.

Thanks very much.

jumpingfella commented 3 years ago

I'm trying to classify the Bitcoin price. In my data, the first column is a timestamp and the second is the close price. Data arrives at 5-second intervals. I'm guessing that I need to keep only the close column and do df = np.transpose(df)

jumpingfella commented 3 years ago

My problems are resolved with:

df = df.to_numpy()
df = df.astype(np.float32)
df = np.transpose(df)

It would still be nice to have a full script that reads from a CSV file.
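
A rough sketch of what such a script might look like, assuming the timestamp,close layout from the first post and assuming only the close column is wanted as a single time series. The helper load_close_series is hypothetical and is not part of minirocket.py, and the in-memory sample stands in for a real file.

```python
import io

import numpy as np
import pandas as pd


def load_close_series(source):
    """Read a timestamp/close CSV and return a (1, n) float32 array.

    Hypothetical helper for illustration; not part of minirocket.py.
    """
    df = pd.read_csv(source, header=0)
    df.columns = df.columns.str.strip()  # tolerate stray spaces in the header
    close = df["close"].dropna()         # keep only the price column
    # One row per time series: a single series of length n becomes shape (1, n).
    return close.to_numpy(dtype=np.float32).reshape(1, -1)


# Stand-in for a file on disk, using the sample rows from the first post.
sample_csv = io.StringIO(
    "timestamp,close\n"
    "1619773130596,54559.47\n"
    "1619773134938,54563.93\n"
    "1619773139226,54554.23\n"
    "1619773143564,54564.34\n"
)

X = load_close_series(sample_csv)
print(X.shape, X.dtype)  # (1, 4) float32
```

With a real file, a path can be passed in place of the in-memory buffer. The resulting array could then be handed to fit(...) and transform(...) from minirocket.py as described earlier in the thread; that step is left out so the sketch runs without the repository on the path.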