PlasmaControl / PyRCN

A Python 3 framework for Reservoir Computing with a scikit-learn-compatible API.
BSD 3-Clause "New" or "Revised" License

Is time series classification supported? #38

Closed jagandecapri closed 3 years ago

jagandecapri commented 3 years ago

Hi,

Given time-series data with X_train of shape (samples, sequences, features) and a binary y_train of shape (samples,), how can I fit and predict using the pyRCN ESNClassifier?

For example, if I generate the following variables

import numpy as np

X_train = np.arange(27).reshape(3, 3, 3)  # (samples, sequences, features)
y_train = np.random.randint(0, 2, 3)      # assume it generates [1, 1, 0]

X_test = X_train

When I call predict(X=X_test), it should return [1, 1, 0]. As far as I understand from the codebase, pyRCN only accepts X of shape (n_samples, n_features) in the fit and predict functions.

On a side note, I am not sure whether the question I raised is similar to https://github.com/TUD-STKS/PyRCN/issues/12.

renierts commented 3 years ago

Hi @jagandecapri ,

thanks for the question. Time series classification is possible via pyrcn.echo_state_network.SeqToLabelESNClassifier.

This sequence classifier takes 1D numpy arrays of dtype object as input data. Each element of X is a time series of shape (n_samples, n_features), and the target y consists of 1D arrays containing the label of the corresponding time series.

I have adapted your example:

import numpy as np
from pyrcn.echo_state_network import SeqToLabelESNClassifier

time_series_input = np.arange(27).reshape(3, 3, 3)
time_series_label = np.random.randint(0, 2, 3)  # assume it generates [1, 1, 0]
X_train = np.empty(shape=(3, ), dtype=object)
y_train = np.empty_like(X_train)

for k, (x, y) in enumerate(zip(time_series_input, time_series_label)):
    X_train[k] = x
    y_train[k] = np.atleast_1d(y)

X_test = X_train
esn = SeqToLabelESNClassifier().fit(X_train, y_train)
print(esn.predict(X_test))

Currently, we are developing the ESNRegressor and ESNClassifier in a way that they can also handle this data structure. In fact, your example is one more case that we can consider.

Is it correct that, in your example, iterating over the first dimension means iterating over the time series? E.g., is X_train[0, :, :] the first time series?

jagandecapri commented 3 years ago

Hi @renierts,

Thank you for the prompt response!

The data that I am working on has the shape (samples, time_sequence, features). In other words, iterating over the first dimension means iterating over the samples.

However, I think the method of having the data as (time_sequence, samples, features) would also work; it would just be a matter of the user transposing the first and second dimensions. Correct me if I'm wrong.
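
Just to illustrate what I mean (the array sizes here are made up), a minimal sketch of that transpose could look like this:

import numpy as np

# Hypothetical data of shape (samples, time_sequence, features)
X = np.arange(24).reshape(4, 2, 3)

# Swap the first and second dimensions -> (time_sequence, samples, features)
X_T = np.transpose(X, (1, 0, 2))
print(X_T.shape)  # (2, 4, 3)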

renierts commented 3 years ago

Hi @jagandecapri,

I see. In that case, I misunderstood your original example. However, as you say, the user would just need to transpose the dimensions.

I will allow 3D numpy arrays of shape (time_sequence, samples, features) and internally convert them as explained in my example.

The idea behind my current solution is that I want to deal with arbitrary sequence lengths. With 3D numpy arrays of shape (time_sequence, samples, features), this is only possible by zero-padding all sequences to the same length.
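
For illustration, here is a minimal sketch of what the object-array format allows without any padding (the sequence lengths of 2 and 4 time steps and the random data are made up):

import numpy as np
from pyrcn.echo_state_network import SeqToLabelESNClassifier

# Two sequences with different lengths but the same number of features
X_train = np.empty(shape=(2, ), dtype=object)
X_train[0] = np.random.rand(2, 3)  # 2 time steps, 3 features
X_train[1] = np.random.rand(4, 3)  # 4 time steps, 3 features

# One label per sequence
y_train = np.empty_like(X_train)
y_train[0] = np.atleast_1d(0)
y_train[1] = np.atleast_1d(1)

esn = SeqToLabelESNClassifier().fit(X_train, y_train)
print(esn.predict(X_train))  # one predicted label per sequence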

Can you work with the workaround I proposed in my first comment for now?

jagandecapri commented 3 years ago

Hi @renierts,

Working with the toy example in the code below, I find that the prediction array is shorter than I expected. Any idea why?

import numpy as np
from pyrcn.echo_state_network import SeqToLabelESNClassifier

BATCH_LENGTH = 3
SEQUENCE_LENGTH = 2
FEATURES = 3

BATCH_IDX = 0
SEQUENCE_IDX = 1
FEATURE_IDX = 2

time_series_input = np.arange(18).reshape(BATCH_LENGTH,SEQUENCE_LENGTH,FEATURES)

# Transpose from (BATCH, SEQUENCE, FEATURE) to (SEQUENCE, BATCH, FEATURE)
time_series_input_T = np.transpose(time_series_input, (SEQUENCE_IDX,BATCH_IDX,FEATURE_IDX))

# Generate time series label equivalent to BATCH_LENGTH
time_series_label = np.random.randint(0,2,BATCH_LENGTH)

# Convert to dtype=object
X_train = np.empty(shape=(SEQUENCE_LENGTH, ), dtype=object)
for i, x in enumerate(time_series_input_T):
    X_train[i] = x

# Expand dim of labels
y_train = np.expand_dims(time_series_label, axis=1)
print(f'y_train: {y_train}')

X_test = X_train
esn = SeqToLabelESNClassifier().fit(X_train, y_train)

# Prediction array length is only 2. It should be 3, I suppose?
print(f'y_pred: {esn.predict(X_test)}')

renierts commented 3 years ago

Hi @jagandecapri, I think I misunderstood you at first. Sorry about that. Some questions:

If this is true, the solution is that X_train = np.empty(shape=(BATCH_LENGTH, ), dtype=object). The following code snippet should work:

import numpy as np
from pyrcn.echo_state_network import SeqToLabelESNClassifier

BATCH_LENGTH = 3
SEQUENCE_LENGTH = 2
FEATURES = 3

BATCH_IDX = 0
SEQUENCE_IDX = 1
FEATURE_IDX = 2

time_series_input = np.arange(18).reshape(BATCH_LENGTH,SEQUENCE_LENGTH,FEATURES)

# Generate time series label equivalent to BATCH_LENGTH
time_series_label = np.random.randint(0,2,BATCH_LENGTH)

# Convert to dtype=object
X_train = np.empty(shape=(BATCH_LENGTH, ), dtype=object)

for i, x in enumerate(time_series_input):
    # Each element of X_train has the shape (SEQUENCE_LENGTH, FEATURES)
    X_train[i] = x

# Expand dim of labels
y_train = np.expand_dims(time_series_label, axis=1)
print(f'y_train: {y_train}')

X_test = X_train
esn = SeqToLabelESNClassifier().fit(X_train, y_train)

# Prediction array length is now 3
print(f'y_pred: {esn.predict(X_test)}')

jagandecapri commented 3 years ago

Hi @renierts,

No worries. I think the answer is Yes for both your questions.

Just to be sure we are on the same page, I'll give you an example below from the healthcare domain:

Is the above what you understood too?

renierts commented 3 years ago

That's what I understood too. You have BATCH_LENGTH recordings, where each recording has a duration of SEQUENCE_LENGTH time steps and FEATURES features.

renierts commented 3 years ago

I am closing this issue, since we have discussed these topics and the refactoring work on sequential processing is finished for now. Feel free to reopen it!