PlasmaControl / PyRCN

A Python 3 framework for Reservoir Computing with a scikit-learn-compatible API.
BSD 3-Clause "New" or "Revised" License

Is time series classification supported? #38

Closed jagandecapri closed 3 years ago

jagandecapri commented 3 years ago

Hi,

Given time-series data with X_train of shape (samples, sequences, features) and a binary y_train of shape (samples,), how can I fit and predict using the pyRCN ESNClassifier?

For example, if I generate the following variables

import numpy as np

X_train = np.arange(27).reshape(3, 3, 3)  # (samples, sequences, features)
y_train = np.random.randint(0, 2, 3)      # assume it generates [1, 1, 0]

X_test = X_train

When I call predict(X=X_test), it should return [1, 1, 0]. As far as I understand from the codebase, pyRCN only accepts X of shape (n_samples, n_features) in the fit and predict functions.

On a side note, I am not sure whether the question I raised is similar to https://github.com/TUD-STKS/PyRCN/issues/12.

renierts commented 3 years ago

Hi @jagandecapri ,

thanks for the question. Time series classification is possible via pyrcn.echo_state_network.SeqToLabelESNClassifier.

This sequence classifier takes 1D numpy arrays of dtype object as input data. Each element of X is a time series of shape (n_samples, n_features), and the target y consists of 1D arrays containing the label of the corresponding time series.

I have adapted your example:

import numpy as np
from pyrcn.echo_state_network import SeqToLabelESNClassifier

time_series_input = np.arange(27).reshape(3, 3, 3)
time_series_label = np.random.randint(0, 2, 3)  # assume it generates [1, 1, 0]
X_train = np.empty(shape=(3, ), dtype=object)
y_train = np.empty_like(X_train)

for k, (x, y) in enumerate(zip(time_series_input, time_series_label)):
    X_train[k] = x
    y_train[k] = np.atleast_1d(y)

X_test = X_train
esn = SeqToLabelESNClassifier().fit(X_train, y_train)
print(esn.predict(X_test))

Currently, we are developing the ESNRegressor and ESNClassifier in a way that they can also handle this data structure. In fact, your example is one more case that we can consider.

Is it correct that, in your example, iterating over the first dimension means iterating over the time series? E.g., is X_train[0, :, :] the first time series?

jagandecapri commented 3 years ago

Hi @renierts,

Thank you for the prompt response!

The data that I am working on has the shape (samples, time_sequence, features). In other words, iterating over the first dimension means iterating over the samples.

However, I think the method of having the data as (time_sequence, samples, features) would also work; it would just be a matter of the user transposing the first and second dimensions. Correct me if I'm wrong.
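
Just to illustrate what I mean (the array sizes here are made up), a minimal sketch of that transpose could look like this:

import numpy as np

# Hypothetical data of shape (samples, time_sequence, features)
X = np.arange(24).reshape(4, 2, 3)

# Swap the first and second dimensions -> (time_sequence, samples, features)
X_T = np.transpose(X, (1, 0, 2))
print(X_T.shape)  # (2, 4, 3)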

renierts commented 3 years ago

Hi @jagandecapri,

I see. In that case, I misunderstood your original example. However, as you say, the user would just need to transpose the dimensions.

I will allow 3D numpy arrays of shape (time_sequence, samples, features) and internally convert them as explained in my example.

The idea behind my current solution is that I want to deal with arbitrary sequence lengths. With 3D numpy arrays of shape (time_sequence, samples, features), this is only possible by zero-padding all sequences to the same length.
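
For illustration, here is a minimal sketch of what the object-array format allows without any padding (the sequence lengths of 2 and 4 time steps and the random data are made up):

import numpy as np
from pyrcn.echo_state_network import SeqToLabelESNClassifier

# Two sequences with different lengths but the same number of features
X_train = np.empty(shape=(2, ), dtype=object)
X_train[0] = np.random.rand(2, 3)  # 2 time steps, 3 features
X_train[1] = np.random.rand(4, 3)  # 4 time steps, 3 features

# One label per sequence
y_train = np.empty_like(X_train)
y_train[0] = np.atleast_1d(0)
y_train[1] = np.atleast_1d(1)

esn = SeqToLabelESNClassifier().fit(X_train, y_train)
print(esn.predict(X_train))  # one predicted label per sequence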

Can you work with the workaround I proposed in my first comment for now?

jagandecapri commented 3 years ago

Hi @renierts,

Working with the toy example in the code below, I find that the prediction array is shorter than I expected. Any idea why?

import numpy as np
from pyrcn.echo_state_network import SeqToLabelESNClassifier

BATCH_LENGTH = 3
SEQUENCE_LENGTH = 2
FEATURES = 3

BATCH_IDX = 0
SEQUENCE_IDX = 1
FEATURE_IDX = 2

time_series_input = np.arange(18).reshape(BATCH_LENGTH,SEQUENCE_LENGTH,FEATURES)

# Transpose from (BATCH, SEQUENCE, FEATURE) to (SEQUENCE, BATCH, FEATURE)
time_series_input_T = np.transpose(time_series_input, (SEQUENCE_IDX,BATCH_IDX,FEATURE_IDX))

# Generate time series label equivalent to BATCH_LENGTH
time_series_label = np.random.randint(0,2,BATCH_LENGTH)

# Convert to dtype=object
X_train = np.empty(shape=(SEQUENCE_LENGTH, ), dtype=object)
for i, x in enumerate(time_series_input_T):
    X_train[i] = x

# Expand dim of labels
y_train = np.expand_dims(time_series_label, axis=1)
print(f'y_train: {y_train}')

X_test = X_train
esn = SeqToLabelESNClassifier().fit(X_train, y_train)

# Prediction array length is only 2. It should be 3, I suppose?
print(f'y_pred: {esn.predict(X_test)}')

renierts commented 3 years ago

Hi @jagandecapri, I think I misunderstood you at first. Sorry about that. Some questions:

If this is true, the solution is that X_train = np.empty(shape=(BATCH_LENGTH, ), dtype=object). The following code snippet should work:

import numpy as np
from pyrcn.echo_state_network import SeqToLabelESNClassifier

BATCH_LENGTH = 3
SEQUENCE_LENGTH = 2
FEATURES = 3

BATCH_IDX = 0
SEQUENCE_IDX = 1
FEATURE_IDX = 2

time_series_input = np.arange(18).reshape(BATCH_LENGTH,SEQUENCE_LENGTH,FEATURES)

# Generate time series label equivalent to BATCH_LENGTH
time_series_label = np.random.randint(0,2,BATCH_LENGTH)

# Convert to dtype=object
X_train = np.empty(shape=(BATCH_LENGTH, ), dtype=object)

for i, x in enumerate(time_series_input):
    # Each element of X_train has the shape (SEQUENCE_LENGTH, FEATURES)
    X_train[i] = x

# Expand dim of labels
y_train = np.expand_dims(time_series_label, axis=1)
print(f'y_train: {y_train}')

X_test = X_train
esn = SeqToLabelESNClassifier().fit(X_train, y_train)

# Prediction array length is now 3
print(f'y_pred: {esn.predict(X_test)}')

jagandecapri commented 3 years ago

Hi @renierts,

No worries. I think the answer is Yes for both your questions.

Just to be sure we are on the same page, I'll give you an example below from the healthcare domain:

Is the above what you understood too?

renierts commented 3 years ago

That's what I understood too. You have BATCH_LENGTH recordings, where each recording has a duration of SEQUENCE_LENGTH time steps and FEATURES features.

renierts commented 3 years ago

I am closing this issue, since we have discussed these topics and the refactoring work on sequential processing is finished for now. Feel free to reopen it!