Closed: jagandecapri closed this issue 3 years ago
Hi @jagandecapri ,
thanks for the question. Time series classification is possible via pyrcn.echo_state_network.SeqToLabelESNClassifier.
This sequence classifier takes a 1D numpy array of dtype object as input data. Each element of X is a time series of shape (n_samples, n_features), and the target is a 1D array containing one label per time series.
I have adapted your example:
```python
import numpy as np
from pyrcn.echo_state_network import SeqToLabelESNClassifier

time_series_input = np.arange(27).reshape(3, 3, 3)
time_series_label = np.random.randint(0, 2, 3)  # assume it generates [1, 1, 0]

# Pack the sequences into dtype=object arrays
X_train = np.empty(shape=(3,), dtype=object)
y_train = np.empty_like(X_train)
for k, (x, y) in enumerate(zip(time_series_input, time_series_label)):
    X_train[k] = x
    y_train[k] = np.atleast_1d(y)
X_test = X_train

esn = SeqToLabelESNClassifier().fit(X_train, y_train)
print(esn.predict(X_test))
```
Currently, we are developing the ESNRegressor and ESNClassifier in a way that they can also handle this data structure. In fact, your example is one more case that we can consider.
Is it correct that, in your example, iterating over the first dimension means iterating over the time series? E.g., is X_train[0, :, :] the first time series?
Hi @renierts,
Thank you for the prompt response!
The data that I am working on has the shape (samples, time_sequence, features). In other words, iterating over the first dimension means iterating over the samples.
However, I think the method of having the data as (time_sequence, samples, features) would also work. It would be up to the user to transpose the first and second dimensions. Correct me if I'm wrong.
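As a quick sketch of that transposition (with made-up shapes, not from either of our datasets), swapping the first two axes is a single np.transpose call:

```python
import numpy as np

# Hypothetical data: 5 samples, 48 time steps, 3 features
X = np.random.rand(5, 48, 3)      # (samples, time_sequence, features)
X_t = np.transpose(X, (1, 0, 2))  # (time_sequence, samples, features)

print(X_t.shape)  # (48, 5, 3)
```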
Hi @jagandecapri,
I see. In that case, I misunderstood your original example. However, as you say, the user would just need to transpose the dimensions. I will allow 3D numpy arrays of shape (time_sequence, samples, features) and internally convert them as explained in my example.
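A sketch of what such an internal conversion could look like (hypothetical shapes, assuming the (time_sequence, samples, features) layout discussed above):

```python
import numpy as np

# Hypothetical 3D input: 48 time steps, 5 samples, 3 features
X_3d = np.random.rand(48, 5, 3)

# Conversion: one object-array element per sample,
# each of shape (time_sequence, features)
n_samples = X_3d.shape[1]
X_obj = np.empty(shape=(n_samples,), dtype=object)
for i in range(n_samples):
    X_obj[i] = X_3d[:, i, :]

print(X_obj.shape, X_obj[0].shape)  # (5,) (48, 3)
```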
The idea behind my current solution is that I want to deal with arbitrary sequence lengths. Using 3D numpy arrays of shape (time_sequence, samples, features), this is only possible by zero-padding all sequences to the same length.
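To illustrate the trade-off (a sketch with made-up sequence lengths): variable-length sequences fit naturally into a dtype=object array, whereas a single 3D array forces zero-padding to the longest sequence:

```python
import numpy as np

# Three hypothetical sequences of different lengths (4, 2 and 3 steps, 3 features each)
seqs = [np.ones((4, 3)), np.ones((2, 3)), np.ones((3, 3))]

# Option 1: object array; each element keeps its own length
X_obj = np.empty(shape=(3,), dtype=object)
for i, s in enumerate(seqs):
    X_obj[i] = s

# Option 2: one 3D array requires zero-padding to the longest sequence
max_len = max(s.shape[0] for s in seqs)
X_pad = np.zeros((len(seqs), max_len, 3))
for i, s in enumerate(seqs):
    X_pad[i, :s.shape[0], :] = s

print(X_obj[1].shape)  # (2, 3) -- original length preserved
print(X_pad.shape)     # (3, 4, 3) -- all padded to length 4
```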
Can you work with the workaround I proposed in my first comment for now?
Hi @renierts,
Working with the toy example below, I find that the prediction array is shorter than I expected. Any idea why?
```python
import numpy as np
from pyrcn.echo_state_network import SeqToLabelESNClassifier

BATCH_LENGTH = 3
SEQUENCE_LENGTH = 2
FEATURES = 3
BATCH_IDX = 0
SEQUENCE_IDX = 1
FEATURE_IDX = 2

time_series_input = np.arange(18).reshape(BATCH_LENGTH, SEQUENCE_LENGTH, FEATURES)
# Transpose from (BATCH, SEQUENCE, FEATURE) to (SEQUENCE, BATCH, FEATURE)
time_series_input_T = np.transpose(time_series_input, (SEQUENCE_IDX, BATCH_IDX, FEATURE_IDX))
# Generate one label per sequence in the batch
time_series_label = np.random.randint(0, 2, BATCH_LENGTH)
# Convert to dtype=object
X_train = np.empty(shape=(SEQUENCE_LENGTH,), dtype=object)
for i, x in enumerate(time_series_input_T):
    X_train[i] = x
# Expand dim of labels
y_train = np.expand_dims(time_series_label, axis=1)
print(f'y_train: {y_train}')
X_test = X_train

esn = SeqToLabelESNClassifier().fit(X_train, y_train)
# Prediction array length is only 2. It should be 3, I suppose?
print(f'y_pred: {esn.predict(X_test)}')
```
Hi @jagandecapri, I think I misunderstood you at first. Sorry for that. Some questions:

- BATCH_LENGTH = 3: Do you have three sequences?
- SEQUENCE_LENGTH = 2: Do you have a sequence length of 2?

If this is true, the solution is to use X_train = np.empty(shape=(BATCH_LENGTH, ), dtype=object). The following code snippet should work:
```python
import numpy as np
from pyrcn.echo_state_network import SeqToLabelESNClassifier

BATCH_LENGTH = 3
SEQUENCE_LENGTH = 2
FEATURES = 3
BATCH_IDX = 0
SEQUENCE_IDX = 1
FEATURE_IDX = 2

time_series_input = np.arange(18).reshape(BATCH_LENGTH, SEQUENCE_LENGTH, FEATURES)
# Generate one label per sequence in the batch
time_series_label = np.random.randint(0, 2, BATCH_LENGTH)
# Convert to dtype=object
X_train = np.empty(shape=(BATCH_LENGTH,), dtype=object)
for i, x in enumerate(time_series_input):
    # Each element of X_train has the shape (SEQUENCE_LENGTH, FEATURES)
    X_train[i] = x
# Expand dim of labels
y_train = np.expand_dims(time_series_label, axis=1)
print(f'y_train: {y_train}')
X_test = X_train

esn = SeqToLabelESNClassifier().fit(X_train, y_train)
# Prediction array length is now 3
print(f'y_pred: {esn.predict(X_test)}')
```
Hi @renierts,
No worries. I think the answer is yes to both your questions:

- BATCH_LENGTH = 3: Do you have three sequences? => Yes
- SEQUENCE_LENGTH = 2: Do you have a sequence length of 2? => Yes

Just to be sure we are on the same page, here is an example from the healthcare domain:

- BATCH_LENGTH is the number of patients.
- SEQUENCE_LENGTH is the number of hourly records per patient, e.g. 48 hours.
- FEATURE_LENGTH is the number of features recorded in each hour.

Is the above what you understood too?
That's what I understood too. You have BATCH_LENGTH recordings, where each recording has a duration of SEQUENCE_LENGTH and has FEATURE_LENGTH features.
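As a sketch of this mapping with hypothetical numbers (5 patients, 48 hourly records, 7 features per hour; these values are illustrative, not from the discussion), the data would be packed like this:

```python
import numpy as np

N_PATIENTS = 5   # BATCH_LENGTH
N_HOURS = 48     # SEQUENCE_LENGTH
N_FEATURES = 7   # FEATURE_LENGTH

# Hypothetical recordings: (patients, hours, features)
recordings = np.random.rand(N_PATIENTS, N_HOURS, N_FEATURES)
labels = np.random.randint(0, 2, N_PATIENTS)

# Pack into the dtype=object container expected by the sequence classifier
X = np.empty(shape=(N_PATIENTS,), dtype=object)
for i, patient in enumerate(recordings):
    X[i] = patient  # each element: (N_HOURS, N_FEATURES)
y = np.expand_dims(labels, axis=1)

print(X[0].shape, y.shape)  # (48, 7) (5, 1)
```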
I am closing this issue, since we have discussed these topics and the refactoring work on sequential processing is finished for now. Feel free to reopen the issue!
Hi,
Given that I have time-series data with X_train shape = (samples, sequences, features) and a binary y_train shape = (samples,), how can I fit and predict using the pyRCN ESNClassifier? For example, if I generate the following variables
When I call predict(X=X_test), it should return [1, 1, 0]. As far as I understand from the codebase, pyRCN only takes X of shape (n_samples, n_features) for the fit and predict functions.
On a side note, I am not sure whether the question I raised is similar to https://github.com/TUD-STKS/PyRCN/issues/12.