FlorentF9 / DeepTemporalClustering

Keras implementation of the Deep Temporal Clustering (DTC) model
MIT License

ValueError: Input 0 is incompatible with layer AE: expected shape=(None, 5210, 6), found shape=(None, 6) #19

Closed: falcon-ram closed this issue 2 years ago.

falcon-ram commented 3 years ago

Hello, thank you very much for the code. I am having some issues using your classes in my own program (and also when running the main code in DeepTemporalClustering.py). My code so far:

import pandas_datareader.data as web

# start_date, end_date and the DTC instance dtc are defined elsewhere (not shown in the issue)
dataSource = web.DataReader('S68.SI', 'yahoo', start=start_date, end=end_date)
X_train = dataSource.to_numpy()  # 2D array: (timesteps, features) = (5210, 6)
# Some constant values
n_clusters = 2
pretrain_optimizer = 'adam'
optimizer = 'adam'
batch_size = 64
# Initialize model
dtc.initialize()
dtc.model.summary()
dtc.compile(gamma=1.0, optimizer=optimizer, initial_heatmap_loss_weight=0.1, final_heatmap_loss_weight=0.9)
# Pre-train the autoencoder
dtc.pretrain(X=X_train, optimizer=pretrain_optimizer,
             epochs=10, batch_size=batch_size,
             save_dir='results/tmp')

At this point I get the error from the title.

X_train has shape (5210, 6), i.e. 5210 timesteps and 6 features.

After some investigation, the problem seems to come from this line in TAE.py: x = Input(shape=(timesteps, input_dim), name='input_seq'). Is it necessary for the input shape to include the timesteps? From what I found online, only the features should be part of the input shape. Is this correct?
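Some background that the thread does not state: in Keras, the shape passed to Input excludes the batch dimension. As a result, Input(shape=(timesteps, input_dim)) expects arrays of shape (batch, timesteps, input_dim). When a 2D X_train of shape (5210, 6) is passed in, its first axis is treated as the sample axis, so each sample looks like a single 6-vector. This is why the error reports found shape=(None, 6). The helper is_compatible below is hypothetical. It is a simplified model of that rank-and-dimension check, not the actual Keras implementation.

```python
import numpy as np

def is_compatible(data_shape, input_shape):
    """Hypothetical, simplified model of a Keras input-shape check.

    input_shape is what would be passed to Input(shape=...), which excludes
    the batch dimension, so the layer expects rank len(input_shape) + 1.
    """
    expected = (None,) + tuple(input_shape)  # None = any batch size
    if len(data_shape) != len(expected):
        return False  # wrong rank, e.g. 2D data for a 3D input
    # Every fixed (non-None) dimension must match exactly
    return all(e is None or e == d for e, d in zip(expected, data_shape))

timesteps, input_dim = 5210, 6
X_train = np.zeros((5210, 6))  # placeholder with the shape from the issue

print(is_compatible(X_train.shape, (timesteps, input_dim)))  # False: rank 2, expected rank 3
print(is_compatible((1, 5210, 6), (timesteps, input_dim)))   # True
```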

Thank you and regards.

falcon-ram commented 3 years ago

Do I have to reshape my X_train to (1, 5210, 6)?
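For reference, the reshape asked about here only adds a leading sample axis, and NumPy offers several equivalent ways to do it. The minimal sketch below uses placeholder data of the same shape. The resulting array would match the model's 3D input shape, but it holds a single series (N = 1). With n_clusters = 2 there would then be fewer series than clusters.

```python
import numpy as np

X_train = np.zeros((5210, 6))  # placeholder with the shape from the issue

# Add a leading sample axis: (5210, 6) -> (1, 5210, 6)
X_one = X_train[np.newaxis, ...]
# Equivalent alternatives: X_train.reshape(1, 5210, 6) or np.expand_dims(X_train, 0)

n_series = X_one.shape[0]  # N, the number of series that would be clustered
print(X_one.shape, n_series)  # (1, 5210, 6) 1
```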

FlorentF9 commented 3 years ago

This model aims at clustering whole multivariate time series, i.e. the input array should have shape (N, T, F), where N is the number of input series, T the length of each series (timesteps), and F the number of observed variables (features). The clustering is performed on the N inputs. I think your data set does not meet this requirement. What exactly do you want to cluster? The 5210 individual time points (i.e. 5210 points in a 6-dimensional space)? Or the 6 univariate time series (i.e. 6 points in a 5210-dimensional space)?
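The two readings in the question above lead to two different arrays. The NumPy sketch below uses random placeholder data in place of the stock prices. In the first view, each row is one sample with no time axis left. That view suits an ordinary point-clustering method rather than DTC. The second view puts the 6 columns into the (N, T, F) layout with N = 6, T = 5210 and F = 1.

```python
import numpy as np

X_train = np.random.default_rng(0).normal(size=(5210, 6))  # placeholder data

# View 1: 5210 points in a 6-dimensional space.
# Each row is one sample; there is no time axis, so this is input for a
# non-temporal clustering method, not the (N, T, F) input DTC expects.
points = X_train                      # shape (5210, 6)

# View 2: 6 univariate series of length 5210, in (N, T, F) layout.
series = X_train.T[:, :, np.newaxis]  # shape (6, 5210, 1)

print(points.shape, series.shape)  # (5210, 6) (6, 5210, 1)
```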

falcon-ram commented 3 years ago

Hello Florent, thanks for the response. I'm actually trying to classify 5210 points in a 6-dimensional space: I want to see whether a machine learning algorithm can classify up trends, down trends, and neutral trends in stock market data. After looking more carefully at the code and the input data it uses, I realized what is going on. Thanks and regards.
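The following approach does not come from the thread. A common way to obtain (N, T, F) input from one long series is to cut the series into fixed-length windows. Each window then becomes the object being clustered, instead of each individual time point. This fits trend detection, because a trend is a property of a stretch of time rather than of a single point. The sketch below is minimal and assumes NumPy. The helper make_windows, the window length of 50 and the step of 10 are hypothetical choices. Note also that DTC clusters without labels. Any resulting clusters would still need to be inspected to see whether they line up with up, down, or neutral trends.

```python
import numpy as np

def make_windows(series, window, step):
    """Hypothetical helper: cut one long (T_total, F) series into
    overlapping (window, F) slices, returning shape (N, window, F)."""
    if window > len(series):
        raise ValueError("window is longer than the series")
    n = (len(series) - window) // step + 1  # number of full windows
    return np.stack([series[i * step : i * step + window] for i in range(n)])

X_train = np.random.default_rng(0).normal(size=(5210, 6))  # placeholder for the price data
windows = make_windows(X_train, window=50, step=10)
print(windows.shape)  # (517, 50, 6)
```

With these illustrative settings, the last window starts at index 5160 and ends exactly at the final time point. A different step could leave a few trailing points unused.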