DataCanvasIO / HyperTS

A Full-Pipeline Automated Time Series (AutoTS) Analysis Toolkit.
https://hyperts.readthedocs.io
Apache License 2.0

Why is classification accuracy already 100% in the first epoch? #105

Closed: ericleonardo closed this issue 1 year ago

ericleonardo commented 1 year ago

Why does my binary classification experiment report 100% accuracy as soon as training starts? The result is clearly not reflected out of sample: evaluating on the test set gives 50% accuracy on this binary task. The data is a multivariate time series classification dataset with shape (23000, 10, 2): 23,000 samples, each a series of length 10 with 2 features. I converted it to the required format with the HyperTS function from_3d_array_to_nested_df. Thanks!
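For context, 50% accuracy on a binary task is exactly what a trivial classifier scores when the two classes are balanced. A minimal sketch in plain NumPy for checking the class balance and the majority-class baseline, assuming the labels are available as a 1-D array (the name `y_test` and both helper functions are hypothetical, not part of HyperTS):

```python
import numpy as np

def class_proportions(y):
    """Fraction of samples in each class."""
    values, counts = np.unique(np.asarray(y), return_counts=True)
    return {v.item(): float(c / counts.sum()) for v, c in zip(values, counts)}

def majority_baseline_accuracy(y):
    """Accuracy of a trivial model that always predicts the most frequent class."""
    _, counts = np.unique(np.asarray(y), return_counts=True)
    return float(counts.max() / counts.sum())

# Hypothetical labels for a perfectly balanced binary test set
y_test = np.array([0, 1] * 50)
print(class_proportions(y_test))           # {0: 0.5, 1: 0.5}
print(majority_baseline_accuracy(y_test))  # 0.5
```

If the real test labels are close to balanced, a test accuracy of 0.50 means the final model does no better than chance.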

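from hyperts import make_experiment

# df_train: training data converted with from_3d_array_to_nested_df
experiment = make_experiment(df_train,
                            task='multivariate-binaryclass',
                            cv=True,                  # k-fold cross-validation
                            num_folds=5,
                            mode='dl',                # deep-learning models
                            tf_gpu_usage_strategy=1,
                            max_trials=4,
                            reward_metric='accuracy',
                            optimize_direction='max'
                            )
model = experiment.run()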

As the log below shows, acc and val_acc stay at 1.00 for the entire experiment and for every model. Yet evaluating the final model on the test data gives a binary classification accuracy of 0.50.
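Because the accuracy logged during the search and the accuracy measured on the test set disagree so sharply, it can help to recompute the metric independently from the final model's predicted probabilities. A minimal sketch in plain NumPy, assuming you have the true test labels and the predicted positive-class probabilities as arrays; the helper `binary_report` and the 0.5 threshold are illustrative assumptions, not HyperTS API:

```python
import numpy as np

def binary_report(y_true, y_prob, threshold=0.5):
    """Recompute accuracy and confusion counts from predicted probabilities.

    A sample is predicted as class 1 when its probability exceeds `threshold`.
    """
    y_true = np.asarray(y_true).astype(int)
    y_pred = (np.asarray(y_prob) > threshold).astype(int)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    return {"accuracy": (tp + tn) / len(y_true), "tp": tp, "tn": tn, "fp": fp, "fn": fn}

# Hypothetical labels and probabilities for a tiny test set
print(binary_report(np.array([0, 0, 1, 1]), np.array([0.2, 0.6, 0.7, 0.4])))
# {'accuracy': 0.5, 'tp': 1, 'tn': 1, 'fp': 1, 'fn': 1}
```

The confusion counts are often more telling than accuracy alone: if every prediction falls into one class, `tp + fp` or `tn + fn` will be zero.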

Model: "HybirdRNN-lstm"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_continuous_all (Input  [(None, 10, 2)]          0         
 Layer)                                                          

 lstm_0 (LSTM)               (None, 10, 256)           265216    

 lstm_0_dropout (Dropout)    (None, 10, 256)           0         

 lstm_1 (LSTM)               (None, 10, 256)           525312    

 lstm_1_dropout (Dropout)    (None, 10, 256)           0         

 lstm_2 (LSTM)               (None, 256)               525312    

 dense_out (Dense)           (None, 1)                 257       

=================================================================
Total params: 1,316,097
Trainable params: 1,316,097
Non-trainable params: 0
_________________________________________________________________
Epoch 1/30
116/116 [================] - 8s 36ms/step - loss: 0.6936 - acc: 1.0000 - val_loss: 0.6943 - val_acc: 1.0000 - lr: 0.0010
Epoch 2/30
116/116 [================] - 5s 43ms/step - loss: 0.6933 - acc: 1.0000 - val_loss: 0.6931 - val_acc: 1.0000 - lr: 0.0010
Epoch 3/30
116/116 [================] - 4s 36ms/step - loss: 0.6932 - acc: 1.0000 - val_loss: 0.6931 - val_acc: 1.0000 - lr: 0.0010
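The logged loss is informative here. Binary cross-entropy for a model that outputs probability 0.5 for every sample equals ln 2 ≈ 0.6931, whatever the labels are, and that is the value the loss settles at in the log. So the loss is consistent with a network that is not separating the classes, even though the logged acc reads 1.0000. A minimal sketch of that arithmetic using the standard binary cross-entropy formula (the helper name is hypothetical):

```python
import math
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy, with probabilities clipped away from 0 and 1."""
    y_true = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

y = np.array([0, 1, 1, 0, 1])     # any labels
constant = np.full(len(y), 0.5)   # a model that always outputs 0.5
print(round(binary_cross_entropy(y, constant), 4))  # 0.6931
print(round(math.log(2), 4))                        # 0.6931
```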
zhangxjohn commented 1 year ago

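Thanks for reporting this. It's a surprising result! As a first step, I suggest analyzing your training and test data with some exploratory data analysis (EDA). For example, compare the distribution and trend of the time series for each class, and check whether the test data is out of distribution (OOD) relative to the training data. That is my suggestion for now.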
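The EDA suggested above can start with per-class summaries and a crude train-versus-test shift check. A minimal sketch in plain NumPy, assuming the raw data is still available as 3-D arrays shaped (n_samples, length, n_features), matching the (23000, 10, 2) layout in the question; the function names and the synthetic data are hypothetical, and the standardized mean shift is only a rough heuristic for distribution shift, not a formal OOD test:

```python
import numpy as np

def per_class_summary(X, y):
    """Per class: sample count, per-feature mean/std, and mean trajectory over time.

    X is assumed to be shaped (n_samples, length, n_features); y holds class labels.
    """
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    summary = {}
    for label in np.unique(y):
        Xc = X[y == label]
        summary[label] = {
            "n": len(Xc),
            "feature_mean": Xc.mean(axis=(0, 1)),  # shape (n_features,)
            "feature_std": Xc.std(axis=(0, 1)),    # shape (n_features,)
            "mean_trajectory": Xc.mean(axis=0),    # shape (length, n_features)
        }
    return summary

def standardized_mean_shift(X_train, X_test):
    """Per-feature |test mean - train mean|, in units of the training std.

    Large values hint that the test data may be shifted relative to training.
    """
    X_train, X_test = np.asarray(X_train, float), np.asarray(X_test, float)
    mu = X_train.mean(axis=(0, 1))
    sd = X_train.std(axis=(0, 1)) + 1e-12  # avoid division by zero
    return np.abs(X_test.mean(axis=(0, 1)) - mu) / sd

# Hypothetical synthetic data with the same layout as in the question
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 10, 2))
y_train = rng.integers(0, 2, size=200)
X_test = rng.normal(loc=3.0, size=(100, 10, 2))  # deliberately shifted
y_test = rng.integers(0, 2, size=100)

train_summary = per_class_summary(X_train, y_train)
test_summary = per_class_summary(X_test, y_test)
for label in sorted(set(train_summary) & set(test_summary)):
    print(label, train_summary[label]["feature_mean"], test_summary[label]["feature_mean"])
print(standardized_mean_shift(X_train, X_test))
```

Plotting `mean_trajectory` for each class, separately for train and test, is a quick way to see whether the class-specific trends in training also appear in the test data.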