DL case study: RNN on seizure detection

nxthuan512 commented 6 years ago

Hi everyone,

I am currently using an LSTM to analyze brain signals and detect seizures. The results so far show a few odd behaviors, so any suggestions for possible improvements are welcome. Thank you.

DATASET:

UPenn and Mayo Clinic's Seizure
Only the Dog1 dataset is used at the moment. It contains 178 ictal and 478 interictal 1-s clips. Each clip has 16 channels, and each channel stores 400 timesteps.
Every timestep of an interictal clip is labeled 0, while the labels of an ictal clip depend on the latency parameter. For example, if the latency is 32, timesteps 0 to 31 are labeled 0, whereas the remaining timesteps are labeled 1.
80% of the 178 ictal clips and 80% of the 478 interictal clips are used for training; the rest are used for testing. The training set and test set then contain 476 and 120 samples, respectively.
Each timestep value lies in the range [-2200, 2200]. The values are standardized to [-1, 1] with scikit-learn (http://scikit-learn.org/stable/modules/preprocessing.html). Both the raw data and the standardized data are used to train the model.
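
The labeling rule described above can be sketched with NumPy as follows. This is only an illustration, not the code used in this thread: the helper name `make_clip_labels` is hypothetical, while the clip length of 400 timesteps and the example latency of 32 are taken from the description.

```python
import numpy as np

N_TIMESTEPS = 400  # timesteps per 1-s clip, as described above


def make_clip_labels(is_ictal, latency, n_timesteps=N_TIMESTEPS):
    """Return per-timestep labels for one clip (hypothetical helper).

    Interictal clips are all 0. For ictal clips, timesteps
    0 .. latency-1 are 0 and the remaining timesteps are 1.
    """
    labels = np.zeros(n_timesteps, dtype=np.int64)
    if is_ictal:
        labels[latency:] = 1
    return labels


ictal_labels = make_clip_labels(is_ictal=True, latency=32)
interictal_labels = make_clip_labels(is_ictal=False, latency=32)
print(ictal_labels[:32].sum(), ictal_labels[32:].sum())  # 0 368
print(interictal_labels.sum())  # 0
```

Stacking one such vector per clip and adding a trailing axis would give a label array of shape (n_samples, 400, 1).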

MODEL:
Batch size varies from 1 to 256.
LSTMs with 1 and 2 hidden layers are used.
Each layer contains 8 to 40 hidden units.
Nine metrics are reported: test loss, test accuracy, sensitivity, specificity, AUC, true positives (TP), false negatives (FN), true negatives (TN), and false positives (FP).

OBSERVATION:

The smaller the batch size, the higher the test accuracy. Batch size = 1 (online learning) gives the highest test accuracy, at the cost of a long training time. -> A bit odd.
Raw data gives higher test accuracy than standardized data (in all cases). -> A bit odd.
The 2-layer LSTM gives higher test accuracy than the 1-layer LSTM (in all cases). -> Seems right.
More hidden units do not always lead to higher accuracy. At batch size = 1, the best case (sensitivity, specificity, and AUC all above 0.90) occurs when layers 1 and 2 both have 32 hidden units.
A high test accuracy does not imply a high sensitivity. However, a high sensitivity always comes with a high test accuracy.
The number of false positives is quite high.

BEST RESULTS:

Number of hidden units in layer 1 = 32
Number of hidden units in layer 2 = 32
Sensitivity = 0.935334203
Specificity = 0.945243525
AUC = 0.940288864
TP = 12902
FN = 892
TN = 32333
FP = 1873 -> 1873 timesteps ~ 4.6s -> 4.6s/596s (0.77%)
The model size is around 300 KB.
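
As a cross-check, the reported sensitivity and specificity can be recomputed from the four confusion-matrix counts. The short script below uses only the numbers listed above; the variable names are illustrative. The counts add up to 48000 timesteps, which matches 120 test clips of 400 timesteps each. The mean of sensitivity and specificity also agrees with the reported AUC value to the printed precision, which would be the case if the AUC was computed from thresholded 0/1 predictions, though the thread does not confirm how it was computed.

```python
# Per-timestep confusion-matrix counts on the test set, as reported above
TP, FN, TN, FP = 12902, 892, 32333, 1873

sensitivity = TP / (TP + FN)  # true positive rate (recall on seizure timesteps)
specificity = TN / (TN + FP)  # true negative rate
accuracy = (TP + TN) / (TP + FN + TN + FP)
mean_sens_spec = (sensitivity + specificity) / 2  # a.k.a. balanced accuracy

total_timesteps = TP + FN + TN + FP
print(total_timesteps)  # 48000
print(sensitivity, specificity)
print(accuracy, mean_sens_spec)
```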

IMPROVEMENTS:

Use K-fold cross-validation to check for overfitting.
How can sensitivity be increased (i.e., FN decreased)?
How can FP be decreased?
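
For the K-fold item, a minimal sketch of a clip-level, class-stratified split is shown below, written with NumPy only. The function name `stratified_kfold_indices` is hypothetical, and the clip counts are the ones from the dataset description. scikit-learn's `StratifiedKFold` offers the same kind of split and would normally be the more convenient choice.

```python
import numpy as np


def stratified_kfold_indices(clip_labels, n_splits=5, seed=0):
    """Yield (train_idx, test_idx) pairs, stratified by clip class.

    clip_labels: 1-D array with one label per clip (1 = ictal, 0 = interictal).
    """
    rng = np.random.default_rng(seed)
    clip_labels = np.asarray(clip_labels)
    # Shuffle the clip indices of each class, then cut them into n_splits chunks
    per_class_chunks = []
    for cls in np.unique(clip_labels):
        idx = rng.permutation(np.flatnonzero(clip_labels == cls))
        per_class_chunks.append(np.array_split(idx, n_splits))
    all_idx = np.arange(len(clip_labels))
    for k in range(n_splits):
        # Fold k takes the k-th chunk of every class as its test set
        test_idx = np.sort(np.concatenate([chunks[k] for chunks in per_class_chunks]))
        train_idx = np.setdiff1d(all_idx, test_idx)
        yield train_idx, test_idx


# Clip counts taken from the dataset description above
clip_labels = np.concatenate([np.ones(178, dtype=int), np.zeros(478, dtype=int)])
folds = list(stratified_kfold_indices(clip_labels, n_splits=5))
for train_idx, test_idx in folds:
    # Each fold keeps roughly the same ictal fraction in its test part
    print(len(train_idx), len(test_idx), clip_labels[test_idx].mean().round(3))
```

Training one model per fold and comparing the spread of sensitivity and specificity across folds gives a rough picture of how much the single 80/20 split result can be trusted.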

Thank you very much,

hitogen commented 6 years ago

@nxthuan512 if possible, could you post a link to the code for reference?

nxthuan512 commented 6 years ago

Hi @hitogen, please find the code here: https://www.dropbox.com/s/ydratz6bmxm21bo/test01.zip?dl=0 Thanks.

Loss curves for the training and test sets: [TensorBoard_test.pdf](https://github.com/dlapplications/dlapplications.github.io/files/2164634/TensorBoard_test.pdf) [TensorBoard_training.pdf](https://github.com/dlapplications/dlapplications.github.io/files/2164635/TensorBoard_training.pdf)

hitogen commented 6 years ago

@nxthuan512 in the plots, is the x-axis the number of epochs or something else? It is not the same across the different batch sizes.

nxthuan512 commented 6 years ago

The x-axis is the epoch. Because early stopping with patience = 10 is used, training stops automatically if the loss does not decrease for 10 consecutive epochs.
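
The patience rule can be illustrated with a few lines of plain Python. This is a sketch of the usual early-stopping logic, not the callback used in the experiments; the function name `early_stopping_epoch` and both loss curves are made up for the example. It shows why runs with different loss curves end at different epochs.

```python
def early_stopping_epoch(losses, patience=10):
    """Return the 1-based epoch at which training would stop.

    Training stops once the monitored loss has failed to improve on its
    best value for `patience` consecutive epochs; if that never happens,
    the run uses every epoch in `losses`.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(losses)


# Two made-up loss curves (illustrative values only, not from the experiments)
fast_plateau = [1.0, 0.8, 0.7] + [0.7] * 50
slow_plateau = [1.0 - 0.02 * i for i in range(30)] + [0.5] * 50

print(early_stopping_epoch(fast_plateau))  # 13
print(early_stopping_epoch(slow_plateau))  # 40
```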

hitogen commented 6 years ago

@nxthuan512 I see (naruhodo). Judging from the plots, the model seems to be overfitting, and the larger the batch size, the more it overfits the dataset (lowest training loss but highest test loss). Probably there is too little data, or the data are too similar, so a 32-unit LSTM is enough to overfit. Could you try adding more data and checking whether the data are diverse enough?

nxthuan512 commented 6 years ago

@hitogen Standardizing the data makes the results more stable. There is no need to scale to a fixed range; standardization depends instead on the mean and the standard deviation. Testing with 5-fold cross-validation shows that sensitivity, specificity, and AUC do not depend on the batch size. https://www.dropbox.com/s/ml1flq8npagrphk/20180706-5-fold-standardize.pdf?dl=0
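
The difference between scaling to a range and standardizing can be sketched as follows. The signal here is synthetic (random values with one large outlier), not EEG data from the dataset, and the formulas are written out in NumPy rather than calling scikit-learn, whose `MinMaxScaler` and `StandardScaler` implement the same two ideas.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for one channel of raw values (not real data)
x = rng.normal(loc=50.0, scale=400.0, size=10_000)
x[0] = 2200.0  # one large value, near the upper end of the raw range

# Min-max scaling to [-1, 1]: depends only on the two extreme values
x_minmax = 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0

# Standardization: depends on the mean and the standard deviation
x_std = (x - x.mean()) / x.std()

print(x_minmax.min(), x_minmax.max())
print(x_std.mean().round(6), x_std.std().round(6))
```

With min-max scaling a single extreme sample compresses all other values into a narrow band, whereas standardization is driven by the bulk of the distribution, which may be one reason the results become more stable.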

nxthuan512 commented 6 years ago

@hitogen Hi Tuan, could you see whether there is any way to improve this? Code: https://www.dropbox.com/s/7q1f87njgbvg12g/final_002_model.py?dl=0 Report: 20180710_report.pdf

ducanh841988 commented 6 years ago

@nxthuan512
The input and output are not very clear to me. Thuận, could you describe them in a bit more detail? How many steps does each sequence have, and is the output produced per step or for the whole sequence?

ducanh841988 commented 6 years ago

I now understand what the data look like. Thuận, could you share the LSTM architecture? Does the LSTM process whole sequences or individual steps? Judging from the data-split figure, it seems to work step by step.

nxthuan512 commented 6 years ago

Hi @ducanh841988,

The LSTM processes the data timestep by timestep. I am currently combining a CNN with the LSTM, following this guide: https://stackoverflow.com/questions/51344610/how-to-setup-1d-convolution-and-lstm-in-keras

Data:

X = (n_samples, n_timesteps, n_features), where n_samples=476, n_timesteps=400, n_features=16 are the number of samples, timesteps, and features (or channels) of the signal.
y = (n_samples, n_timesteps, 1). Each timestep is labeled by either 0 or 1 (binary classification).

Code:

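# Imports assume the standalone Keras package
from keras.layers import Input, Conv1D, LSTM, Dense
from keras.models import Model

# 1-D convolution over time, then an LSTM that emits one output per timestep
input_layer = Input(shape=(400, 16))  # (n_timesteps, n_features)
conv1 = Conv1D(filters=32,
               kernel_size=8,
               strides=1,
               activation='relu',
               padding='same')(input_layer)
lstm1 = LSTM(32, return_sequences=True)(conv1)
output_layer = Dense(1, activation='sigmoid')(lstm1)  # per-timestep probability
model = Model(inputs=input_layer, outputs=output_layer)

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['acc'])
# batch_size, callbacks, class_weights, and the X/y arrays are defined elsewhere in the script
history = model.fit(X_train,
                    y_train,
                    epochs=1000,
                    batch_size=batch_size,
                    verbose=1,
                    shuffle=True,
                    callbacks=callbacks,
                    class_weight=class_weights,
                    validation_data=(X_test, y_test))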

Model:

Layer (type)                 Output Shape              Param #   
=================================================================
input_4 (InputLayer)         (None, 400, 16)           0         
_________________________________________________________________
conv1d_4 (Conv1D)            (None, 400, 32)           4128      
_________________________________________________________________
lstm_4 (LSTM)                (None, 400, 32)           8320      
_________________________________________________________________
dense_4 (Dense)              (None, 400, 1)            33        
=================================================================
Total params: 12,481
Trainable params: 12,481
Non-trainable params: 0
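
The parameter counts in this summary can be reproduced by hand. The helper functions below are illustrative (their names are not part of Keras); they use the standard counting formulas for a 1-D convolution, an LSTM layer, and a dense layer, with the layer sizes from the code above.

```python
def conv1d_params(kernel_size, in_channels, filters):
    # One weight per (kernel position, input channel, filter), plus one bias per filter
    return kernel_size * in_channels * filters + filters


def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights, and a bias vector
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    # Weight matrix plus one bias per output unit
    return input_dim * units + units


conv = conv1d_params(kernel_size=8, in_channels=16, filters=32)
lstm = lstm_params(input_dim=32, units=32)
dense = dense_params(input_dim=32, units=1)
print(conv, lstm, dense, conv + lstm + dense)  # 4128 8320 33 12481
```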

[Image: cnn_lstm]

I am now planning to apply the convolution across the channels/features instead of across the timesteps. For example, one (2,1) filter slides over the channels of a single timestep (see the figure below). In that case, do I need to use Conv2D, or do I need to rearrange the training data? Thanks.

[Image: v2ryb]
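
One possible way to look at this question, sketched with NumPy only: the data would not need to be reordered, only given an extra trailing axis, so that each clip looks like a (timesteps x channels) image with a single "colour" channel. A 2-D convolution whose kernel covers 1 timestep and 2 adjacent channels then slides across the channels; with the (timesteps, channels) axis order assumed here, that would correspond to a Keras `Conv2D` with `kernel_size=(1, 2)`. The filter weights and the small number of samples below are arbitrary illustrative values.

```python
import numpy as np

n_samples, n_timesteps, n_channels = 4, 400, 16  # small n_samples for the demo
X = np.random.default_rng(0).normal(size=(n_samples, n_timesteps, n_channels))

# Add a trailing axis so each clip has shape (400, 16, 1),
# which is the layout a 2-D convolution layer would expect
X_img = X[..., np.newaxis]

# One filter spanning 2 adjacent channels within a single timestep
# (weights are arbitrary illustrative values)
w = np.array([0.5, -0.5])

# 'valid' sliding window along the channel axis: 16 channels -> 15 outputs
out = w[0] * X[:, :, :-1] + w[1] * X[:, :, 1:]

print(X_img.shape)  # (4, 400, 16, 1)
print(out.shape)    # (4, 400, 15)
```

With several filters the result would have shape (n_samples, 400, 15, n_filters), and it would have to be reshaped to (n_samples, 400, 15 * n_filters) before being passed to an LSTM that expects one feature vector per timestep.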
