Serpeve / EEGSym

Open implementation and code from the publication "EEGSym: Overcoming Intersubject Variability in Motor Imagery Based BCIs with Deep Learning"
MIT License

Experiments without pretraining #2

Closed gzoumpourlis closed 2 years ago

gzoumpourlis commented 2 years ago

Hi,

Thanks for providing the code for your interesting work! I would like to investigate the performance of EEGSym on each dataset separately when training the model from scratch, i.e. without pretraining on the other 4 datasets. If you have done such an experiment (I am interested in the results on Physionet and OpenBMI) and would like to share the results, that would be great.

Otherwise, what hyperparameter values should I use for such an experiment? Having checked your code, I noticed that you suggest different values for pretraining/finetuning. Thus, I am mostly concerned about the following hyperparameter choices:

- `model_hyparams['dropout_rate']`: 0.4 or 0.25?
- `model_hyparams['learning_rate']`: 0.001 or 0.0001?
- `model_hyparams['filters_per_branch']`: 24 or 8?

Thanks!

Serpeve commented 2 years ago

Hello,

Thank you for finding the work interesting. What we propose in our work is not the architecture on its own, but its joint use with the pretraining. The architecture is meant to be used with the weight values pretrained on the other datasets, as done in the paper. A comparison with our model is therefore preferably made against the performance reported in the published article.

Regarding the hyperparameters you are asking about, for an implementation without pretraining I am confident about the dropout and learning rate:

- `model_hyparams['dropout_rate']`: 0.4
- `model_hyparams['learning_rate']`: 0.001 (a learning rate of 0.0001 may reach superior performance, but I think it will converge very slowly)

For `filters_per_branch`, given the lower number of examples, 8 filters per branch may perform similarly to 24. Nevertheless, I suggest using the same 24 filters as in the paper.
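For reference, a minimal sketch of these from-scratch choices could look like the snippet below. Only the three keys discussed in this thread are taken from the answer; any other keys and how the dictionary is passed to the model constructor depend on the repository's own scripts and are not shown here.

```python
# Hedged sketch of the from-scratch hyperparameter values discussed above.
# Only the three keys mentioned in this thread are included; the rest of the
# configuration (and how EEGSym consumes this dict) is repo-specific.
model_hyparams = {
    'dropout_rate': 0.4,        # recommended when training without pretraining
    'learning_rate': 0.001,     # 1e-4 may score higher but converges very slowly
    'filters_per_branch': 24,   # 8 may perform similarly on small datasets,
                                # but 24 keeps the paper's configuration
}
```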

The data augmentation was designed for the pretraining stage, so I would also not use it in the scenario you propose.

In a tentative test with 5-fold cross-validation (the paper used LOSO, and in this condition it would be better to use LOSO due to the reduced number of examples) we obtained 84.4±10.7 on Physionet with 16 electrodes (compared to 88.6±9.0 in the paper) and 82.8±9.1 (compared to 83.3±9.3). Nevertheless, as I mentioned at the beginning, in our work we do not propose EEGSym independently of the pretraining on other databases, as we state in the published article:

> "... We believe that one of the clear advantages of our approach has been to use data from multiple publicly available datasets that share an imagination paradigm. They were used for pretraining the network to initialize the weights of the models evaluated. This improved use of transfer learning is made clear when comparing the inter-subject accuracies on Physionet [26] dataset. All baseline models and EEGSym outperform previous DL approaches that used all 64 electrodes [24], [29], [32] available with the information of only 16 electrodes. ..."

So again, we would prefer a comparison with the results published in the article.
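For clarity, the LOSO evaluation recommended above can be set up with scikit-learn's `LeaveOneGroupOut`. The sketch below is only illustrative: `X`, `y` and `subject_ids` are placeholder arrays standing in for data produced by the repository's own loading pipeline, and the training step is left as a comment.

```python
# Hedged sketch of a leave-one-subject-out (LOSO) split using scikit-learn.
# X, y and subject_ids are dummy placeholders for the epoched EEG data,
# labels and per-trial subject identifiers from your own data loader.
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

X = np.random.randn(200, 16, 512)            # (trials, channels, samples), dummy data
y = np.random.randint(0, 2, size=200)        # dummy binary motor-imagery labels
subject_ids = np.repeat(np.arange(10), 20)   # 10 subjects, 20 trials each

logo = LeaveOneGroupOut()
for fold, (train_idx, test_idx) in enumerate(logo.split(X, y, groups=subject_ids)):
    X_train, y_train = X[train_idx], y[train_idx]
    X_test, y_test = X[test_idx], y[test_idx]
    # ...train EEGSym from scratch on X_train / y_train and evaluate on the
    # held-out subject's trials in X_test / y_test...
    print(f"fold {fold}: held-out subject {np.unique(subject_ids[test_idx])[0]}")
```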