eeyhsong / EEG-Conformer

EEG Transformer 2.0. i. Convolutional Transformer for EEG Decoding. ii. Novel visualization - Class Activation Topography.
GNU General Public License v3.0

Model Performance and Data Augmentation Improvements #40

Open snailpt opened 2 weeks ago

snailpt commented 2 weeks ago

Thank you so much for making your code open source and sharing it with the community! Your contribution has not only saved us a great deal of time and effort in our research, but it has also provided valuable support and reference for our experiments. The spirit of open-source greatly fosters academic exchange and innovation, and we sincerely appreciate your selfless efforts and contribution. We look forward to learning from your work in the future!

1. **Shallow ConvNet Feature Extraction with Fully Connected Layers**

   **Observation:** After feature extraction with a shallow convolutional neural network (ConvNet), flattening the features and passing them through a single fully connected (FC) layer yields better performance than passing them through two FC layers.

   **Details:** We compared two classifier-head configurations:
   - Configuration A: flatten the features and pass them through one FC layer.
   - Configuration B: flatten the features and pass them through two FC layers.

   **Result:** Configuration A consistently outperformed Configuration B in accuracy and generalization, suggesting that the added capacity of the second FC layer is not beneficial for this task and architecture.

   **Proposed Action:** Keep the single-FC-layer classifier head for this shallow ConvNet setup. A minimal sketch of the two heads follows.
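A minimal PyTorch sketch of the two head configurations, for illustration only; the flattened feature size and class count below (40 × 61 features, 4 classes) are assumptions chosen to resemble a typical motor-imagery setup, not values taken from this issue:

```python
import torch
import torch.nn as nn

FLAT_DIM = 40 * 61   # flattened conv-feature size (illustrative assumption)
N_CLASSES = 4        # e.g., four motor-imagery classes (assumption)

# Configuration A: flatten -> one FC layer (reported to generalize better).
head_a = nn.Sequential(nn.Flatten(), nn.Linear(FLAT_DIM, N_CLASSES))

# Configuration B: flatten -> two FC layers with a nonlinearity in between.
head_b = nn.Sequential(
    nn.Flatten(),
    nn.Linear(FLAT_DIM, 256),
    nn.ELU(),
    nn.Dropout(0.5),
    nn.Linear(256, N_CLASSES),
)

features = torch.randn(8, 40, 61)  # a batch of shallow-ConvNet feature maps
print(head_a(features).shape)      # torch.Size([8, 4])
print(head_b(features).shape)      # torch.Size([8, 4])
```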

2. **Effect of LayerNorm and Multi-Head Attention (MHA) Order on Model Accuracy**

   **Observation:** Swapping the order of LayerNorm and Multi-Head Attention (MHA) improves model accuracy. In the original configuration, LayerNorm is applied after MHA; applying LayerNorm before MHA gave better recognition accuracy.

   **Details:** We tested the following configurations:
   - Configuration A: MHA followed by LayerNorm (original order).
   - Configuration B: LayerNorm followed by MHA (modified order).

   **Result:** Configuration B (LayerNorm before MHA) consistently achieved better accuracy, likely because normalizing the input to the attention sublayer stabilizes gradient flow during training.

   **Proposed Action:** Reorder LayerNorm and MHA in the model architecture, applying LayerNorm before MHA. A sketch of the two orderings follows.
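For concreteness, a minimal sketch of the two orderings around the attention sublayer; the embedding size (40) and head count (10) are illustrative assumptions:

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """Minimal self-attention block. `pre_norm=True` applies LayerNorm
    before MHA (Configuration B); `pre_norm=False` applies it after
    the residual connection (Configuration A, the original order)."""
    def __init__(self, dim=40, heads=10, pre_norm=True):
        super().__init__()
        self.pre_norm = pre_norm
        self.norm = nn.LayerNorm(dim)
        self.mha = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        if self.pre_norm:
            # Pre-norm: LayerNorm -> MHA -> residual.
            h = self.norm(x)
            h, _ = self.mha(h, h, h)
            return x + h
        # Post-norm: MHA -> residual -> LayerNorm.
        h, _ = self.mha(x, x, x)
        return self.norm(x + h)

tokens = torch.randn(8, 61, 40)  # (batch, sequence, embedding)
print(EncoderBlock(pre_norm=True)(tokens).shape)  # torch.Size([8, 61, 40])
```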

3. **Data Augmentation Frequency and Model Performance**

   **Observation:** Increasing the frequency of data augmentation improves model performance. When augmentation is applied more often (i.e., more augmented samples are generated per batch), the model generalizes better on the validation set.

   **Details:** We varied the number of augmented samples used during training:
   - Configuration A: standard augmentation frequency (e.g., augmenting once per batch).
   - Configuration B: increased augmentation frequency (e.g., augmenting multiple times per batch).

   **Result:** Configuration B showed a noticeable improvement in validation accuracy, indicating that the additional augmented samples help the model generalize by providing more diverse training data.

   **Proposed Action:** Increase the data augmentation frequency during training for better generalization; see the sketch after this item.
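A hedged sketch of raising the augmentation frequency, assuming a segmentation-and-recombination style augmentation (splitting same-class trials into time segments and recombining them, in the spirit of the repository's augmentation); the tensor shapes, segment count, and the `n_aug_per_batch` knob are illustrative assumptions to be tuned on validation data:

```python
import torch

def seg_recomb_augment(x, y, n_segments=8):
    """Segmentation-and-recombination augmentation: split each trial into
    time segments and rebuild new trials from segments drawn from other
    trials of the same class."""
    aug_x = x.clone()
    n_trials, _, _, n_times = x.shape  # (trials, 1, channels, samples)
    seg_len = n_times // n_segments
    for label in y.unique():
        idx = (y == label).nonzero(as_tuple=True)[0]
        for s in range(n_segments):
            donors = idx[torch.randint(len(idx), (len(idx),))]
            sl = slice(s * seg_len, (s + 1) * seg_len)
            aug_x[idx, :, :, sl] = x[donors, :, :, sl]
    return aug_x, y.clone()

# Configuration B: draw several augmented copies per batch instead of one.
x = torch.randn(16, 1, 22, 1000)  # (trials, 1, EEG channels, samples)
y = torch.randint(0, 4, (16,))
n_aug_per_batch = 3               # assumed knob; tune on validation data
augs = [seg_recomb_augment(x, y) for _ in range(n_aug_per_batch)]
x_train = torch.cat([x] + [a for a, _ in augs])
y_train = torch.cat([y] + [b for _, b in augs])
print(x_train.shape)  # torch.Size([64, 1, 22, 1000])
```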

eeyhsong commented 2 weeks ago

Hi @snailpt, thanks a lot for kindly sharing these improvements to the current version of the convolutional transformer. Looking forward to discussing new networks for EEG analysis with you!

snailpt commented 2 weeks ago

The fine-tuned code is available here: https://github.com/snailpt/CTNet/blob/main/Conformer_fine_tune_2a_77.66_2b_85.87.ipynb

eeyhsong commented 2 weeks ago

Thanks for your help!

snailpt commented 2 weeks ago

Correction to point 2: placing LayerNorm after MHA tends to yield better results.