gsaurabhr opened 3 weeks ago
The dataset needs to have high-density EEG. We found only one such dataset: https://github.com/csndl-iitd/realtime-sleep-staging/issues/5#issuecomment-2126712946
Basic data preprocessing:
The study is motivated by the observation that 2D CNNs applied to individual video frames already perform well in action recognition. Within a residual learning framework, it demonstrated that 3D CNNs surpass 2D CNNs. Furthermore, it showed that decomposing 3D convolutional filters into separate spatial and temporal components (the R(2+1)D model, built on a ResNet architecture) significantly improves accuracy.
They tested various models: 2D convolutions (R2D and f-R2D), 3D convolutions (R3D), mixed convolutions (MCx and rMCx), and R(2+1)D convolutions.
The 2D convolutions treat each frame independently and fail to capture temporal information adequately.
The R3D convolution captures spatial and temporal information jointly but is computationally expensive and less effective than decomposed models.
The mixed models combine 2D and 3D convolutions within the same network, mixing them at various layers to balance spatial and temporal feature extraction. These may not fully exploit the temporal information available in the video sequences, leading to suboptimal performance, especially on tasks requiring long-term context understanding.
The R(2+1)D model decomposes each 3D convolution into a sequence of a 2D and a 1D convolution. In its first block, for example, it applies 45 2D spatial filters of size 1×7×7, followed by 64 1D temporal filters of size 3×1×1 across consecutive frames. It consistently outperformed all the above models across various datasets.
Compared to full 3D convolution, (2+1)D decomposition offers two advantages:
Increased Nonlinearities: It doubles the number of nonlinearities in the network due to the additional ReLU between the 2D and 1D convolutions in each block, increasing the complexity of representable functions.
Easier Optimization: Factorizing the 3D convolution into separate spatial and temporal components simplifies optimization, resulting in lower training error than 3D convolutional networks of the same capacity. The gap in training loss is larger for deeper networks, indicating that the optimization benefit grows with depth.
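To keep the comparison fair, the paper chooses the number of intermediate 2D filters M_i so that a (2+1)D block has the same parameter count as the full t×d×d 3D convolution it replaces. A minimal sketch of that sizing rule (the example values 3×3×3, 64→64 channels are illustrative, not taken from the paper's tables):

```python
import math

def m_i(t, d, n_in, n_out):
    """Number of 2D spatial filters M_i that parameter-matches a
    (2+1)D block to a full t x d x d 3D convolution (Tran et al. 2018).

    Full 3D conv params:  t * d * d * n_in * n_out
    Decomposed params:    spatial (1 x d x d): d * d * n_in * M
                        + temporal (t x 1 x 1): t * M * n_out
    Solving for M and flooring gives the rule below."""
    return math.floor(t * d * d * n_in * n_out / (d * d * n_in + t * n_out))

# Illustrative mid-network block: 3x3x3 kernel, 64 -> 64 channels.
m = m_i(t=3, d=3, n_in=64, n_out=64)
print(m)  # 144
```

With M_i chosen this way, the accuracy gains come from the extra nonlinearity and easier optimization rather than from added capacity.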
The GitHub repositories below apply 3D CNNs to EEG data. They might be useful when we convert EEG data into image frames for our model.
| Dataset Description | Information | Comments |
|---|---|---|
| Number of subjects | 19 | Performed two different cognitive tasks on two different days before napping. Link: https://osf.io/zcu2w |
| Number of recordings | 36 | 2-night recordings of 17 subjects and 1-night recordings of 2 subjects. Link: https://github.com/nmningmei/Get_Sleep_data/blob/main/data/available_subjects.csv |
| Number of channels | 64 | 62 EEG + 2 EOG. Link: https://osf.io/ebvsr |
| Original sampling frequency | 1000 Hz | |
| Original highpass and lowpass filters | highpass: 0.0 Hz, lowpass: 500.0 Hz | |
- Downsampling to 100 Hz.
- Applying a bandpass filter between 0.2 Hz and 40 Hz (this also removes all line noise around 60 Hz).
- Performing average re-referencing on the raw EEG data. Alternatively, we can perform mastoid re-referencing using the TP9 and TP10 electrode positions (refer here).
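The steps above can be sketched with SciPy, assuming the raw data is a channels × samples array (in practice we would likely do this with MNE-Python; the array shapes here are assumptions for illustration):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, decimate

def preprocess(eeg, fs=1000, target_fs=100, band=(0.2, 40.0)):
    """Minimal sketch: downsample 1000 Hz -> 100 Hz, band-pass 0.2-40 Hz,
    then common-average re-reference. `eeg` is channels x samples."""
    # Downsample by a factor of 10 (decimate applies an anti-aliasing filter).
    eeg = decimate(eeg, fs // target_fs, axis=1, zero_phase=True)
    # Zero-phase Butterworth band-pass between 0.2 and 40 Hz.
    sos = butter(4, band, btype="bandpass", fs=target_fs, output="sos")
    eeg = sosfiltfilt(sos, eeg, axis=1)
    # Average re-referencing: subtract the mean across channels.
    return eeg - eeg.mean(axis=0, keepdims=True)

x = np.random.randn(62, 30 * 1000)  # one synthetic 30-s epoch, 62 channels
y = preprocess(x)
print(y.shape)  # (62, 3000)
```

For mastoid re-referencing, the last line would instead subtract the mean of the TP9/TP10 rows.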
The visualizations below are for a single subject (subject 29, day 1).
This notebook contains all the visualizations and details: https://colab.research.google.com/drive/1QYWn7DLtCCCWRf5erdWIgdv6hdFR9-xH?usp=sharing
The raw EEG plots are too large to upload here. For simplicity, this is a plot of the 'Oz' channel over a 30-second window (X-axis: time (ms), Y-axis: voltage (µV)).
The t-SNE plots below were created using 30-second epochs from all subjects. The epochs are the input features, z-scored before embedding. The three plots show varying levels of perplexity (a parameter related to the number of nearest neighbors).
The UMAP plots below were created the same way, from z-scored 30-second epochs from all subjects. The two plots show varying values of n_neighbors (a parameter that balances local versus global structure in the data). The last plot is three-dimensional.
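The embedding step can be sketched with scikit-learn as below. The epoch matrix here is synthetic stand-in data (the real features would be the flattened 30-second epochs); the UMAP plots follow the same pattern with umap-learn's `UMAP(n_neighbors=...)` in place of `TSNE`:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.manifold import TSNE

# Synthetic stand-in for the epoch matrix: 200 epochs x 300 features
# (real epochs would be much wider: channels x samples, flattened).
rng = np.random.default_rng(0)
epochs = rng.standard_normal((200, 300))

X = StandardScaler().fit_transform(epochs)  # z-score each feature

for perplexity in (5, 30, 50):  # the varying parameter in the plots above
    emb = TSNE(n_components=2, perplexity=perplexity,
               init="pca", random_state=0).fit_transform(X)
    print(perplexity, emb.shape)  # (200, 2) per setting
```

In the real pipeline the resulting 2D points would be colored by sleep stage to see whether stages separate in the embedding.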