gsaurabhr opened 3 weeks ago
The dataset needs to have high-density EEG. We found only one such dataset: https://github.com/csndl-iitd/realtime-sleep-staging/issues/5#issuecomment-2126712946
Basic data preprocessing:
The study is motivated by the observation that 2D CNNs applied to individual video frames already perform well in action recognition. Within a residual learning framework, it demonstrated that 3D CNNs surpass 2D CNNs. Furthermore, it showed that decomposing 3D convolutional filters into separate spatial and temporal components (the R(2+1)D model, built on a ResNet architecture) significantly improves accuracy.
They tested various models: 2D convolutions (R2D and f-R2D), 3D convolutions (R3D), mixed convolutions (MCx and rMCx), and R(2+1)D convolutions.
The 2D convolutions treat each frame independently and fail to capture temporal information adequately.
The R3D convolution captures spatial and temporal information jointly but is computationally expensive and less effective than decomposed models.
The mixed models combine 2D and 3D convolutions within the same network, mixing them at various layers to balance spatial and temporal feature extraction. These may not fully exploit the temporal information available in the video sequences, leading to suboptimal performance, especially on tasks requiring long-term context understanding.
The R(2+1)D model decomposes each 3D convolution into a sequence of a 2D and a 1D convolution. In its first block, for example, it applies 45 2D spatial filters of size 1×7×7, followed by 64 1D temporal filters of size 3×1×1 across consecutive frames. It consistently outperformed all the above models across various datasets.
Compared to full 3D convolution, (2+1)D decomposition offers two advantages:
Increased Nonlinearities: It doubles the number of nonlinearities in the network due to the additional ReLU between the 2D and 1D convolutions in each block, increasing the complexity of representable functions.
Easier Optimization: Factorizing the 3D convolution into separate spatial and temporal components simplifies optimization, resulting in lower training error than 3D convolutional networks of the same capacity. The gap in training loss is larger for deeper networks, indicating that the optimization benefit grows with depth.
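To keep the comparison fair, the paper chooses the number of intermediate 2D filters M_i so that a (2+1)D block has the same parameter count as the full t×d×d 3D convolution it replaces. A minimal sketch of that sizing rule (the example values 3×3×3, 64→64 channels are illustrative, not taken from the paper's tables):

```python
import math

def m_i(t, d, n_in, n_out):
    """Number of 2D spatial filters M_i that parameter-matches a
    (2+1)D block to a full t x d x d 3D convolution (Tran et al. 2018).

    Full 3D conv params:  t * d * d * n_in * n_out
    Decomposed params:    spatial (1 x d x d): d * d * n_in * M
                        + temporal (t x 1 x 1): t * M * n_out
    Solving for M and flooring gives the rule below."""
    return math.floor(t * d * d * n_in * n_out / (d * d * n_in + t * n_out))

# Illustrative mid-network block: 3x3x3 kernel, 64 -> 64 channels.
m = m_i(t=3, d=3, n_in=64, n_out=64)
print(m)  # 144
```

With M_i chosen this way, the accuracy gains come from the extra nonlinearity and easier optimization rather than from added capacity.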
The GitHub repositories below apply 3D CNNs to EEG data. They might be useful when we convert EEG data into image frames for our model.
| Dataset Description | Information | Comments |
|---|---|---|
| Number of subjects | 19 | Performed two different cognitive tasks on two different days before napping. Link: https://osf.io/zcu2w |
| Number of recordings | 36 | 2-night recordings of 17 subjects and 1-night recordings of 2 subjects. Link: https://github.com/nmningmei/Get_Sleep_data/blob/main/data/available_subjects.csv |
| Number of channels | 64 | 62 EEG + 2 EOG. Link: https://osf.io/ebvsr |
| Original sampling frequency | 1000 Hz | |
| Original highpass and lowpass filters | highpass: 0.0 Hz, lowpass: 500.0 Hz | |
- Downsampling to 100 Hz.
- Applying a bandpass filter between 0.2 Hz and 40 Hz (this also removes all line noise around 60 Hz).
- Performing average re-referencing on the raw EEG data. Alternatively, we can perform mastoid re-referencing using the TP9 and TP10 electrode positions (refer here).
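The steps above can be sketched with SciPy, assuming the raw data is a channels × samples array (in practice we would likely do this with MNE-Python; the array shapes here are assumptions for illustration):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, decimate

def preprocess(eeg, fs=1000, target_fs=100, band=(0.2, 40.0)):
    """Minimal sketch: downsample 1000 Hz -> 100 Hz, band-pass 0.2-40 Hz,
    then common-average re-reference. `eeg` is channels x samples."""
    # Downsample by a factor of 10 (decimate applies an anti-aliasing filter).
    eeg = decimate(eeg, fs // target_fs, axis=1, zero_phase=True)
    # Zero-phase Butterworth band-pass between 0.2 and 40 Hz.
    sos = butter(4, band, btype="bandpass", fs=target_fs, output="sos")
    eeg = sosfiltfilt(sos, eeg, axis=1)
    # Average re-referencing: subtract the mean across channels.
    return eeg - eeg.mean(axis=0, keepdims=True)

x = np.random.randn(62, 30 * 1000)  # one synthetic 30-s epoch, 62 channels
y = preprocess(x)
print(y.shape)  # (62, 3000)
```

For mastoid re-referencing, the last line would instead subtract the mean of the TP9/TP10 rows.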
The visualizations below are for a single subject (subject 29, day 1).
This notebook contains all the visualizations and details: https://colab.research.google.com/drive/1QYWn7DLtCCCWRf5erdWIgdv6hdFR9-xH?usp=sharing
The raw EEG plots are too large to upload here. For simplicity, this is a plot of the 'Oz' channel over a 30-second window (X-axis: time (ms), Y-axis: voltage (µV)).
The t-SNE plots below were created using 30-second epochs from all subjects. The epochs are the input features, z-scored before embedding. The three plots show varying levels of perplexity (a parameter related to the number of nearest neighbors).
The UMAP plots below were created the same way, from z-scored 30-second epochs from all subjects. The two plots show varying values of n_neighbors (a parameter that balances local versus global structure in the data). The last plot is three-dimensional.
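The embedding step can be sketched with scikit-learn as below. The epoch matrix here is synthetic stand-in data (the real features would be the flattened 30-second epochs); the UMAP plots follow the same pattern with umap-learn's `UMAP(n_neighbors=...)` in place of `TSNE`:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.manifold import TSNE

# Synthetic stand-in for the epoch matrix: 200 epochs x 300 features
# (real epochs would be much wider: channels x samples, flattened).
rng = np.random.default_rng(0)
epochs = rng.standard_normal((200, 300))

X = StandardScaler().fit_transform(epochs)  # z-score each feature

for perplexity in (5, 30, 50):  # the varying parameter in the plots above
    emb = TSNE(n_components=2, perplexity=perplexity,
               init="pca", random_state=0).fit_transform(X)
    print(perplexity, emb.shape)  # (200, 2) per setting
```

In the real pipeline the resulting 2D points would be colored by sleep stage to see whether stages separate in the embedding.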