
Transfer Learning and SpecAugment applied to SSVEP Based BCI Classification #3

Closed: Kotzly closed this issue 3 years ago

Kotzly commented 3 years ago

Review of Transfer Learning and SpecAugment applied to SSVEP Based BCI Classification.

Pedro R. A. S. Bassi*, Willian Rampazzo and Romis Attux

Kotzly commented 3 years ago

Abstract

In this work, we used a deep convolutional neural network (DCNN) to classify electroencephalography (EEG) signals in a steady-state visually evoked potentials (SSVEP) based brain-computer interface (BCI). The raw EEG signals were converted to spectrograms and served as input to train a DCNN using the transfer learning technique. We also applied a second technique, data augmentation, chiefly SpecAugment, which is generally employed in speech recognition. The results, when excluding the evaluated user's data from the fine-tuning process, reached 99.3% mean test accuracy and 0.992 mean F1 score on 35 subjects from an open dataset.

Kotzly commented 3 years ago

Methods

A VGG-based network was used, with weights pretrained on AudioSet (VGGish) [1]; the network input is the STFT spectrogram of the EEG signal.
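As a minimal PyTorch sketch of this transfer-learning setup, the snippet below builds a simplified VGGish-style backbone for the 96x64 single-channel input, freezes it, and attaches a freshly initialized two-class head. The layer layout is a reduced stand-in and the checkpoint path is hypothetical; the real VGGish definition and weights come from the AudioSet release.

```python
# Sketch, not the paper's code: VGGish-like backbone + new 2-class head.
import torch
import torch.nn as nn

class VGGishLike(nn.Module):
    def __init__(self, n_classes: int = 2):
        super().__init__()
        # Simplified conv stack for a (1, 96, 64) spectrogram input.
        self.features = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256 * 12 * 8, 128), nn.ReLU(),
            nn.Linear(128, n_classes),  # new head, trained from scratch
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = VGGishLike()
# Hypothetical checkpoint of AudioSet-pretrained backbone weights:
# state = torch.load("vggish_audioset.pt")
# model.features.load_state_dict(state, strict=False)
for p in model.features.parameters():
    p.requires_grad = False  # freeze the backbone; fine-tune only the head
```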

The training dataset was the one described in [2], with 35 subjects, of which 8 had prior experience with BCI systems. The dataset contains 64-channel whole-head EEG recordings, with electrodes placed following the international 10-20 standard. During the recordings the subjects focused on flickering stimuli at 40 frequencies, ranging from 8 to 15.8 Hz in steps of 0.2 Hz. In this work only the data for the 12 Hz and 15 Hz stimuli was used, and only from the Oz electrode.
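A rough sketch of pulling just that slice out of one subject's file follows. The .mat variable name, the array layout, and the index constants are all assumptions about the release of [2], not verified facts.

```python
# Sketch under assumptions: extract the Oz channel for the 12 Hz and 15 Hz
# targets from one subject of the benchmark dataset [2].
import numpy as np
from scipy.io import loadmat

OZ = 61             # assumed index of Oz in the 64-channel montage
TARGETS = [20, 35]  # assumed indices of the 12 Hz and 15 Hz stimuli

mat = loadmat("S1.mat")
data = mat["data"]          # assumed shape: (channels, samples, targets, blocks)
oz = data[OZ]               # -> (samples, 40 targets, 6 blocks)
oz = oz[:, TARGETS, :]      # keep only the two targets -> (samples, 2, 6)
print(oz.shape)
```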

A data augmentation technique called SpecAugment [3] was also used. SpecAugment is based on time warping, frequency masking, and time masking: time warping randomly shifts points horizontally around the center of the spectrogram; frequency masking masks one or more rows (frequency bins); time masking masks one or more columns (time frames).
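The two masking operations are simple enough to sketch in NumPy (time warping is omitted, since it needs an interpolation step). The mask widths here are illustrative defaults, not the paper's settings:

```python
# Sketch of SpecAugment-style masking on a spectrogram S of shape
# (freq_bins, time_frames); masked regions are set to zero.
import numpy as np

def freq_mask(S, max_width=3, rng=None):
    rng = rng or np.random.default_rng()
    f = rng.integers(0, max_width + 1)        # width of the masked band
    f0 = rng.integers(0, S.shape[0] - f + 1)  # where the band starts
    S = S.copy()
    S[f0:f0 + f, :] = 0.0                     # zero out rows (frequencies)
    return S

def time_mask(S, max_width=2, rng=None):
    rng = rng or np.random.default_rng()
    t = rng.integers(0, max_width + 1)
    t0 = rng.integers(0, S.shape[1] - t + 1)
    S = S.copy()
    S[:, t0:t0 + t] = 0.0                     # zero out columns (time frames)
    return S
```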

The dataset is composed of 6 trials of 5 seconds for each frequency, sampled at 250 Hz. The STFT is applied to each trial, converted to decibels, and normalized to the range [0, 1]. The windowing function was a rectangular window, chosen empirically. The resulting images were 20x5, with a resolution of 0.4 Hz and 0.8 s. The 20x5 spectrograms were resampled to 96x64 to feed VGGish.
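This pipeline can be sketched with SciPy as below. The window length and hop are back-computed from the stated 0.4 Hz and 0.8 s resolutions at 250 Hz, and the 8-16 Hz band selection (which yields the 20 frequency bins) is an assumption:

```python
# Sketch of the preprocessing: rectangular-window STFT of one 5 s trial,
# conversion to dB, min-max normalization, and resampling to 96x64.
import numpy as np
from scipy.signal import stft
from scipy.ndimage import zoom

fs = 250                 # Hz, stated sampling rate
nperseg = int(fs / 0.4)  # 625 samples -> 0.4 Hz frequency resolution
hop = int(0.8 * fs)      # 200 samples -> 0.8 s time resolution

def trial_to_input(x):
    f, t, Z = stft(x, fs=fs, window="boxcar", nperseg=nperseg,
                   noverlap=nperseg - hop)
    band = (f >= 8.0) & (f < 16.0)              # assumed SSVEP band: 20 bins
    S = 20 * np.log10(np.abs(Z[band]) + 1e-12)  # magnitude in dB
    S = (S - S.min()) / (S.max() - S.min())     # normalize to [0, 1]
    return zoom(S, (96 / S.shape[0], 64 / S.shape[1]))  # resize to 96x64

x = np.random.randn(5 * fs)     # stand-in for one Oz trial
print(trial_to_input(x).shape)  # (96, 64)
```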

Results


VGGish fine-tuned with SpecAugment performed better, reaching the 99.3% mean test accuracy and 0.992 mean F1 score reported in the abstract.

References

[1] J. F. Gemmeke, D. P. W. Ellis, D. Freedman, A. Jansen, W. Lawrence, R. C. Moore, M. Plakal, M. Ritter, Audio Set: An ontology and human-labeled dataset for audio events, in: Proc. IEEE ICASSP 2017, New Orleans, LA, 2017.

[2] Y. Wang, X. Chen, X. Gao, S. Gao, A benchmark dataset for SSVEP-based brain–computer interfaces, IEEE Transactions on Neural Systems and Rehabilitation Engineering 25 (10) (2017) 1746–1752.

[3] D. S. Park, W. Chan, Y. Zhang, C.-C. Chiu, B. Zoph, E. D. Cubuk, Q. V. Le, SpecAugment: A simple data augmentation method for automatic speech recognition, in: Proc. Interspeech 2019, 2019. doi:10.21437/Interspeech.2019-2680.