Sebelino / hypnoscorer

Automated sleep stage classifier using a semi-supervised approach.
GNU General Public License v2.0

Hypnoscorer

Hypnoscorer (from "Hypnos" (sleep) and "scorer") is an automated, semi-supervised sleep stage classifier under development. You can use it to load annotated EEG signal data, segment it, extract features from it, apply PCA, plot the feature space, run SVM classification, and more.

Installation

Clone the repository including the wfdb-toolbox submodule like so:

git clone --recursive git@github.com:Sebelino/hypnoscorer

Or just click Download ZIP on this page.

Dependencies

Download some data

This program is currently capable of reading eleven different, explicitly named records:

Load the slp01a record

Start by downloading the three files you need for the slp01a record (slp01a.dat, slp01a.hea and slp01a.st) from the webpage linked to above, place them in data/slp01a, and convert them with wfdb2mat:

$ cd data/slp01a
$ ls
slp01a.dat slp01a.hea slp01a.st
$ wfdb2mat -r slp01a
[...]
$ ls
slp01a.dat slp01a.hea slp01am.hea slp01am.mat slp01a.st

Now open up MATLAB and use the program to load the data like so:

>> labeledsignal = score('load slp01a')
Reading data/slp01a/slp01a...

labeledsignal = 

       eeg: [1x1 Signal]
    labels: [240x1 char]

Load the shhs1-200001 record

You need a couple of files:

>> labeledsignal = score('load shhs1-200001')
Reading data/shhs/shhs1-200001...
Step 1 of 2: Reading requested records. (This may take a few minutes.)...
Step 2 of 2: Parsing data...

labeledsignal = 

       eeg: [1x1 Signal]
    labels: [1084x1 char]

Interpreting the output

Now that you have successfully read either the slp01a record or an SHHS record, let us take a look at the output which was stored in the variable aptly named labeledsignal:

labeledsignal = 
       eeg: [1x1 Signal]
    labels: [240x1 char]

This struct holds an EEG signal labeled with R&K sleep stage annotations (Wake, REM, N1, N2, N3, N4), with one label per 30 seconds of signal. As you can see, there are 240 labels for this signal. Here is how you display the first 50 labels:

>> labeledsignal.labels(1:50)'

ans =
44444444444433322233333333333444444444433332322222

The signal is 120 minutes long: there are 240 label characters, and 240 * 30 seconds = 7200 seconds = 120 minutes. As shown in the output above, the first twelve labels are 4, so the subject is scored as N4 for the first 360 seconds, then switches to N3, and so on.
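The arithmetic above can be reproduced with a few lines of MATLAB. This is a minimal sketch, not part of Hypnoscorer: the label string is a short illustrative example (twelve '4' labels followed by a few '3' and '2' labels), and the variable names are made up for the example.

```matlab
% Illustrative label string (NOT the full record): one character per 30 seconds.
labels = [repmat('4', 1, 12), '333', '222'];
labels = labels(:)';                 % force a row, in case labels is Nx1 char
epochSeconds = 30;                   % seconds covered by each label

% Total time covered by the labels, in minutes.
totalMinutes = numel(labels) * epochSeconds / 60;

% Run-length encode the labels to see how long each stage lasts.
changeIdx  = [1, find(diff(double(labels)) ~= 0) + 1];  % first index of each run
runLengths = diff([changeIdx, numel(labels) + 1]);       % labels per run
runStages  = labels(changeIdx);                          % stage character per run
runSeconds = runLengths * epochSeconds;

fprintf('Total: %g minutes\n', totalMinutes);
for k = 1:numel(runStages)
    fprintf('stage %c for %d seconds\n', runStages(k), runSeconds(k));
end
```

With the real record you would replace the illustrative string with labeledsignal.labels.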

As for the signal itself, you can read the EEG voltage like so:

>> labeledsignal.eeg.Graph

ans =
         0   -0.0392
    0.0040   -0.0389
    0.0080   -0.0386
    0.0120   -0.0393
    0.0160   -0.0353
[...]

The left column is the time (in seconds) at which the voltage was sampled. The right column is the EEG voltage in millivolts.
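The time column also tells you the sampling rate. The sketch below uses only the five rows printed above, copied into a plain matrix; the variable names are chosen for the example and are not part of the program.

```matlab
% First rows of labeledsignal.eeg.Graph as printed above: [time in s, voltage in mV].
graph = [0       -0.0392;
         0.0040  -0.0389;
         0.0080  -0.0386;
         0.0120  -0.0393;
         0.0160  -0.0353];

t = graph(:, 1);                       % sample times in seconds
v = graph(:, 2);                       % EEG voltage in millivolts

dt         = median(diff(t));          % sampling interval in seconds
sampleRate = 1 / dt;                   % sampling frequency in Hz
samplesPerLabel = round(30 * sampleRate);   % samples covered by one 30-second label

fprintf('Sampling rate: %.0f Hz, %d samples per label\n', sampleRate, samplesPerLabel);
```

For these rows the interval is 0.004 seconds, which corresponds to 250 samples per second.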

Feature extraction

Now let us extract some features of the signal to create a feature space:

>> fs = score('load slp01a | segment 3 | extract')
Reading cache/slp01a.slp01a.mat...

fs = 
  720x1 LabeledFeaturevector array with properties:

    Label
    Vector

load slp01a | segment 3 | extract should be read as: "first load the slp01a record, then split each 30-second epoch into 3 uniform segments of 30/3 = 10 seconds, then extract seven features from each of the resulting 240 * 3 = 720 segments". The seven features are mean, variance, skewness, kurtosis, Hjorth mobility, Hjorth complexity and amplitude. The result is a feature space of 720 feature vectors. Find the feature values of a vector like so:

>> fs(1).Vector

ans = 

                Mean: -0.0174
            Variance: 0.0020
            Skewness: 0.2085
            Kurtosis: 5.0548
      HjorthMobility: 18.9681
    HjorthComplexity: 3.7163e+03
           Amplitude: 0.3072
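To make the segment and extract steps more concrete, here is a rough sketch that splits one synthetic 30-second epoch into three 10-second segments and computes seven features per segment. It is not the program's code. It assumes a 250 Hz sampling rate, the textbook Hjorth definitions on sample differences, and peak-to-peak amplitude, so its numbers will not necessarily match the scaling of the output above.

```matlab
sampleRate   = 250;    % Hz; assumption for this sketch
epochSeconds = 30;     % one labeled epoch
parts        = 3;      % as in "segment 3"

% Synthetic epoch: a 10 Hz sinusoid with 0.05 mV amplitude.
t = (0:epochSeconds*sampleRate - 1)' / sampleRate;
x = 0.05 * sin(2*pi*10*t);

% Split the epoch into uniform segments, one column per segment.
segLen   = numel(x) / parts;
segments = reshape(x, segLen, parts);

for k = parts:-1:1                      % counting down preallocates the struct array
    s  = segments(:, k);
    sc = s - mean(s);                   % centered segment
    m2 = mean(sc.^2);                   % second central moment
    d1 = diff(s);                       % first difference
    d2 = diff(d1);                      % second difference
    mobility = sqrt(var(d1) / var(s));

    features(k).Mean             = mean(s);
    features(k).Variance         = var(s);
    features(k).Skewness         = mean(sc.^3) / m2^1.5;
    features(k).Kurtosis         = mean(sc.^4) / m2^2;   % non-excess: 3 for a Gaussian
    features(k).HjorthMobility   = mobility;
    features(k).HjorthComplexity = sqrt(var(d2) / var(d1)) / mobility;
    features(k).Amplitude        = max(s) - min(s);      % assumption: peak-to-peak
end

disp(features(1));
```

For a pure sinusoid the skewness is close to 0, the kurtosis is 1.5 and the Hjorth complexity is close to 1, which makes the sketch easy to sanity-check.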

PCA

To reduce the dimensionality using principal component analysis, simply add pca to the end of the pipeline:

>> fs = score('load slp01a | segment 3 | extract | pca')
Reading cache/slp01a.slp01a.mat...
fs = 
  720x1 LabeledFeaturevector array with properties:
    Label
    Vector

>> fs(1).Vector
ans = 
    PC1: 0.3136
    PC2: -0.0587
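If you are curious what a projection onto two principal components involves, the following sketch performs PCA with a singular value decomposition on a small made-up feature matrix. It is only an illustration: whether Hypnoscorer standardizes the features first, or how it computes the components internally, is not stated here, and the matrix values are invented.

```matlab
% Made-up feature matrix: 6 feature vectors (rows) x 3 features (columns).
X = [2.5 2.4 0.5;
     0.5 0.7 1.9;
     2.2 2.9 0.8;
     1.9 2.2 1.1;
     3.1 3.0 0.3;
     2.3 2.7 0.9];

% Center each feature (column) around zero.
Xc = bsxfun(@minus, X, mean(X, 1));

% Columns of V are the principal directions, ordered by decreasing variance.
[U, S, V] = svd(Xc, 'econ');

% Project every feature vector onto the first two principal components.
scores = Xc * V(:, 1:2);              % column 1 is PC1, column 2 is PC2

% Fraction of the total variance captured by each component.
explained = diag(S).^2 / sum(diag(S).^2);

disp(scores);
disp(explained');
```

Each row of scores plays the role of one two-element vector like the PC1/PC2 pair shown above.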

Plotting

To make a 2D plot of the feature space including labels, simply add plot to the end of the pipeline:

>> score('load slp01a | segment 3 | extract | pca | plot')

Sample plot
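A labeled 2D scatter plot of this kind can be sketched in plain MATLAB as follows. The coordinates and labels are invented for the example, and the code says nothing about how the program's own plot command is implemented.

```matlab
% Made-up PC1/PC2 coordinates (one row per feature vector) and their labels.
pcs = [ 0.31 -0.06;
        0.28 -0.02;
       -0.15  0.10;
       -0.20  0.14;
        0.02 -0.30;
        0.05 -0.25];
labels = '443322';                     % one stage character per row of pcs

stages = unique(labels);               % distinct stages, sorted

figure;
hold on;
for k = 1:numel(stages)
    idx = (labels == stages(k));       % rows that carry this stage label
    plot(pcs(idx, 1), pcs(idx, 2), 'o', 'DisplayName', ['stage ' stages(k)]);
end
hold off;
xlabel('PC1');
ylabel('PC2');
legend('show');
```

Plotting each stage in its own call gives every stage a separate legend entry.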

Partitioning

Coming soon...

SVM classification

Coming soon...

Clear cache

This program uses a file cache to speed up loading data from record files (EDF, etc.). For comparison, loading the shhs1-200001 record takes about 70 seconds without a cache and less than a second with one.

Every record is cached in a MAT file in the cache/ directory, e.g. ./cache/shhs.shhs1-200001.mat. To clear the cache for a record, delete its MAT file.
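Deleting a cache file can also be done from the MATLAB prompt. The snippet below is a small sketch that assumes the current directory is the repository root and uses the example file name mentioned above.

```matlab
% Path of the cache file for the shhs1-200001 record, relative to the repository root.
recordCache = fullfile('cache', 'shhs.shhs1-200001.mat');

% Delete the file only if it exists, so the snippet is safe to run twice.
if exist(recordCache, 'file') == 2
    delete(recordCache);
    fprintf('Deleted %s\n', recordCache);
else
    fprintf('No cache file at %s\n', recordCache);
end
```

The next load of that record will then read the original record files again and recreate the cache.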