Sebelino / hypnoscorer

Automated sleep stage classifier using a semi-supervised approach.
GNU General Public License v2.0

how to download the dataset #1

Open lyzhangjm opened 8 years ago

lyzhangjm commented 8 years ago

Hi, how can I download the dataset? I can't download it from http://www.physionet.org/pn3/shhpsgdb/

Sebelino commented 8 years ago

Hello, you can download the dataset I used (SHHS1) from the NSRR website. The data isn't exactly freely accessible; you will have to fill out a form to access it. Scroll down to the bottom of the page and click "Fill out a Data Access and Use Agreement".

lyzhangjm commented 8 years ago

Thanks for your help. It is difficult to get access to SHHS1. Can another dataset (EDF format) from http://www.physionet.org be used?

Sebelino commented 8 years ago

It should work with other EDF files, but you may have to make some adjustments to lines 482-492 and 535-560 in score.m before it works. Which EDF dataset were you planning to use?

Alternatively, you could test the program with the slp01a record, which you can download without any hassle. I wrote some instructions on how to use wfdb2mat to generate the mat- and hea-files that the program needs from it. Please feel free to ask if you are wondering about the details; I haven't put much effort into making the software user-friendly yet.

Sebelino commented 8 years ago

BTW I just added a little tutorial to the readme; I hope it helps.

lyzhangjm commented 8 years ago

The edfread.m file is not right; it is HTML rather than MATLAB code.

lyzhangjm commented 8 years ago

I am using MATLAB 2015b, and this line gives an error:

[numerator,denominator] = str2fraction(tokens{2});

Undefined function or variable 'tokens'.
Error in score>str2fraction (line 450)
numerator = str2num(tokens{2});

I changed the call str2fraction(tokens{2}) to str2fraction(tokens{2},tokens), and that error disappeared. However, another error now appears.

I ran the code as follows:

vectors = score('load shhs1-200001 | segment 3 | extract | select Mean Variance');
score(vectors,'bundle 12RW 34M | partition 0.25 | svm | eval | plot');

The error output is:

312 stream.svm = SVM(stream.trainingset,tokens{2});
Index exceeds matrix dimensions.
Error in score (line 312)
stream.svm = SVM(stream.trainingset,tokens{2});

Sebelino commented 8 years ago

Regarding edfread.m, you'd need to follow the link and then copy the code off the webpage.

str2fraction(tokens{2}) should be str2fraction(fracstr); it appears that MATLAB 2015b added some static program analysis which catches errors that 2014b would not. I have updated the file now anyway.

As for the latter error, try svm linear or svm rbf instead of just svm. I added support for specifying the kernel at some point but forgot to update the documentation. Also, you may want to use something like bundle 12RW 3 instead, since shhs1-200001 does not contain any "M" or "4" labels (slp01a does, however).
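The "Index exceeds matrix dimensions" error at tokens{2} is what one would expect if a filter written as plain svm produces only one token. A minimal sketch of that situation follows. It assumes the filter string is split on whitespace (strsplit stands in for whatever score.m actually does), and the fallback kernel is a hypothetical default, not something the program provides.

```matlab
% Assumption: each filter is split on whitespace; score.m may tokenize differently.
filters = {'svm', 'svm linear'};
kernels = cell(size(filters));

for k = 1:numel(filters)
    tokens = strsplit(strtrim(filters{k}));   % 'svm' -> 1 token, 'svm linear' -> 2 tokens
    if numel(tokens) >= 2
        kernels{k} = tokens{2};               % kernel given explicitly
    else
        % tokens{2} would raise "Index exceeds matrix dimensions" here.
        kernels{k} = 'rbf';                   % hypothetical default, for illustration only
    end
    fprintf('%-12s -> %d token(s), kernel: %s\n', filters{k}, numel(tokens), kernels{k});
end
```

Guarding on numel(tokens) before indexing is one way to turn a missing argument into a default or a clearer error message.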

Sebelino commented 8 years ago

There turned out to be a few bugs in the eval and plot filters as well but they should work now. The following works as expected:

>> score('load 200001 | segment 1 | extract | select Mean Variance | bundle W 123R | partition 3:1 | svm linear | eval | plot')
Reading cache/shhs.shhs1-200001.mat...
ans = 
        trainingset: [813x1 LabeledFeaturevector]
         testingset: [271x1 LabeledFeaturevector]
                svm: [1x1 SVM]
       predictedset: [271x1 LabeledFeaturevector]
           accuracy: 0.8413
    confusionmatrix: [2x2 double]
     confusionorder: [2x1 char]

lyzhangjm commented 8 years ago

I am using MATLAB 2014b, and I get the following error:

score('load 200001 | segment 1 | extract | select Mean Variance | bundle W 123R | partition 3:1 | svm linear | eval | plot')
Reading cache/shhs.shhs1-200001.mat...
10 predictormatrix = labeledfeaturevectors.matrix();
Error using horzcat
Dimensions of matrices being concatenated are not consistent.
Error in score (line 320)
m = [tlabels,plabels];

Should it be m = [tlabels;plabels]? The dimensions of tlabels differ from those of plabels.

Sebelino commented 8 years ago

Did you try the most recent version? I uploaded a fix for that bug 30 minutes ago.

lyzhangjm commented 8 years ago

Here is the debug information:

score('load 200001 | segment 1 | extract | select Mean Variance | bundle W 123R | partition 3:1 | svm rbf | eval | plot')
Reading cache/shhs.shhs1-200001.mat...
320 m = [tlabels,plabels];
K>> size(tlabels)
ans =
   271     1
K>> size(plabels)
ans =
     1     1

The dimensions of tlabels differ from those of plabels, so this line fails: m = [tlabels,plabels];
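A small sketch of why the concatenation fails, using placeholder zero vectors with the sizes printed in the debug output. It also suggests that switching to [tlabels;plabels] would only hide the problem: vertical concatenation succeeds, but the result no longer pairs each true label with a prediction.

```matlab
% Placeholder data with the sizes from the debug session (values are made up).
tlabels = zeros(271, 1);   % true labels, one per test vector
plabels = 0;               % predicted labels collapsed to a single value

failed = false;
try
    m = [tlabels, plabels];        % horizontal concatenation needs equal row counts
catch err
    failed = true;
    disp(err.message);
end

stacked = [tlabels; plabels];      % runs, but gives 272x1 instead of 271x2
disp(size(stacked));
```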

lyzhangjm commented 8 years ago

Yes, I downloaded the most recent version, but the error still appears.

Sebelino commented 8 years ago

That's odd... Just to make sure, is this the contents of the predict(...) function in your SVM.m?:

        function featureset = predict(self,predictors)
            % Predicts the labels for the given data
            alllabels = [];
            for f = fieldnames(self.Model)'
                [labels,score] = self.Model.(f{:}).predict(predictors.matrix());
                alllabels = [alllabels,labels];
            end
            %[M,F,C] = mode(uint8(alllabels)'); %TODO
            winnerlabels = mode(alllabels,2);
            featureset = arrayfun(@(i){LabeledFeaturevector(predictors(i).Vector,winnerlabels(i))},(1:size(winnerlabels,1)));
            featureset = [featureset{:}]';
        end

It returns a 271x1 LabeledFeaturevector for me. It used to return a single LabeledFeaturevector until I fixed it, hence the bug.

lyzhangjm commented 8 years ago

        function featureset = predict(self,predictors)
            % Predicts the labels for the given data
            alllabels = [];
            for f = fieldnames(self.Model)'
                [labels,score] = predict(self.Model.(f{:}),predictors.matrix());
                alllabels = [alllabels,labels];
            end
            %[M,F,C] = mode(uint8(alllabels)'); %TODO
            winnerlabels = mode(alllabels')';
            featureset = arrayfun(@(i){LabeledFeaturevector(predictors(i).Vector,winnerlabels(i))},(1:size(winnerlabels,1)));
            featureset = [featureset{:}]';
        end

lyzhangjm commented 8 years ago

The size of alllabels is right (271), but the size of winnerlabels is 1.

Sebelino commented 8 years ago

It appears that you are still using the old version of SVM.m since line 29 in your file is different. I updated that file less than an hour ago. Please download the newer version and try again. The most recent version of SVM.m should look like this.

Out of curiosity, are you using git to download the files or are you downloading the ZIP of the repository? It is easier to download recent changes to your directory if you use git by doing a git pull, assuming that you have git installed.

$ git clone https://github.com/Sebelino/hypnoscorer
$ cd hypnoscorer
[...]
$ git pull

lyzhangjm commented 8 years ago

After changing winnerlabels = mode(alllabels')' to winnerlabels = mode(alllabels,2), the program works correctly.
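The two expressions only differ when alllabels has a single column, which appears to be the case here (271 rows, one model). The sketch below uses made-up numeric labels, not data from the program. Transposing a column gives a row vector, and mode then collapses that whole row to one value, whereas mode(...,2) always takes the mode of each row.

```matlab
% Made-up labels for illustration; one row per test vector.
onecolumn = [1; 2; 2; 3; 2];               % a single model: 5x1

oldwinner = mode(onecolumn')';             % row vector in, so the result is a scalar
newwinner = mode(onecolumn, 2);            % mode along dimension 2: one label per row

disp(size(oldwinner));                     % 1 1
disp(size(newwinner));                     % 5 1

% With several columns (several models voting) both forms agree.
threecolumns = [1 1 2; 2 3 2; 3 3 3; 1 2 2; 2 2 1];
oldvotes = mode(threecolumns')';
newvotes = mode(threecolumns, 2);
disp(isequal(oldvotes, newvotes));         % 1
```

Passing the dimension explicitly avoids relying on mode picking the first non-singleton dimension.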

score('load shhs1-200002 | segment 3 | extract | select Mean Variance | bundle 12RW 34M | partition 0.25 | svm rbf | eval | plot')
Reading cache/shhs.shhs1-200002.mat...
ans = 
        trainingset: [809x1 LabeledFeaturevector]
         testingset: [2428x1 LabeledFeaturevector]
                svm: [1x1 SVM]
       predictedset: [2428x1 LabeledFeaturevector]
           accuracy: 0.9428
    confusionmatrix: [2x2 double]
     confusionorder: [2x1 char]

However, I don't know why the accuracy is 0.9428. Because shhs1-200002 is split into a training set and a test set, the training set is small. But the DBN needs a lot of training data.

Sebelino commented 8 years ago

That's great. I am closing this issue since it got a little derailed, but please feel free to open up a new issue if you come across more bugs; I am sure there are several.

Sebelino commented 8 years ago

I decided to reopen this issue after seeing your edit. I think the reason the accuracy is so high is that there are very few vectors labeled 3, 4 or M. A better comparison would be to separate wakefulness from sleep by changing the parameters of the bundle filter a bit, like this:

>> score('load 200002 | segment 3 | extract | select Mean Variance | bundle W 123R | partition 3:1 | svm rbf | eval | plot')
Reading cache/shhs.shhs1-200002.mat...
ans = 
        trainingset: [2428x1 LabeledFeaturevector]
         testingset: [809x1 LabeledFeaturevector]
                svm: [1x1 SVM]
       predictedset: [809x1 LabeledFeaturevector]
           accuracy: 0.7775
    confusionmatrix: [2x2 double]
     confusionorder: [2x1 char]
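One way to sanity-check an accuracy figure on imbalanced labels is to compare it with a majority-class baseline, that is, the accuracy of always predicting the most common label. The label counts below are invented for illustration and are not taken from SHHS1.

```matlab
% Invented label distribution: 94 vectors in bundle 'A', 6 in bundle 'B'.
truelabels = [repmat('A', 94, 1); repmat('B', 6, 1)];

% Always predict the most frequent label and measure how often that is right.
majority = char(mode(double(truelabels)));
baseline = mean(truelabels == majority);

fprintf('majority label: %s, baseline accuracy: %.4f\n', majority, baseline);
```

If a classifier's accuracy is close to this baseline, the high number says more about the label distribution than about the classifier.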

Note that you can use the argument to the partition filter to control the size of the training set. I changed the argument above to 3:1, which makes the training set 3 times bigger than the test set. A parameter of 0.25 does the opposite: going by the set sizes in the earlier run (809 training vectors out of 3237 in total), the training set becomes 25 % of all vectors, i.e. a third of the size of the test set.
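As a rough sketch of how the two argument forms could map to set sizes: the parsing and rounding below are assumptions, not code from score.m. Reading a ratio a:b as the fraction a/(a+b) of all vectors, and a plain number as that fraction directly, is consistent with the sizes reported in this thread for shhs1-200002 (3237 vectors in total).

```matlab
% Hypothetical parsing of the partition argument; score.m may do this differently.
specs = {'3:1', '0.25'};
total = 3237;                       % trainingset + testingset in the runs above
ntrain = zeros(size(specs));

for k = 1:numel(specs)
    parts = str2double(strsplit(specs{k}, ':'));
    if numel(parts) == 2
        fraction = parts(1) / sum(parts);   % ratio form, e.g. 3:1 -> 0.75
    else
        fraction = parts;                   % plain fraction, e.g. 0.25
    end
    ntrain(k) = round(fraction * total);
    fprintf('%-5s -> training: %d, testing: %d\n', specs{k}, ntrain(k), total - ntrain(k));
end
```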

Also, you can skip the bundle step altogether if you'd like to distinguish all five sleep stages (W, R, N1, N2, N3) from each other:

>> score('load 200002 | segment 3 | extract | select Mean Variance | partition 3:1 | svm rbf | eval')
Reading cache/shhs.shhs1-200002.mat...
ans = 
        trainingset: [2428x1 LabeledFeaturevector]
         testingset: [809x1 LabeledFeaturevector]
                svm: [1x1 SVM]
       predictedset: [809x1 LabeledFeaturevector]
           accuracy: 0.7132
    confusionmatrix: [5x5 double]
     confusionorder: [5x1 char]

Plotting does not seem to work as intended when doing this, however. I'll see if I can fix that.